Re: [PATCH 1/3] Use reliable sem_wait interruption in nptl/tst-sem6.

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: "Ondřej Bílka" <neleai@seznam.cz>
To: Torvald Riegel <triegel@redhat.com>
Cc: GLIBC Devel <libc-alpha@sourceware.org>
Subject: Re: [PATCH 1/3] Use reliable sem_wait interruption in nptl/tst-sem6.
Date: Tue, 09 Dec 2014 16:50:00 -0000	[thread overview]
Message-ID: <20141209165033.GA20499@domone> (raw)
In-Reply-To: <1418120160.25868.132.camel@triegel.csb>

On Tue, Dec 09, 2014 at 11:16:00AM +0100, Torvald Riegel wrote:
> On Mon, 2014-12-08 at 23:28 +0100, OndÅ™ej BÃlka wrote:
> > On Mon, Dec 08, 2014 at 12:43:17PM +0100, Torvald Riegel wrote:
> > > On Sat, 2014-12-06 at 14:50 +0100, OndÅ™ej BÃlka wrote:
> > > > On Fri, Dec 05, 2014 at 07:52:57PM +0100, Torvald Riegel wrote:
> > > > >    alarm (1);
> > > > >  
> > > > >    int res = sem_wait (&s);
> > > > > -  if (res == 0)
> > > > > -    {
> > > > > -      puts ("wait succeeded");
> > > > > -      return 1;
> > > > > -    }
> > > > > -  if (res != -1 || errno != EINTR)
> > > > > +  /* We accept all allowed behavior: Implementations that return EINTR and
> > > > > +     those that rely on correct interruption through signals to use sem_post
> > > > > +     in the signal handler.  */
> > > > > +  if (res != 0 && !(res == -1 && errno == EINTR))
> > > > >      {
> > > > > -      puts ("wait didn't fail with EINTR");
> > > > > +      puts ("wait neiter succeeded nor failed with EINTR");
> > > > >        return 1;
> > > > >      }
> > > > >  
> > > > 
> > > > That does change test logic as it originally failed when wait succeeded.
> > > 
> > > Yes, but why do you think that this is inconsistent?  The previous test
> > > didn't add a token in the signal handler, so if wait succeeded, then the
> > > test should fail.
> > > 
> > > However, the correct way to interrupt the semaphore with a signal is to
> > > add a token.  My patch does that.  Second, if we do not want to make
> > > timing assumptions (which the existing test would do if we add a token
> > > to the semaphore in the signal handler), then we need to accept that the
> > > (first) signal handler execution might happen before sem_wait actually
> > > executes.  Therefore, we must not fail in this case.
> > > 
> > > We have to correctly interrupt with signals because as the futex
> > > documentation stands (allowing return of EINTR on spurious wake-ups),
> > > there's no way to implement the specific behavior of the existing
> > > implementation (which assumes EINTR is returned *only* on signals).
> > > 
> > > IOW, the existing test does white-box testing with timing assumptions;
> > > with this patch, we do make a slightly different black-box test with no
> > > timing assumptions.
> > 
> > Which misses point of test which is to find bugs.
> 
> Please read the POSIX spec.  It allows both outcomes, and without timing
> assumptions etc., we can't drive executions towards just one of the
> outcomes.
>
Which does not answer my objection. What extra bugs could this test catch,
compared to say tst-sem2? If there is no such bug you could just delete
that file.


> > What if in new fooarch 
> > assembly one forget to return -1 on interrupt which breaks user application 
> > which will assume that semaphore is locked,
> 
> I can strengthen the test on res==0, checking whether there is no token
> left.  I don't think it buys us much though.
> 
> Another option would be to disallow failure; however, this only works if
> sem_assume_only_signals_cause_futex_EINTR remains 0 and not set to a
> different value by a distribution (see Patch 3/3).
> 
> > one can deal with rare false 
> > positives in tests.
> 
> I think you'd be creating false negatives with what you have in mind.
> The false positive would be not failing when 0 is returned, incorrectly,
> ie your example.
>
Depends whats your default. Usually false positive means that test shows
there is disease/bug but not in reality.
 
> False negatives are a pain.  You can deal with them of course, but it
> stands in the way of doing continuous integration and such.  what we de
> facto do is just ignore all tests that we know can have false negatives,
> which doesn't make the test useful at all.
> 
> > One does not need justify this fact by forward progress/fairness
> > assumptions. Just assumption that OS which keeps all CPU idle for one
> > second while there is suitable task is simply broken.
> 
> Well, that *is* a timing assumption.  There is nothing broken about this
> in general.  Remember that we do have tests failing now and then just
> because of that.  Which is awful for testing.
> 
Without that assumption there is no guarantee that it will take more
than week to run test suite. That would make it useless.

As failing tests are concerned what is low hanging fruit? It is better
to first fix tests that fail with less load than this one and if one of these
is not worth fixing this is not either.

> > It would be better
> > to add serialization so no other test runs in parallel with this.
> 
> You can't prevent other load on the machine in general.
> 
Its not general case, its when you run test suite, its tradeoff between
how many bugs it can detect and needed mainteinance.


> Also, please review the actual background for this change.  See patch
> 3/3 for that.  The change to the test is not ideal, but please see the
> trade-offs.

The rarity of problem bugged me, as to trigger that behaviour one would
need run also highly parallel program so it looked unlikely that anybody
would report that. As it couldn't detect some bugs it previously could
its hard to see what is better.

next prev parent reply	other threads:[~2014-12-09 16:50 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-05 18:37 [PATCH 0/3] Fix semaphore destruction (BZ #12674) Torvald Riegel
2014-12-05 18:53 ` [PATCH 1/3] Use reliable sem_wait interruption in nptl/tst-sem6 Torvald Riegel
2014-12-06 13:50   ` Ondřej Bílka
2014-12-08 11:43     ` Torvald Riegel
2014-12-08 22:29       ` Ondřej Bílka
2014-12-09 10:16         ` Torvald Riegel
2014-12-09 16:50           ` Ondřej Bílka [this message]
2014-12-09 18:25             ` Torvald Riegel
2014-12-09 18:37               ` Rich Felker
2014-12-09 19:47                 ` Torvald Riegel
2014-12-09 20:19                   ` Rich Felker
2014-12-10  9:34                     ` Torvald Riegel
2014-12-05 19:03 ` [PATCH 2/3] Fix nptl/tst-sem4: always start with a fresh semaphore Torvald Riegel
2014-12-06 13:55   ` Ondřej Bílka
2014-12-05 19:24 ` [PATCH 3/3] Update to new generic semaphore algorithm Torvald Riegel
2014-12-15 21:59   ` Torvald Riegel
2018-05-08 14:00   ` Andreas Schwab
2018-05-08 14:22     ` Torvald Riegel
2014-12-05 20:30 ` [PATCH 0/3] Fix semaphore destruction (BZ #12674) Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141209165033.GA20499@domone \
    --to=neleai@seznam.cz \
    --cc=libc-alpha@sourceware.org \
    --cc=triegel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).