public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] libstdc++: testsuite: Address random failure in pthread_create() [PR54185]
@ 2020-08-13 22:15 Lewis Hyatt
  2020-08-18  8:43 ` Jonathan Wakely
  0 siblings, 1 reply; 4+ messages in thread
From: Lewis Hyatt @ 2020-08-13 22:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: libstdc++

[-- Attachment #1: Type: text/plain, Size: 544 bytes --]

Hello-

The attached patch was discussed briefly on PR 54185 here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54185#c14
The test case for this PR sometimes fails due to random failures in
pthread_create() that are not related to the original PR. This patch fixes
it up by ignoring those failures. The test case was designed to repeat the
same test 1000 times to attempt to reproduce a race condition, so I think is
OK if some of those iterations are simply skipped.

Thanks for taking a look at it; I can commit it if it makes sense.

-Lewis

[-- Attachment #2: pr54185_test.txt --]
[-- Type: text/plain, Size: 1954 bytes --]

libstdc++: testsuite: Address random failure in pthread_create() [PR54185]

The test for this PR calls pthread_create() many times in a row, which may fail
with EAGAIN sometimes. Avoid generating a test failure in this case.

libstdc++-v3/ChangeLog:

	PR libstdc++/54185
	* testsuite/30_threads/condition_variable/54185.cc: Make test robust
	to random pthread_create() failures.

diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
index ea0d5bb8740..cbd21e11e57 100644
--- a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
+++ b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
@@ -31,19 +31,25 @@
 std::condition_variable* cond = nullptr;
 std::mutex mx;
 int started = 0;
+bool notified = false;
 int constexpr NUM_THREADS = 10;
 
+static void finalize_cond()
+{
+  /* Lock should be held when calling this.  */
+  notified = true;
+  cond->notify_all();
+  delete cond;
+  cond = nullptr;
+}
+
 void do_thread_a()
 {
   std::unique_lock<std::mutex> lock(mx);
   if(++started >= NUM_THREADS)
-  {
-    cond->notify_all();
-    delete cond;
-    cond = nullptr;
-  }
+    finalize_cond();
   else
-    cond->wait(lock);
+    cond->wait(lock, [] { return notified; });
 }
 
 int main(){
@@ -51,11 +57,24 @@ int main(){
   for(int j = 0; j < 1000; ++j)
   {
     started = 0;
+    notified = false;
     cond = new std::condition_variable;
     for (int i = 0; i < NUM_THREADS; ++i)
-      vec.emplace_back(&do_thread_a);
-    for (int i = 0; i < NUM_THREADS; ++i)
-      vec[i].join();
+      {
+	try
+	  {
+	    vec.emplace_back(&do_thread_a);
+	  }
+	catch(const std::system_error&)
+	  {
+	    /* Thread creation may fail due to resource limits; just move on.  */
+	    std::unique_lock<std::mutex> lock(mx);
+	    finalize_cond();
+	    break;
+	  }
+      }
+    for(auto& thread: vec)
+      thread.join();
     vec.clear();
   }
 }

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] libstdc++: testsuite: Address random failure in pthread_create() [PR54185]
  2020-08-13 22:15 [PATCH] libstdc++: testsuite: Address random failure in pthread_create() [PR54185] Lewis Hyatt
@ 2020-08-18  8:43 ` Jonathan Wakely
  2020-08-18 15:20   ` Lewis Hyatt
  0 siblings, 1 reply; 4+ messages in thread
From: Jonathan Wakely @ 2020-08-18  8:43 UTC (permalink / raw)
  To: Lewis Hyatt; +Cc: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 1804 bytes --]

On 13/08/20 18:15 -0400, Lewis Hyatt via Libstdc++ wrote:
>Hello-
>
>The attached patch was discussed briefly on PR 54185 here:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54185#c14
>The test case for this PR sometimes fails due to random failures in
>pthread_create() that are not related to the original PR. This patch fixes
>it up by ignoring those failures. The test case was designed to repeat the
>same test 1000 times to attempt to reproduce a race condition, so I think is
>OK if some of those iterations are simply skipped.
>
>Thanks for taking a look at it; I can commit it if it makes sense.
>
>-Lewis

>libstdc++: testsuite: Address random failure in pthread_create() [PR54185]
>
>The test for this PR calls pthread_create() many times in a row, which may fail
>with EAGAIN sometimes. Avoid generating a test failure in this case.
>
>libstdc++-v3/ChangeLog:
>
>	PR libstdc++/54185
>	* testsuite/30_threads/condition_variable/54185.cc: Make test robust
>	to random pthread_create() failures.

Thanks for the patch. It certainly looks reasonable, but I wonder if
the attached version wouldn't be (very slightly) better. The
difference is that instead of just giving up at the first EAGAIN we
keep trying. This way we might be able to create a few more threads
before the loop finishes. If we still keep failing, it works the same.

I've also added a check that the failures are due to EAGAIN, and we'll
still terminate if there's some other problem. I'm assuming that your
failures are EAGAIN. Do you know why that's happening? Does your
system a low value for RLIMIT_NPROC or something?

The failures for that testcase on AIX appear to be different. It just
segfaults after destroying the condition_variable, which probably
means there's a POSIX conformance issue in AIX's pthread_cond_t.



[-- Attachment #2: patch.txt --]
[-- Type: text/x-patch, Size: 1586 bytes --]

diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
index ea0d5bb8740..8ccb79e6de6 100644
--- a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
+++ b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
@@ -31,31 +31,48 @@
 std::condition_variable* cond = nullptr;
 std::mutex mx;
 int started = 0;
+bool notified = false;
 int constexpr NUM_THREADS = 10;
 
-void do_thread_a()
+void do_thread_a(bool wait)
 {
   std::unique_lock<std::mutex> lock(mx);
-  if(++started >= NUM_THREADS)
+  if (++started >= NUM_THREADS)
   {
+    notified = true;
     cond->notify_all();
     delete cond;
     cond = nullptr;
   }
-  else
-    cond->wait(lock);
+  else if (wait)
+    cond->wait(lock, [] { return notified; });
 }
 
-int main(){
+int main()
+{
   std::vector<std::thread> vec;
-  for(int j = 0; j < 1000; ++j)
+  for (int j = 0; j < 1000; ++j)
   {
     started = 0;
+    notified = false;
     cond = new std::condition_variable;
     for (int i = 0; i < NUM_THREADS; ++i)
-      vec.emplace_back(&do_thread_a);
-    for (int i = 0; i < NUM_THREADS; ++i)
-      vec[i].join();
+      {
+	try
+	  {
+	    vec.emplace_back(&do_thread_a, true);
+	  }
+	catch(const std::system_error& e)
+	  {
+	    if (e.code() == std::errc::resource_unavailable_try_again)
+	      // Thread creation may fail due to resource limits; run serially.
+	      do_thread_a(false);
+	    else
+	      throw;
+	  }
+      }
+    for (auto& thread : vec)
+      thread.join();
     vec.clear();
   }
 }

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] libstdc++: testsuite: Address random failure in pthread_create() [PR54185]
  2020-08-18  8:43 ` Jonathan Wakely
@ 2020-08-18 15:20   ` Lewis Hyatt
  2020-08-18 18:15     ` Jonathan Wakely
  0 siblings, 1 reply; 4+ messages in thread
From: Lewis Hyatt @ 2020-08-18 15:20 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: gcc-patches, libstdc++

On Tue, Aug 18, 2020 at 09:43:31AM +0100, Jonathan Wakely wrote:
> On 13/08/20 18:15 -0400, Lewis Hyatt via Libstdc++ wrote:
> > Hello-
> > 
> > The attached patch was discussed briefly on PR 54185 here:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54185#c14
> > The test case for this PR sometimes fails due to random failures in
> > pthread_create() that are not related to the original PR. This patch fixes
> > it up by ignoring those failures. The test case was designed to repeat the
> > same test 1000 times to attempt to reproduce a race condition, so I think is
> > OK if some of those iterations are simply skipped.
> > 
> > Thanks for taking a look at it; I can commit it if it makes sense.
> > 
> > -Lewis
> 
> > libstdc++: testsuite: Address random failure in pthread_create() [PR54185]
> > 
> > The test for this PR calls pthread_create() many times in a row, which may fail
> > with EAGAIN sometimes. Avoid generating a test failure in this case.
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > 	PR libstdc++/54185
> > 	* testsuite/30_threads/condition_variable/54185.cc: Make test robust
> > 	to random pthread_create() failures.
> 
> Thanks for the patch. It certainly looks reasonable, but I wonder if
> the attached version wouldn't be (very slightly) better. The
> difference is that instead of just giving up at the first EAGAIN we
> keep trying. This way we might be able to create a few more threads
> before the loop finishes. If we still keep failing, it works the same.
>
> I've also added a check that the failures are due to EAGAIN, and we'll
> still terminate if there's some other problem. I'm assuming that your
> failures are EAGAIN. Do you know why that's happening? Does your
> system a low value for RLIMIT_NPROC or something?
>

Right, good point to check for EAGAIN. Yes, that's the error I get. I don't
understand why it happens. It's not related to libstdc++, I can reproduce it
with the below:

======
#include <pthread.h>
void* do_nothing (void*) 
{
  return nullptr;
}
int main () {
  for (int i = 0; i != 1000; ++i)
    {
      for (int j = 0; j != 10; ++j)
	{
	  pthread_t thread;
	  const int err = pthread_create (&thread, nullptr, do_nothing, nullptr);
	  if (err) return 1;
	  pthread_join (thread, nullptr);
        }
    }
}
======

If I run this just once at a time, it never fails. But if I run it twice at
a time, it fails about 30% of the time, like:
root@host:/home/lewis# (./pthread_fail || echo ERR) & \
                       (./pthread_fail || echo ERR) & wait
[1] 25041
[2] 25042
ERR
ERR

All the rlimits are infinite or as high as possible, but I dug around a bit
and it seems this is a systemd thing, this system had systemd-logind
disabled (perhaps not in the correct way) and something about the
configuration led to the issue. Enabling systemd-logind resolves it for
me. So perhaps this was mostly specific to me. Sorry if I wasted your
time... if you still think it's worth doing something here I am happy to
help.

FWIW, regarding your extension to the patch, in case there are some
legitimate thread creation problems, one thing to keep in mind is that the
retrying after failure makes certain things worse. For instance, (with my
system in the previous state), what would happen is the 54185.cc hit the
pthread_create failure, then prior to this patch it just bailed out. With
either of these patches it tries more times, which can worsen issues in
unrelated test cases running in parallel, that may see random failures in
their own forks or thread creations. This test case is trying hard to
reproduce the race condition by running 1000 iterations, which seems
worthwhile given it's still failing on some systems like AIX, but on the
other hand it's possible doing 50 instead of 1000 would work too, and be
less prone to unrelated resource issues.

Thanks for taking a look at this.

-Lewis

> The failures for that testcase on AIX appear to be different. It just
> segfaults after destroying the condition_variable, which probably
> means there's a POSIX conformance issue in AIX's pthread_cond_t.
> 
> 

> diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
> index ea0d5bb8740..8ccb79e6de6 100644
> --- a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
> +++ b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
> @@ -31,31 +31,48 @@
>  std::condition_variable* cond = nullptr;
>  std::mutex mx;
>  int started = 0;
> +bool notified = false;
>  int constexpr NUM_THREADS = 10;
>  
> -void do_thread_a()
> +void do_thread_a(bool wait)
>  {
>    std::unique_lock<std::mutex> lock(mx);
> -  if(++started >= NUM_THREADS)
> +  if (++started >= NUM_THREADS)
>    {
> +    notified = true;
>      cond->notify_all();
>      delete cond;
>      cond = nullptr;
>    }
> -  else
> -    cond->wait(lock);
> +  else if (wait)
> +    cond->wait(lock, [] { return notified; });
>  }
>  
> -int main(){
> +int main()
> +{
>    std::vector<std::thread> vec;
> -  for(int j = 0; j < 1000; ++j)
> +  for (int j = 0; j < 1000; ++j)
>    {
>      started = 0;
> +    notified = false;
>      cond = new std::condition_variable;
>      for (int i = 0; i < NUM_THREADS; ++i)
> -      vec.emplace_back(&do_thread_a);
> -    for (int i = 0; i < NUM_THREADS; ++i)
> -      vec[i].join();
> +      {
> +	try
> +	  {
> +	    vec.emplace_back(&do_thread_a, true);
> +	  }
> +	catch(const std::system_error& e)
> +	  {
> +	    if (e.code() == std::errc::resource_unavailable_try_again)
> +	      // Thread creation may fail due to resource limits; run serially.
> +	      do_thread_a(false);
> +	    else
> +	      throw;
> +	  }
> +      }
> +    for (auto& thread : vec)
> +      thread.join();
>      vec.clear();
>    }
>  }


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] libstdc++: testsuite: Address random failure in pthread_create() [PR54185]
  2020-08-18 15:20   ` Lewis Hyatt
@ 2020-08-18 18:15     ` Jonathan Wakely
  0 siblings, 0 replies; 4+ messages in thread
From: Jonathan Wakely @ 2020-08-18 18:15 UTC (permalink / raw)
  To: Lewis Hyatt; +Cc: gcc-patches, libstdc++

On 18/08/20 11:20 -0400, Lewis Hyatt wrote:
>On Tue, Aug 18, 2020 at 09:43:31AM +0100, Jonathan Wakely wrote:
>> On 13/08/20 18:15 -0400, Lewis Hyatt via Libstdc++ wrote:
>> > Hello-
>> >
>> > The attached patch was discussed briefly on PR 54185 here:
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54185#c14
>> > The test case for this PR sometimes fails due to random failures in
>> > pthread_create() that are not related to the original PR. This patch fixes
>> > it up by ignoring those failures. The test case was designed to repeat the
>> > same test 1000 times to attempt to reproduce a race condition, so I think is
>> > OK if some of those iterations are simply skipped.
>> >
>> > Thanks for taking a look at it; I can commit it if it makes sense.
>> >
>> > -Lewis
>>
>> > libstdc++: testsuite: Address random failure in pthread_create() [PR54185]
>> >
>> > The test for this PR calls pthread_create() many times in a row, which may fail
>> > with EAGAIN sometimes. Avoid generating a test failure in this case.
>> >
>> > libstdc++-v3/ChangeLog:
>> >
>> > 	PR libstdc++/54185
>> > 	* testsuite/30_threads/condition_variable/54185.cc: Make test robust
>> > 	to random pthread_create() failures.
>>
>> Thanks for the patch. It certainly looks reasonable, but I wonder if
>> the attached version wouldn't be (very slightly) better. The
>> difference is that instead of just giving up at the first EAGAIN we
>> keep trying. This way we might be able to create a few more threads
>> before the loop finishes. If we still keep failing, it works the same.
>>
>> I've also added a check that the failures are due to EAGAIN, and we'll
>> still terminate if there's some other problem. I'm assuming that your
>> failures are EAGAIN. Do you know why that's happening? Does your
>> system a low value for RLIMIT_NPROC or something?
>>
>
>Right, good point to check for EAGAIN. Yes, that's the error I get. I don't
>understand why it happens. It's not related to libstdc++, I can reproduce it
>with the below:
>
>======
>#include <pthread.h>
>void* do_nothing (void*)
>{
>  return nullptr;
>}
>int main () {
>  for (int i = 0; i != 1000; ++i)
>    {
>      for (int j = 0; j != 10; ++j)
>	{
>	  pthread_t thread;
>	  const int err = pthread_create (&thread, nullptr, do_nothing, nullptr);
>	  if (err) return 1;
>	  pthread_join (thread, nullptr);
>        }
>    }
>}
>======
>
>If I run this just once at a time, it never fails. But if I run it twice at
>a time, it fails about 30% of the time, like:
>root@host:/home/lewis# (./pthread_fail || echo ERR) & \
>                       (./pthread_fail || echo ERR) & wait
>[1] 25041
>[2] 25042
>ERR
>ERR
>
>All the rlimits are infinite or as high as possible, but I dug around a bit
>and it seems this is a systemd thing, this system had systemd-logind
>disabled (perhaps not in the correct way) and something about the
>configuration led to the issue. Enabling systemd-logind resolves it for
>me. So perhaps this was mostly specific to me. Sorry if I wasted your
>time... if you still think it's worth doing something here I am happy to
>help.

I don't think it's a waste of time. Adding the 'notified' variable to
the test to prevent spurious wakeups is an improvement if nothing else.

>FWIW, regarding your extension to the patch, in case there are some
>legitimate thread creation problems, one thing to keep in mind is that the
>retrying after failure makes certain things worse. For instance, (with my
>system in the previous state), what would happen is the 54185.cc hit the
>pthread_create failure, then prior to this patch it just bailed out. With
>either of these patches it tries more times, which can worsen issues in
>unrelated test cases running in parallel, that may see random failures in
>their own forks or thread creations. This test case is trying hard to
>reproduce the race condition by running 1000 iterations, which seems
>worthwhile given it's still failing on some systems like AIX, but on the

On AIX it fails even with one iteration (not 1000) i.e. you simply
can't destroy a pthread_cond_t while there are threads still waiting
on it. We don't need 1000 iterations to hit that bug, it happens right
away.

>other hand it's possible doing 50 instead of 1000 would work too, and be
>less prone to unrelated resource issues.

Maybe. I'm a bit concerned that if the test started consistently
breaking out of the inner loop after one or two threads on everybody's
systems it would never actually try to delete a condition_variable
that is being waited on. And so it would never exercise the problem
case, and we'd never know. The test would PASS with no indication of
problems, but wouldn't actually test anything.

So I think I'd like it to keep trying to create threads. The meaning
of EAGAIN is "try again" after all :-)

>Thanks for taking a look at this.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-08-18 18:16 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-13 22:15 [PATCH] libstdc++: testsuite: Address random failure in pthread_create() [PR54185] Lewis Hyatt
2020-08-18  8:43 ` Jonathan Wakely
2020-08-18 15:20   ` Lewis Hyatt
2020-08-18 18:15     ` Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).