public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
@ 2023-04-27 20:06 H.J. Lu
  2023-04-27 20:42 ` Noah Goldstein
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: H.J. Lu @ 2023-04-27 20:06 UTC (permalink / raw)
  To: libc-alpha

There are reports for hang in __check_pf:

https://github.com/JoeDog/siege/issues/4

It is reproducible only under specific configurations:

1. Large number of cores (>= 64) and large number of threads (> 3X of
the number of cores) with long lived socket connection.
2. Low power (frequency) mode.
3. Power management is enabled.

While holding lock, __check_pf calls make_request which calls __sendto
and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
lock held by __check_pf won't be released and can cause deadlock when
thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
cleanup handler for __check_pf to unlock the lock when cancelled by
another thread.  This fixes BZ #20975 and the siege hang issue.
---
 sysdeps/unix/sysv/linux/Makefile   |  2 ++
 sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index aec7a94785..0160be8790 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -529,6 +529,8 @@ sysdep_headers += \
 sysdep_routines += \
   netlink_assert_response \
   # sysdep_routines
+
+CFLAGS-check_pf.c += -fexceptions
 endif
 
 # Don't compile the ctype glue code, since there is no old non-GNU C library.
diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
index b157c5126c..2b0b8b6368 100644
--- a/sysdeps/unix/sysv/linux/check_pf.c
+++ b/sysdeps/unix/sysv/linux/check_pf.c
@@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
   return NULL;
 }
 
+#ifdef __EXCEPTIONS
+static void
+cancel_handler (void *arg __attribute__((unused)))
+{
+  /* Release the lock.  */
+  __libc_lock_unlock (lock);
+}
+#endif
 
 void
 attribute_hidden
@@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
   struct cached_data *olddata = NULL;
   struct cached_data *data = NULL;
 
+#ifdef __EXCEPTIONS
+  /* Make sure that lock is released when the thread is cancelled.  */
+  __libc_cleanup_push (cancel_handler, NULL);
+#endif
   __libc_lock_lock (lock);
 
   if (cache_valid_p ())
@@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
 	}
     }
 
+#ifdef __EXCEPTIONS
+  __libc_cleanup_pop (0);
+#endif
   __libc_lock_unlock (lock);
 
   if (data != NULL)
-- 
2.40.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
  2023-04-27 20:06 [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975] H.J. Lu
@ 2023-04-27 20:42 ` Noah Goldstein
  2023-04-27 20:59   ` H.J. Lu
  2023-04-27 22:03 ` Noah Goldstein
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Noah Goldstein @ 2023-04-27 20:42 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha

On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> There are reports for hang in __check_pf:
>
> https://github.com/JoeDog/siege/issues/4
>
> It is reproducible only under specific configurations:
>
> 1. Large number of cores (>= 64) and large number of threads (> 3X of
> the number of cores) with long lived socket connection.
> 2. Low power (frequency) mode.
> 3. Power management is enabled.
>
> While holding lock, __check_pf calls make_request which calls __sendto
> and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
> lock held by __check_pf won't be released and can cause deadlock when
> thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
> cleanup handler for __check_pf to unlock the lock when cancelled by
> another thread.  This fixes BZ #20975 and the siege hang issue.
> ---
>  sysdeps/unix/sysv/linux/Makefile   |  2 ++
>  sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
>  2 files changed, 17 insertions(+)
>
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index aec7a94785..0160be8790 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -529,6 +529,8 @@ sysdep_headers += \
>  sysdep_routines += \
>    netlink_assert_response \
>    # sysdep_routines
> +
> +CFLAGS-check_pf.c += -fexceptions
>  endif
>
>  # Don't compile the ctype glue code, since there is no old non-GNU C library.
> diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> index b157c5126c..2b0b8b6368 100644
> --- a/sysdeps/unix/sysv/linux/check_pf.c
> +++ b/sysdeps/unix/sysv/linux/check_pf.c
> @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
>    return NULL;
>  }
>
> +#ifdef __EXCEPTIONS
> +static void
> +cancel_handler (void *arg __attribute__((unused)))
> +{
> +  /* Release the lock.  */
> +  __libc_lock_unlock (lock);
> +}
> +#endif
>
>  void
>  attribute_hidden
> @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
>    struct cached_data *olddata = NULL;
>    struct cached_data *data = NULL;
>
> +#ifdef __EXCEPTIONS
> +  /* Make sure that lock is released when the thread is cancelled.  */
> +  __libc_cleanup_push (cancel_handler, NULL);
> +#endif
>    __libc_lock_lock (lock);
>
>    if (cache_valid_p ())
> @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
>         }
>      }
>
> +#ifdef __EXCEPTIONS
> +  __libc_cleanup_pop (0);
> +#endif
>    __libc_lock_unlock (lock);
>
>    if (data != NULL)
> --
> 2.40.0
>

Can we add a test the will just XFAIL (or XPASS) if cores < 64?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
  2023-04-27 20:42 ` Noah Goldstein
@ 2023-04-27 20:59   ` H.J. Lu
  2023-04-27 21:44     ` Noah Goldstein
  0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2023-04-27 20:59 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: libc-alpha

On Thu, Apr 27, 2023 at 1:43 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > There are reports for hang in __check_pf:
> >
> > https://github.com/JoeDog/siege/issues/4
> >
> > It is reproducible only under specific configurations:
> >
> > 1. Large number of cores (>= 64) and large number of threads (> 3X of
> > the number of cores) with long lived socket connection.
> > 2. Low power (frequency) mode.
> > 3. Power management is enabled.
> >
> > While holding lock, __check_pf calls make_request which calls __sendto
> > and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
> > lock held by __check_pf won't be released and can cause deadlock when
> > thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
> > cleanup handler for __check_pf to unlock the lock when cancelled by
> > another thread.  This fixes BZ #20975 and the siege hang issue.
> > ---
> >  sysdeps/unix/sysv/linux/Makefile   |  2 ++
> >  sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> >  2 files changed, 17 insertions(+)
> >
> > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> > index aec7a94785..0160be8790 100644
> > --- a/sysdeps/unix/sysv/linux/Makefile
> > +++ b/sysdeps/unix/sysv/linux/Makefile
> > @@ -529,6 +529,8 @@ sysdep_headers += \
> >  sysdep_routines += \
> >    netlink_assert_response \
> >    # sysdep_routines
> > +
> > +CFLAGS-check_pf.c += -fexceptions
> >  endif
> >
> >  # Don't compile the ctype glue code, since there is no old non-GNU C library.
> > diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> > index b157c5126c..2b0b8b6368 100644
> > --- a/sysdeps/unix/sysv/linux/check_pf.c
> > +++ b/sysdeps/unix/sysv/linux/check_pf.c
> > @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> >    return NULL;
> >  }
> >
> > +#ifdef __EXCEPTIONS
> > +static void
> > +cancel_handler (void *arg __attribute__((unused)))
> > +{
> > +  /* Release the lock.  */
> > +  __libc_lock_unlock (lock);
> > +}
> > +#endif
> >
> >  void
> >  attribute_hidden
> > @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> >    struct cached_data *olddata = NULL;
> >    struct cached_data *data = NULL;
> >
> > +#ifdef __EXCEPTIONS
> > +  /* Make sure that lock is released when the thread is cancelled.  */
> > +  __libc_cleanup_push (cancel_handler, NULL);
> > +#endif
> >    __libc_lock_lock (lock);
> >
> >    if (cache_valid_p ())
> > @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> >         }
> >      }
> >
> > +#ifdef __EXCEPTIONS
> > +  __libc_cleanup_pop (0);
> > +#endif
> >    __libc_lock_unlock (lock);
> >
> >    if (data != NULL)
> > --
> > 2.40.0
> >
>
> Can we add a test the will just XFAIL (or XPASS) if cores < 64?

There is no simple test for this.  This is one reason why it hasn't been fixed
since 2016.

-- 
H.J.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
  2023-04-27 20:59   ` H.J. Lu
@ 2023-04-27 21:44     ` Noah Goldstein
  2023-04-27 21:50       ` H.J. Lu
  0 siblings, 1 reply; 13+ messages in thread
From: Noah Goldstein @ 2023-04-27 21:44 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha

On Thu, Apr 27, 2023 at 4:00 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 1:43 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
> > <libc-alpha@sourceware.org> wrote:
> > >
> > > There are reports for hang in __check_pf:
> > >
> > > https://github.com/JoeDog/siege/issues/4
> > >
> > > It is reproducible only under specific configurations:
> > >
> > > 1. Large number of cores (>= 64) and large number of threads (> 3X of
> > > the number of cores) with long lived socket connection.
> > > 2. Low power (frequency) mode.
> > > 3. Power management is enabled.
> > >
> > > While holding lock, __check_pf calls make_request which calls __sendto
> > > and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
> > > lock held by __check_pf won't be released and can cause deadlock when
> > > thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
> > > cleanup handler for __check_pf to unlock the lock when cancelled by
> > > another thread.  This fixes BZ #20975 and the siege hang issue.
> > > ---
> > >  sysdeps/unix/sysv/linux/Makefile   |  2 ++
> > >  sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> > >  2 files changed, 17 insertions(+)
> > >
> > > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> > > index aec7a94785..0160be8790 100644
> > > --- a/sysdeps/unix/sysv/linux/Makefile
> > > +++ b/sysdeps/unix/sysv/linux/Makefile
> > > @@ -529,6 +529,8 @@ sysdep_headers += \
> > >  sysdep_routines += \
> > >    netlink_assert_response \
> > >    # sysdep_routines
> > > +
> > > +CFLAGS-check_pf.c += -fexceptions
> > >  endif
> > >
> > >  # Don't compile the ctype glue code, since there is no old non-GNU C library.
> > > diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> > > index b157c5126c..2b0b8b6368 100644
> > > --- a/sysdeps/unix/sysv/linux/check_pf.c
> > > +++ b/sysdeps/unix/sysv/linux/check_pf.c
> > > @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> > >    return NULL;
> > >  }
> > >
> > > +#ifdef __EXCEPTIONS
> > > +static void
> > > +cancel_handler (void *arg __attribute__((unused)))
> > > +{
> > > +  /* Release the lock.  */
> > > +  __libc_lock_unlock (lock);
> > > +}
> > > +#endif
> > >
> > >  void
> > >  attribute_hidden
> > > @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > >    struct cached_data *olddata = NULL;
> > >    struct cached_data *data = NULL;
> > >
> > > +#ifdef __EXCEPTIONS
> > > +  /* Make sure that lock is released when the thread is cancelled.  */
> > > +  __libc_cleanup_push (cancel_handler, NULL);
> > > +#endif
> > >    __libc_lock_lock (lock);
> > >
> > >    if (cache_valid_p ())
> > > @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > >         }
> > >      }
> > >
> > > +#ifdef __EXCEPTIONS
> > > +  __libc_cleanup_pop (0);
> > > +#endif
> > >    __libc_lock_unlock (lock);
> > >
> > >    if (data != NULL)
> > > --
> > > 2.40.0
> > >
> >
> > Can we add a test the will just XFAIL (or XPASS) if cores < 64?
>
> There is no simple test for this.  This is one reason why it hasn't been fixed
> since 2016.
>

Did you verify this fixes issue in siege?
> --
> H.J.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
  2023-04-27 21:44     ` Noah Goldstein
@ 2023-04-27 21:50       ` H.J. Lu
  0 siblings, 0 replies; 13+ messages in thread
From: H.J. Lu @ 2023-04-27 21:50 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: libc-alpha

On Thu, Apr 27, 2023 at 2:44 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 4:00 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Thu, Apr 27, 2023 at 1:43 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > >
> > > On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
> > > <libc-alpha@sourceware.org> wrote:
> > > >
> > > > There are reports for hang in __check_pf:
> > > >
> > > > https://github.com/JoeDog/siege/issues/4
> > > >
> > > > It is reproducible only under specific configurations:
> > > >
> > > > 1. Large number of cores (>= 64) and large number of threads (> 3X of
> > > > the number of cores) with long lived socket connection.
> > > > 2. Low power (frequency) mode.
> > > > 3. Power management is enabled.
> > > >
> > > > While holding lock, __check_pf calls make_request which calls __sendto
> > > > and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
> > > > lock held by __check_pf won't be released and can cause deadlock when
> > > > thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
> > > > cleanup handler for __check_pf to unlock the lock when cancelled by
> > > > another thread.  This fixes BZ #20975 and the siege hang issue.
> > > > ---
> > > >  sysdeps/unix/sysv/linux/Makefile   |  2 ++
> > > >  sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> > > >  2 files changed, 17 insertions(+)
> > > >
> > > > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> > > > index aec7a94785..0160be8790 100644
> > > > --- a/sysdeps/unix/sysv/linux/Makefile
> > > > +++ b/sysdeps/unix/sysv/linux/Makefile
> > > > @@ -529,6 +529,8 @@ sysdep_headers += \
> > > >  sysdep_routines += \
> > > >    netlink_assert_response \
> > > >    # sysdep_routines
> > > > +
> > > > +CFLAGS-check_pf.c += -fexceptions
> > > >  endif
> > > >
> > > >  # Don't compile the ctype glue code, since there is no old non-GNU C library.
> > > > diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> > > > index b157c5126c..2b0b8b6368 100644
> > > > --- a/sysdeps/unix/sysv/linux/check_pf.c
> > > > +++ b/sysdeps/unix/sysv/linux/check_pf.c
> > > > @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> > > >    return NULL;
> > > >  }
> > > >
> > > > +#ifdef __EXCEPTIONS
> > > > +static void
> > > > +cancel_handler (void *arg __attribute__((unused)))
> > > > +{
> > > > +  /* Release the lock.  */
> > > > +  __libc_lock_unlock (lock);
> > > > +}
> > > > +#endif
> > > >
> > > >  void
> > > >  attribute_hidden
> > > > @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > > >    struct cached_data *olddata = NULL;
> > > >    struct cached_data *data = NULL;
> > > >
> > > > +#ifdef __EXCEPTIONS
> > > > +  /* Make sure that lock is released when the thread is cancelled.  */
> > > > +  __libc_cleanup_push (cancel_handler, NULL);
> > > > +#endif
> > > >    __libc_lock_lock (lock);
> > > >
> > > >    if (cache_valid_p ())
> > > > @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > > >         }
> > > >      }
> > > >
> > > > +#ifdef __EXCEPTIONS
> > > > +  __libc_cleanup_pop (0);
> > > > +#endif
> > > >    __libc_lock_unlock (lock);
> > > >
> > > >    if (data != NULL)
> > > > --
> > > > 2.40.0
> > > >
> > >
> > > Can we add a test the will just XFAIL (or XPASS) if cores < 64?
> >
> > There is no simple test for this.  This is one reason why it hasn't been fixed
> > since 2016.
> >
>
> Did you verify this fixes issue in siege?
> > --
> > H.J.

Yes.

-- 
H.J.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
  2023-04-27 20:06 [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975] H.J. Lu
  2023-04-27 20:42 ` Noah Goldstein
@ 2023-04-27 22:03 ` Noah Goldstein
  2023-04-27 22:10   ` H.J. Lu
  2023-04-28  8:38 ` Florian Weimer
  2023-04-28 19:17 ` Noah Goldstein
  3 siblings, 1 reply; 13+ messages in thread
From: Noah Goldstein @ 2023-04-27 22:03 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha

On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> There are reports for hang in __check_pf:
>
> https://github.com/JoeDog/siege/issues/4
>
> It is reproducible only under specific configurations:
>
> 1. Large number of cores (>= 64) and large number of threads (> 3X of
> the number of cores) with long lived socket connection.
> 2. Low power (frequency) mode.
> 3. Power management is enabled.
>
> While holding lock, __check_pf calls make_request which calls __sendto
> and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
> lock held by __check_pf won't be released and can cause deadlock when
> thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
> cleanup handler for __check_pf to unlock the lock when cancelled by
> another thread.  This fixes BZ #20975 and the siege hang issue.
> ---
>  sysdeps/unix/sysv/linux/Makefile   |  2 ++
>  sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
>  2 files changed, 17 insertions(+)
>
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index aec7a94785..0160be8790 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -529,6 +529,8 @@ sysdep_headers += \
>  sysdep_routines += \
>    netlink_assert_response \
>    # sysdep_routines
> +
> +CFLAGS-check_pf.c += -fexceptions
>  endif
>
>  # Don't compile the ctype glue code, since there is no old non-GNU C library.
> diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> index b157c5126c..2b0b8b6368 100644
> --- a/sysdeps/unix/sysv/linux/check_pf.c
> +++ b/sysdeps/unix/sysv/linux/check_pf.c
> @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
>    return NULL;
>  }
>
> +#ifdef __EXCEPTIONS
> +static void
> +cancel_handler (void *arg __attribute__((unused)))
> +{
> +  /* Release the lock.  */
> +  __libc_lock_unlock (lock);
> +}
> +#endif
>
>  void
>  attribute_hidden
> @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
>    struct cached_data *olddata = NULL;
>    struct cached_data *data = NULL;
>
> +#ifdef __EXCEPTIONS
> +  /* Make sure that lock is released when the thread is cancelled.  */
> +  __libc_cleanup_push (cancel_handler, NULL);
> +#endif

Should we wait until we have successfully taken the lock to push this?

>    __libc_lock_lock (lock);
>
>    if (cache_valid_p ())
> @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
>         }
>      }
>
> +#ifdef __EXCEPTIONS
> +  __libc_cleanup_pop (0);
> +#endif
>    __libc_lock_unlock (lock);
>
>    if (data != NULL)
> --
> 2.40.0
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
  2023-04-27 22:03 ` Noah Goldstein
@ 2023-04-27 22:10   ` H.J. Lu
  0 siblings, 0 replies; 13+ messages in thread
From: H.J. Lu @ 2023-04-27 22:10 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: libc-alpha

On Thu, Apr 27, 2023 at 3:03 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > There are reports for hang in __check_pf:
> >
> > https://github.com/JoeDog/siege/issues/4
> >
> > It is reproducible only under specific configurations:
> >
> > 1. Large number of cores (>= 64) and large number of threads (> 3X of
> > the number of cores) with long lived socket connection.
> > 2. Low power (frequency) mode.
> > 3. Power management is enabled.
> >
> > While holding lock, __check_pf calls make_request which calls __sendto
> > and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
> > lock held by __check_pf won't be released and can cause deadlock when
> > thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
> > cleanup handler for __check_pf to unlock the lock when cancelled by
> > another thread.  This fixes BZ #20975 and the siege hang issue.
> > ---
> >  sysdeps/unix/sysv/linux/Makefile   |  2 ++
> >  sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> >  2 files changed, 17 insertions(+)
> >
> > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> > index aec7a94785..0160be8790 100644
> > --- a/sysdeps/unix/sysv/linux/Makefile
> > +++ b/sysdeps/unix/sysv/linux/Makefile
> > @@ -529,6 +529,8 @@ sysdep_headers += \
> >  sysdep_routines += \
> >    netlink_assert_response \
> >    # sysdep_routines
> > +
> > +CFLAGS-check_pf.c += -fexceptions
> >  endif
> >
> >  # Don't compile the ctype glue code, since there is no old non-GNU C library.
> > diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> > index b157c5126c..2b0b8b6368 100644
> > --- a/sysdeps/unix/sysv/linux/check_pf.c
> > +++ b/sysdeps/unix/sysv/linux/check_pf.c
> > @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> >    return NULL;
> >  }
> >
> > +#ifdef __EXCEPTIONS
> > +static void
> > +cancel_handler (void *arg __attribute__((unused)))
> > +{
> > +  /* Release the lock.  */
> > +  __libc_lock_unlock (lock);
> > +}
> > +#endif
> >
> >  void
> >  attribute_hidden
> > @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> >    struct cached_data *olddata = NULL;
> >    struct cached_data *data = NULL;
> >
> > +#ifdef __EXCEPTIONS
> > +  /* Make sure that lock is released when the thread is cancelled.  */
> > +  __libc_cleanup_push (cancel_handler, NULL);
> > +#endif
>
> Should we wait until we have successfully taken the lock to push this?

The cleanup handler is called only when thread cancellation happens
in __sendto or __recvmsg called from make_request after lock has
been taken.   The order here isn't critical.

>
> >    __libc_lock_lock (lock);
> >
> >    if (cache_valid_p ())
> > @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> >         }
> >      }
> >
> > +#ifdef __EXCEPTIONS
> > +  __libc_cleanup_pop (0);
> > +#endif
> >    __libc_lock_unlock (lock);
> >
> >    if (data != NULL)
> > --
> > 2.40.0
> >



-- 
H.J.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
  2023-04-27 20:06 [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975] H.J. Lu
  2023-04-27 20:42 ` Noah Goldstein
  2023-04-27 22:03 ` Noah Goldstein
@ 2023-04-28  8:38 ` Florian Weimer
  2023-04-28 15:33   ` H.J. Lu
  2023-04-28 19:17 ` Noah Goldstein
  3 siblings, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2023-04-28  8:38 UTC (permalink / raw)
  To: H.J. Lu via Libc-alpha; +Cc: H.J. Lu

* H. J. Lu via Libc-alpha:

> There are reports for hang in __check_pf:
>
> https://github.com/JoeDog/siege/issues/4
>
> It is reproducible only under specific configurations:
>
> 1. Large number of cores (>= 64) and large number of threads (> 3X of
> the number of cores) with long lived socket connection.
> 2. Low power (frequency) mode.
> 3. Power management is enabled.
>
> While holding lock, __check_pf calls make_request which calls __sendto
> and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
> lock held by __check_pf won't be released and can cause deadlock when
> thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
> cleanup handler for __check_pf to unlock the lock when cancelled by
> another thread.  This fixes BZ #20975 and the siege hang issue.

It's probably easier to reproduce if the system has many network
interfaces with lots of addresses.

Doesn't getaddrinfo leak all kind of resources when canceled?  That's
more difficult to fix, though.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
  2023-04-28  8:38 ` Florian Weimer
@ 2023-04-28 15:33   ` H.J. Lu
  0 siblings, 0 replies; 13+ messages in thread
From: H.J. Lu @ 2023-04-28 15:33 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

On Fri, Apr 28, 2023 at 1:38 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > There are reports for hang in __check_pf:
> >
> > https://github.com/JoeDog/siege/issues/4
> >
> > It is reproducible only under specific configurations:
> >
> > 1. Large number of cores (>= 64) and large number of threads (> 3X of
> > the number of cores) with long lived socket connection.
> > 2. Low power (frequency) mode.
> > 3. Power management is enabled.
> >
> > While holding lock, __check_pf calls make_request which calls __sendto
> > and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
> > lock held by __check_pf won't be released and can cause deadlock when
> > thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
> > cleanup handler for __check_pf to unlock the lock when cancelled by
> > another thread.  This fixes BZ #20975 and the siege hang issue.
>
> It's probably easier to reproduce if the system has many network
> interfaces with lots of addresses.
>
> Doesn't getaddrinfo leak all kind of resources when canceled?  That's
> more difficult to fix, though.

We will work on it if there are reports.

-- 
H.J.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
  2023-04-27 20:06 [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975] H.J. Lu
                   ` (2 preceding siblings ...)
  2023-04-28  8:38 ` Florian Weimer
@ 2023-04-28 19:17 ` Noah Goldstein
  2023-05-22 20:37   ` H.J. Lu
  3 siblings, 1 reply; 13+ messages in thread
From: Noah Goldstein @ 2023-04-28 19:17 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha

On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> There are reports for hang in __check_pf:
>
> https://github.com/JoeDog/siege/issues/4
>
> It is reproducible only under specific configurations:
>
> 1. Large number of cores (>= 64) and large number of threads (> 3X of
> the number of cores) with long lived socket connection.
> 2. Low power (frequency) mode.
> 3. Power management is enabled.
>
> While holding lock, __check_pf calls make_request which calls __sendto
> and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
> lock held by __check_pf won't be released and can cause deadlock when
> thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
> cleanup handler for __check_pf to unlock the lock when cancelled by
> another thread.  This fixes BZ #20975 and the siege hang issue.
> ---
>  sysdeps/unix/sysv/linux/Makefile   |  2 ++
>  sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
>  2 files changed, 17 insertions(+)
>
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index aec7a94785..0160be8790 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -529,6 +529,8 @@ sysdep_headers += \
>  sysdep_routines += \
>    netlink_assert_response \
>    # sysdep_routines
> +
> +CFLAGS-check_pf.c += -fexceptions
>  endif
>
>  # Don't compile the ctype glue code, since there is no old non-GNU C library.
> diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> index b157c5126c..2b0b8b6368 100644
> --- a/sysdeps/unix/sysv/linux/check_pf.c
> +++ b/sysdeps/unix/sysv/linux/check_pf.c
> @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
>    return NULL;
>  }
>
> +#ifdef __EXCEPTIONS
> +static void
> +cancel_handler (void *arg __attribute__((unused)))
> +{
> +  /* Release the lock.  */
> +  __libc_lock_unlock (lock);
> +}
> +#endif
>
>  void
>  attribute_hidden
> @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
>    struct cached_data *olddata = NULL;
>    struct cached_data *data = NULL;
>
> +#ifdef __EXCEPTIONS
> +  /* Make sure that lock is released when the thread is cancelled.  */
> +  __libc_cleanup_push (cancel_handler, NULL);
> +#endif
>    __libc_lock_lock (lock);
>
>    if (cache_valid_p ())
> @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
>         }
>      }
>
> +#ifdef __EXCEPTIONS
> +  __libc_cleanup_pop (0);
> +#endif
>    __libc_lock_unlock (lock);
>
>    if (data != NULL)
> --
> 2.40.0
>

LGTM.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
  2023-04-28 19:17 ` Noah Goldstein
@ 2023-05-22 20:37   ` H.J. Lu
  2023-05-23  8:58     ` Florian Weimer
  0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2023-05-22 20:37 UTC (permalink / raw)
  To: Noah Goldstein, Libc-stable Mailing List; +Cc: libc-alpha

On Fri, Apr 28, 2023 at 12:17 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > There are reports for hang in __check_pf:
> >
> > https://github.com/JoeDog/siege/issues/4
> >
> > It is reproducible only under specific configurations:
> >
> > 1. Large number of cores (>= 64) and large number of threads (> 3X of
> > the number of cores) with long lived socket connection.
> > 2. Low power (frequency) mode.
> > 3. Power management is enabled.
> >
> > While holding lock, __check_pf calls make_request which calls __sendto
> > and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
> > lock held by __check_pf won't be released and can cause deadlock when
> > thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
> > cleanup handler for __check_pf to unlock the lock when cancelled by
> > another thread.  This fixes BZ #20975 and the siege hang issue.
> > ---
> >  sysdeps/unix/sysv/linux/Makefile   |  2 ++
> >  sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> >  2 files changed, 17 insertions(+)
> >
> > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> > index aec7a94785..0160be8790 100644
> > --- a/sysdeps/unix/sysv/linux/Makefile
> > +++ b/sysdeps/unix/sysv/linux/Makefile
> > @@ -529,6 +529,8 @@ sysdep_headers += \
> >  sysdep_routines += \
> >    netlink_assert_response \
> >    # sysdep_routines
> > +
> > +CFLAGS-check_pf.c += -fexceptions
> >  endif
> >
> >  # Don't compile the ctype glue code, since there is no old non-GNU C library.
> > diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> > index b157c5126c..2b0b8b6368 100644
> > --- a/sysdeps/unix/sysv/linux/check_pf.c
> > +++ b/sysdeps/unix/sysv/linux/check_pf.c
> > @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> >    return NULL;
> >  }
> >
> > +#ifdef __EXCEPTIONS
> > +static void
> > +cancel_handler (void *arg __attribute__((unused)))
> > +{
> > +  /* Release the lock.  */
> > +  __libc_lock_unlock (lock);
> > +}
> > +#endif
> >
> >  void
> >  attribute_hidden
> > @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> >    struct cached_data *olddata = NULL;
> >    struct cached_data *data = NULL;
> >
> > +#ifdef __EXCEPTIONS
> > +  /* Make sure that lock is released when the thread is cancelled.  */
> > +  __libc_cleanup_push (cancel_handler, NULL);
> > +#endif
> >    __libc_lock_lock (lock);
> >
> >    if (cache_valid_p ())
> > @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> >         }
> >      }
> >
> > +#ifdef __EXCEPTIONS
> > +  __libc_cleanup_pop (0);
> > +#endif
> >    __libc_lock_unlock (lock);
> >
> >    if (data != NULL)
> > --
> > 2.40.0
> >
>
> LGTM.

OK for release branches?

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
  2023-05-22 20:37   ` H.J. Lu
@ 2023-05-23  8:58     ` Florian Weimer
  2023-05-23 16:56       ` Noah Goldstein
  0 siblings, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2023-05-23  8:58 UTC (permalink / raw)
  To: H.J. Lu via Libc-stable; +Cc: Noah Goldstein, H.J. Lu, libc-alpha

* H. J. Lu via Libc-stable:

> OK for release branches?

It looks backportable to me.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
  2023-05-23  8:58     ` Florian Weimer
@ 2023-05-23 16:56       ` Noah Goldstein
  0 siblings, 0 replies; 13+ messages in thread
From: Noah Goldstein @ 2023-05-23 16:56 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-stable, H.J. Lu, libc-alpha

On Tue, May 23, 2023 at 3:58 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-stable:
>
> > OK for release branches?
>
> It looks backportable to me.
+1
>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-05-23 16:56 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-27 20:06 [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975] H.J. Lu
2023-04-27 20:42 ` Noah Goldstein
2023-04-27 20:59   ` H.J. Lu
2023-04-27 21:44     ` Noah Goldstein
2023-04-27 21:50       ` H.J. Lu
2023-04-27 22:03 ` Noah Goldstein
2023-04-27 22:10   ` H.J. Lu
2023-04-28  8:38 ` Florian Weimer
2023-04-28 15:33   ` H.J. Lu
2023-04-28 19:17 ` Noah Goldstein
2023-05-22 20:37   ` H.J. Lu
2023-05-23  8:58     ` Florian Weimer
2023-05-23 16:56       ` Noah Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).