* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
2023-04-27 20:06 [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975] H.J. Lu
@ 2023-04-27 20:42 ` Noah Goldstein
2023-04-27 20:59 ` H.J. Lu
2023-04-27 22:03 ` Noah Goldstein
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Noah Goldstein @ 2023-04-27 20:42 UTC (permalink / raw)
To: H.J. Lu; +Cc: libc-alpha
On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> There are reports for hang in __check_pf:
>
> https://github.com/JoeDog/siege/issues/4
>
> It is reproducible only under specific configurations:
>
> 1. Large number of cores (>= 64) and large number of threads (> 3X of
> the number of cores) with long lived socket connection.
> 2. Low power (frequency) mode.
> 3. Power management is enabled.
>
> While holding lock, __check_pf calls make_request which calls __sendto
> and __recvmsg. Since __sendto and __recvmsg are cancellation points,
> lock held by __check_pf won't be released and can cause deadlock when
> thread cancellation happens in __sendto or __recvmsg. Add a cancellation
> cleanup handler for __check_pf to unlock the lock when cancelled by
> another thread. This fixes BZ #20975 and the siege hang issue.
> ---
> sysdeps/unix/sysv/linux/Makefile | 2 ++
> sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index aec7a94785..0160be8790 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -529,6 +529,8 @@ sysdep_headers += \
> sysdep_routines += \
> netlink_assert_response \
> # sysdep_routines
> +
> +CFLAGS-check_pf.c += -fexceptions
> endif
>
> # Don't compile the ctype glue code, since there is no old non-GNU C library.
> diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> index b157c5126c..2b0b8b6368 100644
> --- a/sysdeps/unix/sysv/linux/check_pf.c
> +++ b/sysdeps/unix/sysv/linux/check_pf.c
> @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> return NULL;
> }
>
> +#ifdef __EXCEPTIONS
> +static void
> +cancel_handler (void *arg __attribute__((unused)))
> +{
> + /* Release the lock. */
> + __libc_lock_unlock (lock);
> +}
> +#endif
>
> void
> attribute_hidden
> @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> struct cached_data *olddata = NULL;
> struct cached_data *data = NULL;
>
> +#ifdef __EXCEPTIONS
> + /* Make sure that lock is released when the thread is cancelled. */
> + __libc_cleanup_push (cancel_handler, NULL);
> +#endif
> __libc_lock_lock (lock);
>
> if (cache_valid_p ())
> @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> }
> }
>
> +#ifdef __EXCEPTIONS
> + __libc_cleanup_pop (0);
> +#endif
> __libc_lock_unlock (lock);
>
> if (data != NULL)
> --
> 2.40.0
>
Can we add a test the will just XFAIL (or XPASS) if cores < 64?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
2023-04-27 20:42 ` Noah Goldstein
@ 2023-04-27 20:59 ` H.J. Lu
2023-04-27 21:44 ` Noah Goldstein
0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2023-04-27 20:59 UTC (permalink / raw)
To: Noah Goldstein; +Cc: libc-alpha
On Thu, Apr 27, 2023 at 1:43 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > There are reports for hang in __check_pf:
> >
> > https://github.com/JoeDog/siege/issues/4
> >
> > It is reproducible only under specific configurations:
> >
> > 1. Large number of cores (>= 64) and large number of threads (> 3X of
> > the number of cores) with long lived socket connection.
> > 2. Low power (frequency) mode.
> > 3. Power management is enabled.
> >
> > While holding lock, __check_pf calls make_request which calls __sendto
> > and __recvmsg. Since __sendto and __recvmsg are cancellation points,
> > lock held by __check_pf won't be released and can cause deadlock when
> > thread cancellation happens in __sendto or __recvmsg. Add a cancellation
> > cleanup handler for __check_pf to unlock the lock when cancelled by
> > another thread. This fixes BZ #20975 and the siege hang issue.
> > ---
> > sysdeps/unix/sysv/linux/Makefile | 2 ++
> > sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> > 2 files changed, 17 insertions(+)
> >
> > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> > index aec7a94785..0160be8790 100644
> > --- a/sysdeps/unix/sysv/linux/Makefile
> > +++ b/sysdeps/unix/sysv/linux/Makefile
> > @@ -529,6 +529,8 @@ sysdep_headers += \
> > sysdep_routines += \
> > netlink_assert_response \
> > # sysdep_routines
> > +
> > +CFLAGS-check_pf.c += -fexceptions
> > endif
> >
> > # Don't compile the ctype glue code, since there is no old non-GNU C library.
> > diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> > index b157c5126c..2b0b8b6368 100644
> > --- a/sysdeps/unix/sysv/linux/check_pf.c
> > +++ b/sysdeps/unix/sysv/linux/check_pf.c
> > @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> > return NULL;
> > }
> >
> > +#ifdef __EXCEPTIONS
> > +static void
> > +cancel_handler (void *arg __attribute__((unused)))
> > +{
> > + /* Release the lock. */
> > + __libc_lock_unlock (lock);
> > +}
> > +#endif
> >
> > void
> > attribute_hidden
> > @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > struct cached_data *olddata = NULL;
> > struct cached_data *data = NULL;
> >
> > +#ifdef __EXCEPTIONS
> > + /* Make sure that lock is released when the thread is cancelled. */
> > + __libc_cleanup_push (cancel_handler, NULL);
> > +#endif
> > __libc_lock_lock (lock);
> >
> > if (cache_valid_p ())
> > @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > }
> > }
> >
> > +#ifdef __EXCEPTIONS
> > + __libc_cleanup_pop (0);
> > +#endif
> > __libc_lock_unlock (lock);
> >
> > if (data != NULL)
> > --
> > 2.40.0
> >
>
> Can we add a test the will just XFAIL (or XPASS) if cores < 64?
There is no simple test for this. This is one reason why it hasn't been fixed
since 2016.
--
H.J.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
2023-04-27 20:59 ` H.J. Lu
@ 2023-04-27 21:44 ` Noah Goldstein
2023-04-27 21:50 ` H.J. Lu
0 siblings, 1 reply; 13+ messages in thread
From: Noah Goldstein @ 2023-04-27 21:44 UTC (permalink / raw)
To: H.J. Lu; +Cc: libc-alpha
On Thu, Apr 27, 2023 at 4:00 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 1:43 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
> > <libc-alpha@sourceware.org> wrote:
> > >
> > > There are reports for hang in __check_pf:
> > >
> > > https://github.com/JoeDog/siege/issues/4
> > >
> > > It is reproducible only under specific configurations:
> > >
> > > 1. Large number of cores (>= 64) and large number of threads (> 3X of
> > > the number of cores) with long lived socket connection.
> > > 2. Low power (frequency) mode.
> > > 3. Power management is enabled.
> > >
> > > While holding lock, __check_pf calls make_request which calls __sendto
> > > and __recvmsg. Since __sendto and __recvmsg are cancellation points,
> > > lock held by __check_pf won't be released and can cause deadlock when
> > > thread cancellation happens in __sendto or __recvmsg. Add a cancellation
> > > cleanup handler for __check_pf to unlock the lock when cancelled by
> > > another thread. This fixes BZ #20975 and the siege hang issue.
> > > ---
> > > sysdeps/unix/sysv/linux/Makefile | 2 ++
> > > sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> > > 2 files changed, 17 insertions(+)
> > >
> > > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> > > index aec7a94785..0160be8790 100644
> > > --- a/sysdeps/unix/sysv/linux/Makefile
> > > +++ b/sysdeps/unix/sysv/linux/Makefile
> > > @@ -529,6 +529,8 @@ sysdep_headers += \
> > > sysdep_routines += \
> > > netlink_assert_response \
> > > # sysdep_routines
> > > +
> > > +CFLAGS-check_pf.c += -fexceptions
> > > endif
> > >
> > > # Don't compile the ctype glue code, since there is no old non-GNU C library.
> > > diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> > > index b157c5126c..2b0b8b6368 100644
> > > --- a/sysdeps/unix/sysv/linux/check_pf.c
> > > +++ b/sysdeps/unix/sysv/linux/check_pf.c
> > > @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> > > return NULL;
> > > }
> > >
> > > +#ifdef __EXCEPTIONS
> > > +static void
> > > +cancel_handler (void *arg __attribute__((unused)))
> > > +{
> > > + /* Release the lock. */
> > > + __libc_lock_unlock (lock);
> > > +}
> > > +#endif
> > >
> > > void
> > > attribute_hidden
> > > @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > > struct cached_data *olddata = NULL;
> > > struct cached_data *data = NULL;
> > >
> > > +#ifdef __EXCEPTIONS
> > > + /* Make sure that lock is released when the thread is cancelled. */
> > > + __libc_cleanup_push (cancel_handler, NULL);
> > > +#endif
> > > __libc_lock_lock (lock);
> > >
> > > if (cache_valid_p ())
> > > @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > > }
> > > }
> > >
> > > +#ifdef __EXCEPTIONS
> > > + __libc_cleanup_pop (0);
> > > +#endif
> > > __libc_lock_unlock (lock);
> > >
> > > if (data != NULL)
> > > --
> > > 2.40.0
> > >
> >
> > Can we add a test the will just XFAIL (or XPASS) if cores < 64?
>
> There is no simple test for this. This is one reason why it hasn't been fixed
> since 2016.
>
Did you verify this fixes issue in siege?
> --
> H.J.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
2023-04-27 21:44 ` Noah Goldstein
@ 2023-04-27 21:50 ` H.J. Lu
0 siblings, 0 replies; 13+ messages in thread
From: H.J. Lu @ 2023-04-27 21:50 UTC (permalink / raw)
To: Noah Goldstein; +Cc: libc-alpha
On Thu, Apr 27, 2023 at 2:44 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 4:00 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Thu, Apr 27, 2023 at 1:43 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > >
> > > On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
> > > <libc-alpha@sourceware.org> wrote:
> > > >
> > > > There are reports for hang in __check_pf:
> > > >
> > > > https://github.com/JoeDog/siege/issues/4
> > > >
> > > > It is reproducible only under specific configurations:
> > > >
> > > > 1. Large number of cores (>= 64) and large number of threads (> 3X of
> > > > the number of cores) with long lived socket connection.
> > > > 2. Low power (frequency) mode.
> > > > 3. Power management is enabled.
> > > >
> > > > While holding lock, __check_pf calls make_request which calls __sendto
> > > > and __recvmsg. Since __sendto and __recvmsg are cancellation points,
> > > > lock held by __check_pf won't be released and can cause deadlock when
> > > > thread cancellation happens in __sendto or __recvmsg. Add a cancellation
> > > > cleanup handler for __check_pf to unlock the lock when cancelled by
> > > > another thread. This fixes BZ #20975 and the siege hang issue.
> > > > ---
> > > > sysdeps/unix/sysv/linux/Makefile | 2 ++
> > > > sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> > > > 2 files changed, 17 insertions(+)
> > > >
> > > > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> > > > index aec7a94785..0160be8790 100644
> > > > --- a/sysdeps/unix/sysv/linux/Makefile
> > > > +++ b/sysdeps/unix/sysv/linux/Makefile
> > > > @@ -529,6 +529,8 @@ sysdep_headers += \
> > > > sysdep_routines += \
> > > > netlink_assert_response \
> > > > # sysdep_routines
> > > > +
> > > > +CFLAGS-check_pf.c += -fexceptions
> > > > endif
> > > >
> > > > # Don't compile the ctype glue code, since there is no old non-GNU C library.
> > > > diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> > > > index b157c5126c..2b0b8b6368 100644
> > > > --- a/sysdeps/unix/sysv/linux/check_pf.c
> > > > +++ b/sysdeps/unix/sysv/linux/check_pf.c
> > > > @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> > > > return NULL;
> > > > }
> > > >
> > > > +#ifdef __EXCEPTIONS
> > > > +static void
> > > > +cancel_handler (void *arg __attribute__((unused)))
> > > > +{
> > > > + /* Release the lock. */
> > > > + __libc_lock_unlock (lock);
> > > > +}
> > > > +#endif
> > > >
> > > > void
> > > > attribute_hidden
> > > > @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > > > struct cached_data *olddata = NULL;
> > > > struct cached_data *data = NULL;
> > > >
> > > > +#ifdef __EXCEPTIONS
> > > > + /* Make sure that lock is released when the thread is cancelled. */
> > > > + __libc_cleanup_push (cancel_handler, NULL);
> > > > +#endif
> > > > __libc_lock_lock (lock);
> > > >
> > > > if (cache_valid_p ())
> > > > @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > > > }
> > > > }
> > > >
> > > > +#ifdef __EXCEPTIONS
> > > > + __libc_cleanup_pop (0);
> > > > +#endif
> > > > __libc_lock_unlock (lock);
> > > >
> > > > if (data != NULL)
> > > > --
> > > > 2.40.0
> > > >
> > >
> > > Can we add a test the will just XFAIL (or XPASS) if cores < 64?
> >
> > There is no simple test for this. This is one reason why it hasn't been fixed
> > since 2016.
> >
>
> Did you verify this fixes issue in siege?
> > --
> > H.J.
Yes.
--
H.J.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
2023-04-27 20:06 [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975] H.J. Lu
2023-04-27 20:42 ` Noah Goldstein
@ 2023-04-27 22:03 ` Noah Goldstein
2023-04-27 22:10 ` H.J. Lu
2023-04-28 8:38 ` Florian Weimer
2023-04-28 19:17 ` Noah Goldstein
3 siblings, 1 reply; 13+ messages in thread
From: Noah Goldstein @ 2023-04-27 22:03 UTC (permalink / raw)
To: H.J. Lu; +Cc: libc-alpha
On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> There are reports for hang in __check_pf:
>
> https://github.com/JoeDog/siege/issues/4
>
> It is reproducible only under specific configurations:
>
> 1. Large number of cores (>= 64) and large number of threads (> 3X of
> the number of cores) with long lived socket connection.
> 2. Low power (frequency) mode.
> 3. Power management is enabled.
>
> While holding lock, __check_pf calls make_request which calls __sendto
> and __recvmsg. Since __sendto and __recvmsg are cancellation points,
> lock held by __check_pf won't be released and can cause deadlock when
> thread cancellation happens in __sendto or __recvmsg. Add a cancellation
> cleanup handler for __check_pf to unlock the lock when cancelled by
> another thread. This fixes BZ #20975 and the siege hang issue.
> ---
> sysdeps/unix/sysv/linux/Makefile | 2 ++
> sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index aec7a94785..0160be8790 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -529,6 +529,8 @@ sysdep_headers += \
> sysdep_routines += \
> netlink_assert_response \
> # sysdep_routines
> +
> +CFLAGS-check_pf.c += -fexceptions
> endif
>
> # Don't compile the ctype glue code, since there is no old non-GNU C library.
> diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> index b157c5126c..2b0b8b6368 100644
> --- a/sysdeps/unix/sysv/linux/check_pf.c
> +++ b/sysdeps/unix/sysv/linux/check_pf.c
> @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> return NULL;
> }
>
> +#ifdef __EXCEPTIONS
> +static void
> +cancel_handler (void *arg __attribute__((unused)))
> +{
> + /* Release the lock. */
> + __libc_lock_unlock (lock);
> +}
> +#endif
>
> void
> attribute_hidden
> @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> struct cached_data *olddata = NULL;
> struct cached_data *data = NULL;
>
> +#ifdef __EXCEPTIONS
> + /* Make sure that lock is released when the thread is cancelled. */
> + __libc_cleanup_push (cancel_handler, NULL);
> +#endif
Should we wait until we have successfully taken the lock to push this?
> __libc_lock_lock (lock);
>
> if (cache_valid_p ())
> @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> }
> }
>
> +#ifdef __EXCEPTIONS
> + __libc_cleanup_pop (0);
> +#endif
> __libc_lock_unlock (lock);
>
> if (data != NULL)
> --
> 2.40.0
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
2023-04-27 22:03 ` Noah Goldstein
@ 2023-04-27 22:10 ` H.J. Lu
0 siblings, 0 replies; 13+ messages in thread
From: H.J. Lu @ 2023-04-27 22:10 UTC (permalink / raw)
To: Noah Goldstein; +Cc: libc-alpha
On Thu, Apr 27, 2023 at 3:03 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > There are reports for hang in __check_pf:
> >
> > https://github.com/JoeDog/siege/issues/4
> >
> > It is reproducible only under specific configurations:
> >
> > 1. Large number of cores (>= 64) and large number of threads (> 3X of
> > the number of cores) with long lived socket connection.
> > 2. Low power (frequency) mode.
> > 3. Power management is enabled.
> >
> > While holding lock, __check_pf calls make_request which calls __sendto
> > and __recvmsg. Since __sendto and __recvmsg are cancellation points,
> > lock held by __check_pf won't be released and can cause deadlock when
> > thread cancellation happens in __sendto or __recvmsg. Add a cancellation
> > cleanup handler for __check_pf to unlock the lock when cancelled by
> > another thread. This fixes BZ #20975 and the siege hang issue.
> > ---
> > sysdeps/unix/sysv/linux/Makefile | 2 ++
> > sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> > 2 files changed, 17 insertions(+)
> >
> > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> > index aec7a94785..0160be8790 100644
> > --- a/sysdeps/unix/sysv/linux/Makefile
> > +++ b/sysdeps/unix/sysv/linux/Makefile
> > @@ -529,6 +529,8 @@ sysdep_headers += \
> > sysdep_routines += \
> > netlink_assert_response \
> > # sysdep_routines
> > +
> > +CFLAGS-check_pf.c += -fexceptions
> > endif
> >
> > # Don't compile the ctype glue code, since there is no old non-GNU C library.
> > diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> > index b157c5126c..2b0b8b6368 100644
> > --- a/sysdeps/unix/sysv/linux/check_pf.c
> > +++ b/sysdeps/unix/sysv/linux/check_pf.c
> > @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> > return NULL;
> > }
> >
> > +#ifdef __EXCEPTIONS
> > +static void
> > +cancel_handler (void *arg __attribute__((unused)))
> > +{
> > + /* Release the lock. */
> > + __libc_lock_unlock (lock);
> > +}
> > +#endif
> >
> > void
> > attribute_hidden
> > @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > struct cached_data *olddata = NULL;
> > struct cached_data *data = NULL;
> >
> > +#ifdef __EXCEPTIONS
> > + /* Make sure that lock is released when the thread is cancelled. */
> > + __libc_cleanup_push (cancel_handler, NULL);
> > +#endif
>
> Should we wait until we have successfully taken the lock to push this?
The cleanup handler is called only when thread cancellation happens
in __sendto or __recvmsg called from make_request after lock has
been taken. The order here isn't critical.
>
> > __libc_lock_lock (lock);
> >
> > if (cache_valid_p ())
> > @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > }
> > }
> >
> > +#ifdef __EXCEPTIONS
> > + __libc_cleanup_pop (0);
> > +#endif
> > __libc_lock_unlock (lock);
> >
> > if (data != NULL)
> > --
> > 2.40.0
> >
--
H.J.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
2023-04-27 20:06 [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975] H.J. Lu
2023-04-27 20:42 ` Noah Goldstein
2023-04-27 22:03 ` Noah Goldstein
@ 2023-04-28 8:38 ` Florian Weimer
2023-04-28 15:33 ` H.J. Lu
2023-04-28 19:17 ` Noah Goldstein
3 siblings, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2023-04-28 8:38 UTC (permalink / raw)
To: H.J. Lu via Libc-alpha; +Cc: H.J. Lu
* H. J. Lu via Libc-alpha:
> There are reports for hang in __check_pf:
>
> https://github.com/JoeDog/siege/issues/4
>
> It is reproducible only under specific configurations:
>
> 1. Large number of cores (>= 64) and large number of threads (> 3X of
> the number of cores) with long lived socket connection.
> 2. Low power (frequency) mode.
> 3. Power management is enabled.
>
> While holding lock, __check_pf calls make_request which calls __sendto
> and __recvmsg. Since __sendto and __recvmsg are cancellation points,
> lock held by __check_pf won't be released and can cause deadlock when
> thread cancellation happens in __sendto or __recvmsg. Add a cancellation
> cleanup handler for __check_pf to unlock the lock when cancelled by
> another thread. This fixes BZ #20975 and the siege hang issue.
It's probably easier to reproduce if the system has many network
interfaces with lots of addresses.
Doesn't getaddrinfo leak all kind of resources when canceled? That's
more difficult to fix, though.
Thanks,
Florian
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
2023-04-28 8:38 ` Florian Weimer
@ 2023-04-28 15:33 ` H.J. Lu
0 siblings, 0 replies; 13+ messages in thread
From: H.J. Lu @ 2023-04-28 15:33 UTC (permalink / raw)
To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha
On Fri, Apr 28, 2023 at 1:38 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > There are reports for hang in __check_pf:
> >
> > https://github.com/JoeDog/siege/issues/4
> >
> > It is reproducible only under specific configurations:
> >
> > 1. Large number of cores (>= 64) and large number of threads (> 3X of
> > the number of cores) with long lived socket connection.
> > 2. Low power (frequency) mode.
> > 3. Power management is enabled.
> >
> > While holding lock, __check_pf calls make_request which calls __sendto
> > and __recvmsg. Since __sendto and __recvmsg are cancellation points,
> > lock held by __check_pf won't be released and can cause deadlock when
> > thread cancellation happens in __sendto or __recvmsg. Add a cancellation
> > cleanup handler for __check_pf to unlock the lock when cancelled by
> > another thread. This fixes BZ #20975 and the siege hang issue.
>
> It's probably easier to reproduce if the system has many network
> interfaces with lots of addresses.
>
> Doesn't getaddrinfo leak all kind of resources when canceled? That's
> more difficult to fix, though.
We will work on it if there are reports.
--
H.J.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
2023-04-27 20:06 [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975] H.J. Lu
` (2 preceding siblings ...)
2023-04-28 8:38 ` Florian Weimer
@ 2023-04-28 19:17 ` Noah Goldstein
2023-05-22 20:37 ` H.J. Lu
3 siblings, 1 reply; 13+ messages in thread
From: Noah Goldstein @ 2023-04-28 19:17 UTC (permalink / raw)
To: H.J. Lu; +Cc: libc-alpha
On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> There are reports for hang in __check_pf:
>
> https://github.com/JoeDog/siege/issues/4
>
> It is reproducible only under specific configurations:
>
> 1. Large number of cores (>= 64) and large number of threads (> 3X of
> the number of cores) with long lived socket connection.
> 2. Low power (frequency) mode.
> 3. Power management is enabled.
>
> While holding lock, __check_pf calls make_request which calls __sendto
> and __recvmsg. Since __sendto and __recvmsg are cancellation points,
> lock held by __check_pf won't be released and can cause deadlock when
> thread cancellation happens in __sendto or __recvmsg. Add a cancellation
> cleanup handler for __check_pf to unlock the lock when cancelled by
> another thread. This fixes BZ #20975 and the siege hang issue.
> ---
> sysdeps/unix/sysv/linux/Makefile | 2 ++
> sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index aec7a94785..0160be8790 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -529,6 +529,8 @@ sysdep_headers += \
> sysdep_routines += \
> netlink_assert_response \
> # sysdep_routines
> +
> +CFLAGS-check_pf.c += -fexceptions
> endif
>
> # Don't compile the ctype glue code, since there is no old non-GNU C library.
> diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> index b157c5126c..2b0b8b6368 100644
> --- a/sysdeps/unix/sysv/linux/check_pf.c
> +++ b/sysdeps/unix/sysv/linux/check_pf.c
> @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> return NULL;
> }
>
> +#ifdef __EXCEPTIONS
> +static void
> +cancel_handler (void *arg __attribute__((unused)))
> +{
> + /* Release the lock. */
> + __libc_lock_unlock (lock);
> +}
> +#endif
>
> void
> attribute_hidden
> @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> struct cached_data *olddata = NULL;
> struct cached_data *data = NULL;
>
> +#ifdef __EXCEPTIONS
> + /* Make sure that lock is released when the thread is cancelled. */
> + __libc_cleanup_push (cancel_handler, NULL);
> +#endif
> __libc_lock_lock (lock);
>
> if (cache_valid_p ())
> @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> }
> }
>
> +#ifdef __EXCEPTIONS
> + __libc_cleanup_pop (0);
> +#endif
> __libc_lock_unlock (lock);
>
> if (data != NULL)
> --
> 2.40.0
>
LGTM.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975]
2023-04-28 19:17 ` Noah Goldstein
@ 2023-05-22 20:37 ` H.J. Lu
2023-05-23 8:58 ` Florian Weimer
0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2023-05-22 20:37 UTC (permalink / raw)
To: Noah Goldstein, Libc-stable Mailing List; +Cc: libc-alpha
On Fri, Apr 28, 2023 at 12:17 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 3:06 PM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > There are reports for hang in __check_pf:
> >
> > https://github.com/JoeDog/siege/issues/4
> >
> > It is reproducible only under specific configurations:
> >
> > 1. Large number of cores (>= 64) and large number of threads (> 3X of
> > the number of cores) with long lived socket connection.
> > 2. Low power (frequency) mode.
> > 3. Power management is enabled.
> >
> > While holding lock, __check_pf calls make_request which calls __sendto
> > and __recvmsg. Since __sendto and __recvmsg are cancellation points,
> > lock held by __check_pf won't be released and can cause deadlock when
> > thread cancellation happens in __sendto or __recvmsg. Add a cancellation
> > cleanup handler for __check_pf to unlock the lock when cancelled by
> > another thread. This fixes BZ #20975 and the siege hang issue.
> > ---
> > sysdeps/unix/sysv/linux/Makefile | 2 ++
> > sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++
> > 2 files changed, 17 insertions(+)
> >
> > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> > index aec7a94785..0160be8790 100644
> > --- a/sysdeps/unix/sysv/linux/Makefile
> > +++ b/sysdeps/unix/sysv/linux/Makefile
> > @@ -529,6 +529,8 @@ sysdep_headers += \
> > sysdep_routines += \
> > netlink_assert_response \
> > # sysdep_routines
> > +
> > +CFLAGS-check_pf.c += -fexceptions
> > endif
> >
> > # Don't compile the ctype glue code, since there is no old non-GNU C library.
> > diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
> > index b157c5126c..2b0b8b6368 100644
> > --- a/sysdeps/unix/sysv/linux/check_pf.c
> > +++ b/sysdeps/unix/sysv/linux/check_pf.c
> > @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid)
> > return NULL;
> > }
> >
> > +#ifdef __EXCEPTIONS
> > +static void
> > +cancel_handler (void *arg __attribute__((unused)))
> > +{
> > + /* Release the lock. */
> > + __libc_lock_unlock (lock);
> > +}
> > +#endif
> >
> > void
> > attribute_hidden
> > @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > struct cached_data *olddata = NULL;
> > struct cached_data *data = NULL;
> >
> > +#ifdef __EXCEPTIONS
> > + /* Make sure that lock is released when the thread is cancelled. */
> > + __libc_cleanup_push (cancel_handler, NULL);
> > +#endif
> > __libc_lock_lock (lock);
> >
> > if (cache_valid_p ())
> > @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6,
> > }
> > }
> >
> > +#ifdef __EXCEPTIONS
> > + __libc_cleanup_pop (0);
> > +#endif
> > __libc_lock_unlock (lock);
> >
> > if (data != NULL)
> > --
> > 2.40.0
> >
>
> LGTM.
OK for release branches?
Thanks.
--
H.J.
^ permalink raw reply [flat|nested] 13+ messages in thread