From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x52c.google.com (mail-ed1-x52c.google.com [IPv6:2a00:1450:4864:20::52c]) by sourceware.org (Postfix) with ESMTPS id CA77C3858C20 for ; Thu, 27 Apr 2023 21:44:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CA77C3858C20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-ed1-x52c.google.com with SMTP id 4fb4d7f45d1cf-504e232fe47so16566488a12.2 for ; Thu, 27 Apr 2023 14:44:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1682631891; x=1685223891; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=rHzy84AoSMRGH0hR6wu+v5H5KFmOej2A+bGhr/ljlHU=; b=H//ygnoecLgS1IfVJlDV+odGKmG41kkK1cdDUWi7fciAC5D529q2jiZwkOrpz3WTDn wARQs39AQkbBA+kbciAIDuiczOpr3fzsjGsG2JVp/lRjRVSd99OnUDZ2cxs4ymqDtQnZ ztdEn34rz16C6lFY9/1A4R1MafTOSehrWyBx2iPWnNN0JcHL6EnzrqL9p0EdQezo/NWi KiNpEw/qn9Dg8dSTlyvcsA3bOf4ZOgNaqx04FK9lJzbmWIvpgxeWXSa72WI0wcY/MEWs +FP/ZRxaamTCw6099kPSm7AKrz3SrU3FUwHNtA6qNaHFxYJSbRM5mpr3+Xqy4+6cFTRo oFtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682631891; x=1685223891; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rHzy84AoSMRGH0hR6wu+v5H5KFmOej2A+bGhr/ljlHU=; b=Vlfkuuc/2aYbsXdoJE2TU+/Y5jallZ1VJGsT0JKh4BkZGoJoK9E/8eoWlX0aIP9hwr 4Qi4LQ8yPJIQ3YyeaqWtMyXR3c4kjayOuN3rWhm/g4cObtZ0MEvdubPwlUxZSmwtRAPT i4p/Somn5AvvNfYBs/6oJcegZbnN5znRo8sJSe2jvOm3kWjQclPE+QuPmQYpy+0qLFbt Vv2f+nAbB6fyb1+3btMiIwuTW7CaSxmZ/veZ0i6b6XLUsjYTyfsdxFLa5GS6pOPxI8Hp A3HrSD5rv3Tcwr0gsRkbmU6el9RMmdY5OTtAgRBNZLDaXznxfRidjAH2Lxfle+QSEyWZ L4Pw== X-Gm-Message-State: AC+VfDwVhs4QHj+BCT1zWYXwu9iSneyiemXMwFHM+wh964Wm2CXCnrvp 7QL3Fu/I5EqxV5iG1B96OT8OoS8qjVF/4zuLfjuhZ5N7 X-Google-Smtp-Source: ACHHUZ5s21zLK/5s1Dkb0UVSMU/CkwSXHN2rrn21TfdKux0iC3CBLdJTT/yZfJgM4X3eS1Boby8j/MFhLL48iitUDgw= X-Received: by 2002:a05:6402:2301:b0:506:be07:3473 with SMTP id l1-20020a056402230100b00506be073473mr3377977eda.4.1682631891280; Thu, 27 Apr 2023 14:44:51 -0700 (PDT) MIME-Version: 1.0 References: <20230427200615.1496059-1-hjl.tools@gmail.com> In-Reply-To: From: Noah Goldstein Date: Thu, 27 Apr 2023 16:44:40 -0500 Message-ID: Subject: Re: [PATCH] __check_pf: Add a cancellation cleanup handler [BZ #20975] To: "H.J. Lu" Cc: libc-alpha@sourceware.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Apr 27, 2023 at 4:00=E2=80=AFPM H.J. Lu wrote= : > > On Thu, Apr 27, 2023 at 1:43=E2=80=AFPM Noah Goldstein wrote: > > > > On Thu, Apr 27, 2023 at 3:06=E2=80=AFPM H.J. Lu via Libc-alpha > > wrote: > > > > > > There are reports for hang in __check_pf: > > > > > > https://github.com/JoeDog/siege/issues/4 > > > > > > It is reproducible only under specific configurations: > > > > > > 1. Large number of cores (>=3D 64) and large number of threads (> 3X = of > > > the number of cores) with long lived socket connection. > > > 2. Low power (frequency) mode. > > > 3. Power management is enabled. > > > > > > While holding lock, __check_pf calls make_request which calls __sendt= o > > > and __recvmsg. Since __sendto and __recvmsg are cancellation points, > > > lock held by __check_pf won't be released and can cause deadlock when > > > thread cancellation happens in __sendto or __recvmsg. Add a cancella= tion > > > cleanup handler for __check_pf to unlock the lock when cancelled by > > > another thread. This fixes BZ #20975 and the siege hang issue. > > > --- > > > sysdeps/unix/sysv/linux/Makefile | 2 ++ > > > sysdeps/unix/sysv/linux/check_pf.c | 15 +++++++++++++++ > > > 2 files changed, 17 insertions(+) > > > > > > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/lin= ux/Makefile > > > index aec7a94785..0160be8790 100644 > > > --- a/sysdeps/unix/sysv/linux/Makefile > > > +++ b/sysdeps/unix/sysv/linux/Makefile > > > @@ -529,6 +529,8 @@ sysdep_headers +=3D \ > > > sysdep_routines +=3D \ > > > netlink_assert_response \ > > > # sysdep_routines > > > + > > > +CFLAGS-check_pf.c +=3D -fexceptions > > > endif > > > > > > # Don't compile the ctype glue code, since there is no old non-GNU C= library. > > > diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/l= inux/check_pf.c > > > index b157c5126c..2b0b8b6368 100644 > > > --- a/sysdeps/unix/sysv/linux/check_pf.c > > > +++ b/sysdeps/unix/sysv/linux/check_pf.c > > > @@ -292,6 +292,14 @@ make_request (int fd, pid_t pid) > > > return NULL; > > > } > > > > > > +#ifdef __EXCEPTIONS > > > +static void > > > +cancel_handler (void *arg __attribute__((unused))) > > > +{ > > > + /* Release the lock. */ > > > + __libc_lock_unlock (lock); > > > +} > > > +#endif > > > > > > void > > > attribute_hidden > > > @@ -304,6 +312,10 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6, > > > struct cached_data *olddata =3D NULL; > > > struct cached_data *data =3D NULL; > > > > > > +#ifdef __EXCEPTIONS > > > + /* Make sure that lock is released when the thread is cancelled. = */ > > > + __libc_cleanup_push (cancel_handler, NULL); > > > +#endif > > > __libc_lock_lock (lock); > > > > > > if (cache_valid_p ()) > > > @@ -338,6 +350,9 @@ __check_pf (bool *seen_ipv4, bool *seen_ipv6, > > > } > > > } > > > > > > +#ifdef __EXCEPTIONS > > > + __libc_cleanup_pop (0); > > > +#endif > > > __libc_lock_unlock (lock); > > > > > > if (data !=3D NULL) > > > -- > > > 2.40.0 > > > > > > > Can we add a test the will just XFAIL (or XPASS) if cores < 64? > > There is no simple test for this. This is one reason why it hasn't been = fixed > since 2016. > Did you verify this fixes issue in siege? > -- > H.J.