From: "H.J. Lu"
Date: Wed, 10 Nov 2021 07:52:27 -0800
Subject: Re: [PATCH v4 0/3] Optimize CAS [BZ #28537]
To: "Paul A. Clarke"
Cc: GNU C Library, Florian Weimer, Arjan van de Ven, Andreas Schwab

On Wed, Nov 10, 2021 at 7:50 AM Paul A. Clarke wrote:
>
> On Wed, Nov 10, 2021 at 07:42:20AM -0800, H.J. Lu wrote:
> > On Wed, Nov 10, 2021 at 7:36 AM Paul A. Clarke wrote:
> > >
> > > On Tue, Nov 09, 2021 at 04:16:11PM -0800, H.J. Lu via Libc-alpha wrote:
> > > > CAS instruction is expensive. From the x86 CPU's point of view, getting
> > > > a cache line for writing is more expensive than reading. See Appendix
> > > > A.2 Spinlock in:
> > > >
> > > > https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
> > > >
> > > > The full compare and swap will grab the cache line exclusive and cause
> > > > excessive cache line bouncing.
> > > >
> > > > Optimize CAS in low level locks and pthread_mutex_lock.c:
> > > >
> > > > 1. Do an atomic load and skip CAS if compare may fail to reduce cache
> > > > line bouncing on contended locks.
> > > > 2. Replace atomic_compare_and_exchange_bool_acq with
> > > > atomic_compare_and_exchange_val_acq to avoid the extra load.
> > > > 3. Drop __glibc_unlikely in __lll_trylock and lll_cond_trylock since we
> > > > don't know if it's actually rare; in the contended case it is clearly not
> > > > rare.
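
Steps 1 and 2 of the quoted list can be sketched with C11 atomics. This is
only a minimal illustration and not the glibc patch: sketch_lock,
sketch_trylock_val and sketch_unlock are hypothetical names, and glibc
itself uses its internal atomic macros rather than <stdatomic.h>.

```c
#include <stdatomic.h>

/* Hypothetical lock word for illustration: 0 = unlocked, 1 = locked.  */
typedef struct
{
  atomic_int word;
} sketch_lock;

/* Try to take the lock once.  Returns the value observed in the lock
   word: 0 means the lock was acquired, non-zero means it was held.  */
static int
sketch_trylock_val (sketch_lock *l)
{
  /* Step 1: plain load first.  If the lock is visibly held, the CAS
     would fail anyway, so skip it and avoid requesting the cache line
     for writing.  */
  int seen = atomic_load_explicit (&l->word, memory_order_relaxed);
  if (seen != 0)
    return seen;

  /* Step 2: one CAS that also reports the old value.  On failure,
     `expected' is overwritten with the value found in memory, so no
     separate reload is needed; on success it stays 0.  */
  int expected = 0;
  atomic_compare_exchange_strong_explicit (&l->word, &expected, 1,
                                           memory_order_acquire,
                                           memory_order_relaxed);
  return expected;
}

/* Release the lock.  */
static void
sketch_unlock (sketch_lock *l)
{
  atomic_store_explicit (&l->word, 0, memory_order_release);
}
```

A caller would treat a zero return as success and could inspect a non-zero
return to decide whether to spin or block, without loading the word again.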
> > >
> > > I see build errors:
> > >
> > > In file included from pthread_mutex_cond_lock.c:23:
> > > ../nptl/pthread_mutex_lock.c: In function ‘__pthread_mutex_cond_lock_full’:
> > > ../nptl/pthread_mutex_lock.c:442:6: error: a label can only be part of a statement and a declaration is not a statement
> > >       int private = (robust
> > >       ^~~
> > > ../nptl/pthread_mutex_lock.c:445:6: error: expected expression before ‘int’
> > >       int e = __futex_lock_pi64 (&mutex->__data.__lock, 0 /* ununsed */,
> > >       ^~~
> > > ../nptl/pthread_mutex_lock.c:447:10: error: ‘e’ undeclared (first use in this function)
> > >       if (e == ESRCH || e == EDEADLK)
> > >           ^
> > > ../nptl/pthread_mutex_lock.c:447:10: note: each undeclared identifier is reported only once for each function it appears in
> > > In file included from ../include/assert.h:1,
> > >                  from ../nptl/pthread_mutex_lock.c:18,
> > >                  from pthread_mutex_cond_lock.c:23:
> > > ../assert/assert.h:105:34: error: left-hand operand of comma expression has no effect [-Werror=unused-value]
> > >    ((void) sizeof ((expr) ? 1 : 0), __extension__ ({ \
> > >                                   ^
> > > ../nptl/pthread_mutex_lock.c:449:3: note: in expansion of macro ‘assert’
> > >    assert (e != EDEADLK
> > >    ^~~~~~
> > > ../assert/assert.h:105:34: error: left-hand operand of comma expression has no effect [-Werror=unused-value]
> > >    ((void) sizeof ((expr) ? 1 : 0), __extension__ ({ \
> > >                                   ^
> > > ../nptl/pthread_mutex_lock.c:454:3: note: in expansion of macro ‘assert’
> > >    assert (e != ESRCH || !robust);
> > >    ^~~~~~
> >
> > On PPC?
>
> Yes, powerpc64le.
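
The first diagnostic in the quoted log reads like the C rule (before C23)
that a label must be followed by a statement, and a declaration is not one.
The sketch below is a hypothetical reduction, not the code from
pthread_mutex_lock.c: sketch_after_label and slow_path are made-up names,
and opening a block after the label is only one possible way to satisfy
the rule.

```c
#include <errno.h>

/* Hypothetical stand-in for a function that jumps to a label and then
   needs a local variable.  Not the actual glibc code.  */
static int
sketch_after_label (int contended)
{
  if (!contended)
    return 0;
  goto slow_path;

slow_path:
  /* Writing `slow_path: int e = ...;' is rejected before C23 because
     the label would be attached to a declaration.  Opening a block
     gives the label a compound statement to attach to, and the
     declaration then lives inside that block.  */
  {
    int e = EDEADLK;
    return e;
  }
}
```

An empty statement after the label (`slow_path: ;`) would satisfy the same
rule if a new block is not wanted.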
> Failing command:
> gcc pthread_mutex_cond_lock.c -c -std=gnu11 -fgnu89-inline -g -O2 -Wall -Wwrite-strings -Wundef -Werror -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common -Wstrict-prototypes -Wold-style-definition -fmath-errno -mabi=ieeelongdouble -Wno-psabi -mno-gnu-attribute -mlong-double-128 -ftls-model=initial-exec -I../include -I/home/pc/locks/build-patched/nptl -I/home/pc/locks/build-patched -I../sysdeps/unix/sysv/linux/powerpc/powerpc64/le/fpu -I../sysdeps/unix/sysv/linux/powerpc/powerpc64/fpu -I../sysdeps/unix/sysv/linux/powerpc/powerpc64/le -I../sysdeps/unix/sysv/linux/powerpc/powerpc64 -I../sysdeps/unix/sysv/linux/wordsize-64 -I../sysdeps/unix/sysv/linux/powerpc -I../sysdeps/powerpc/nptl -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux -I../sysdeps/nptl -I../sysdeps/pthread -I../sysdeps/gnu -I../sysdeps/unix/inet -I../sysdeps/unix/sysv -I../sysdeps/unix/powerpc -I../sysdeps/unix -I../sysdeps/posix -I../sysdeps/powerpc/powerpc64/le/power8/fpu/multiarch -I../sysdeps/powerpc/powerpc64/le/power7/fpu/multiarch -I../sysdeps/powerpc/powerpc64/le/fpu/multiarch -I../sysdeps/powerpc/powerpc64/le/power8/fpu -I../sysdeps/powerpc/powerpc64/le/power7/fpu -I../sysdeps/powerpc/powerpc64/le/fpu -I../sysdeps/powerpc/powerpc64/fpu -I../sysdeps/powerpc/powerpc64/le/power8/multiarch -I../sysdeps/powerpc/powerpc64/le/power7/multiarch -I../sysdeps/powerpc/powerpc64/le/multiarch -I../sysdeps/powerpc/powerpc64/multiarch -I../sysdeps/powerpc/powerpc64/le/power8 -I../sysdeps/powerpc/powerpc64/power8 -I../sysdeps/powerpc/powerpc64/le/power7 -I../sysdeps/powerpc/powerpc64/power7 -I../sysdeps/powerpc/powerpc64/power6 -I../sysdeps/powerpc/powerpc64/power4 -I../sysdeps/powerpc/power4 -I../sysdeps/powerpc/powerpc64/le -I../sysdeps/powerpc/powerpc64 -I../sysdeps/wordsize-64 -I../sysdeps/powerpc/fpu -I../sysdeps/powerpc -I../sysdeps/ieee754/ldbl-128ibm-compat -I../sysdeps/ieee754/ldbl-128ibm/include -I../sysdeps/ieee754/ldbl-128ibm -I../sysdeps/ieee754/ldbl-opt -I../sysdeps/ieee754/dbl-64 -I../sysdeps/ieee754/flt-32 -I../sysdeps/ieee754/float128 -I../sysdeps/ieee754 -I../sysdeps/generic -I.. -I../libio -I. -D_LIBC_REENTRANT -include /home/pc/locks/build-patched/libc-modules.h -DMODULE_NAME=libc -include ../include/libc-symbols.h -DTOP_NAMESPACE=glibc -o /home/pc/locks/build-patched/nptl/pthread_mutex_cond_lock.o -MD -MP -MF /home/pc/locks/build-patched/nptl/pthread_mutex_cond_lock.o.dt -MT /home/pc/locks/build-patched/nptl/pthread_mutex_cond_lock.o
>
> PC

I am testing it now.

-- 
H.J.