From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTP id B4866385383F for ; Tue, 22 Jun 2021 15:20:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B4866385383F Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-243-V2bT3ezyNTO4OFmFcZ7q1w-1; Tue, 22 Jun 2021 11:20:53 -0400 X-MC-Unique: V2bT3ezyNTO4OFmFcZ7q1w-1 Received: by mail-wm1-f70.google.com with SMTP id j6-20020a05600c1906b029019e9c982271so861685wmq.0 for ; Tue, 22 Jun 2021 08:20:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=sJuwmtXJeRjnT0lnHIxyp1Y1/2QlDNa/X02/kpvx2gk=; b=UhMeR/9CZ/iVWrkW+2b+vMkTOJ3KYqpp/baVpcygvPuQca7+bxRhfpfmqykuiJ/1TI Wl/1WEUKP66AVxA8F85JS3bQtfyTQHrPxRE4Hm6RMA6WWb8bcD+Ov+dApyewj0G++31q FE4OOMYhc26GgkRMQZ/vHGepXGCQFqREHmv5+R/VpAdcKOKqD7XiihDXjRbBXpV2rP16 OtNWCCDCYNXqL9/7KQtpLCBlzUXwItqhabp45QS//Dk54LwLG8o1DMWQezt7dLu5yXMy 0GVTss726R0cvI2aL/Gy5lse8C8wOfgIRYkvEgDvhYq+FlSL0QkNJd6nD7DmGO4ddAT/ UPZw== X-Gm-Message-State: AOAM531/vSsDn1sFMgKqHSfSlJ5lRa6HoYQCeFIlGMn/cz6mwq4r3hWa 9udkG1G34GWxkDOCX1FgNtt196ufe0Bw1AnxVJ03JK6ensvJOHBY+8iwjZUfWakroSPzo/1AgaF bXEpuHvw409R1z7lCUXo2uKKpf5/RMd0= X-Received: by 2002:a5d:45c5:: with SMTP id b5mr5458165wrs.221.1624375252562; Tue, 22 Jun 2021 08:20:52 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxBMxGWg8CBPeYqIYouV7brsyyDs3EbS2Vj/VKE3zaV2sIb4bkMLVr4JWsRDq39L3Ew+AotHc5jYJUK5wuMKEo= X-Received: by 2002:a5d:45c5:: with SMTP id b5mr5458129wrs.221.1624375252213; Tue, 22 Jun 2021 08:20:52 -0700 (PDT) MIME-Version: 1.0 References: <1879624.3VsfAaAtOV@minbar> <14287489.dOz4zN4RiP@minbar> In-Reply-To: <14287489.dOz4zN4RiP@minbar> From: Jonathan Wakely Date: Tue, 22 Jun 2021 16:20:41 +0100 Message-ID: Subject: Re: [PATCH v2] libstdc++: Improve std::lock algorithm To: Matthias Kretz Cc: "libstdc++" , gcc Patches X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, HTML_MESSAGE, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Jun 2021 15:20:57 -0000 On Tue, 22 Jun 2021 at 14:21, Matthias Kretz wrote: > On Tuesday, 22 June 2021 14:51:26 CEST Jonathan Wakely wrote: > > With your suggestion > > to also drop std::tuple the number of parameters decides which > > function we call. And we don't instantiate std::tuple. And we can also > > get rid of the __try_to_lock function, which was only used to deduce > > the lock type rather than use tuple_element to get it. That's much > > nicer. > > =F0=9F=91=8D > > > > How about optimizing a likely common case where all lockables have th= e > > > same > > > type? In that case we don't require recursion and can manage stack > usage > > > much > > > simpler: > > The stack usage is bounded by the number of mutexes being locked, > > which is unlikely to get large, but we can do that. > > Right. I meant simpler, because it takes a while of staring at the > recursive > implementation to understand how it works. :) > > > We can do it for try_lock too: > > > > template > > int > > try_lock(_L1& __l1, _L2& __l2, _L3&... __l3) > > { > > #if __cplusplus >=3D 201703L > > if constexpr (is_same_v<_L1, _L2> > > && (is_same_v<_L1, _L3> && ...)) > > { > > constexpr int _Np =3D 2 + sizeof...(_L3); > > unique_lock<_L1> __locks[_Np] =3D { > > {__l1, try_to_lock}, {__l2, try_to_lock}, {__l3, > > try_to_lock}... }; > > This does a try_lock on all lockabes even if any of them fails. I think > that's > not only more expensive but also non-conforming. I think you need to defe= r > locking and then loop from beginning to end to break the loop on the firs= t > unsuccessful try_lock. > Oops, good point. I'll add a test for that too. Here's the fixed code: template inline int __try_lock_impl(_L0& __l0, _Lockables&... __lockables) { #if __cplusplus >=3D 201703L if constexpr ((is_same_v<_L0, _Lockables> && ...)) { constexpr int _Np =3D 1 + sizeof...(_Lockables); unique_lock<_L0> __locks[_Np] =3D { {__l0, defer_lock}, {__lockables, defer_lock}... }; for (int __i =3D 0; __i < _Np; ++__i) if (!__locks[__i].try_lock()) { const int __failed =3D __i; while (__i--) __locks[__i].unlock(); return __i; } for (auto& __l : __locks) __l.release(); return -1; } else #endif > > for (int __i =3D 0; __i < _Np; ++__i) > > if (!__locks[__i]) > > return __i; > > for (auto& __l : __locks) > > __l.release(); > > return -1; > > } > > else > > #endif > > return __detail::__try_lock_impl(__l1, __l2, __l3...); > > } > > > > > if constexpr ((is_same_v<_L0, _L1> && ...)) > > > { > > > constexpr int _Np =3D 1 + sizeof...(_L1); > > > std::array, _Np> __locks =3D { > > > {__l0, defer_lock}, {__l1, defer_lock}... > > > }; > > > int __first =3D 0; > > > do { > > > __locks[__first].lock(); > > > for (int __j =3D 1; __j < _Np; ++__j) > > > { > > > const int __idx =3D (__first + __j) % _Np; > > > if (!__locks[__idx].try_lock()) > > > { > > > for (int __k =3D __idx; __k !=3D __first; > > > __k =3D __k =3D=3D 1 ? _Np : __k - 1) > > > __locks[__k - 1].unlock(); > > > > This loop doesn't work if any try_lock fails when first=3D=3D0, because > > the loop termination condition is never reached. > > Uh yes. Which is the same reason why the __j loop doesn't start from > __first + > 1. > > > I find this a bit easier to understand than the loop above, and > > correct (I think): > > > > for (int __k =3D __j; __k !=3D 0; --__k) > > __locks[(__first + __k - 1) % _Np].unlock(); > > Yes, if only we had a wrapping integer type that wraps at an arbitrary N. > Like > unsigned int but with parameter, like: > > for (__wrapping_uint<_Np> __k =3D __idx; __k !=3D __first; --__k) > __locks[__k - 1].unlock(); > > This is the loop I wanted to write, except --__k is simpler to write and > __k - > 1 would also wrap around to _Np - 1 for __k =3D=3D 0. But if this is the = only > place it's not important enough to abstract. > We might be able to use __wrapping_uint in std::seed_seq::generate too, and maybe some other places in . But we can add that later if we decide it's worth it. > > [...] > > @@ -620,15 +632,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > * @post All arguments are locked. > > * > > * All arguments are locked via a sequence of calls to lock(), > > try_lock() - * and unlock(). If the call exits via an exception any > > locks that were - * obtained will be released. > > + * and unlock(). If this function exits via an exception any locks > that > > + * were obtained will be released. > > */ > > template > > void > > lock(_L1& __l1, _L2& __l2, _L3&... __l3) > > { > > - int __i =3D 0; > > - __detail::__lock_impl(__i, 0, __l1, __l2, __l3...); > > +#if __cplusplus >=3D 201703L > > I also considered moving it down here. Makes sense unless you want to cal= l > __detail::__lock_impl from other functions. And if we want to make it wor= k > for > pre-C++11 we could do > > using __homogeneous > =3D __and_, is_same<_L1, _L3>...>; > int __i =3D 0; > __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...); > > We don't need tag dispatching, we could just do: if _GLIBCXX17_CONSTEXPR (homogeneous::value) ... else ... because both branches are valid for the homogeneous case, i.e. we aren't using if-constexpr to avoid invalid instantiations. So for pre-C++17 it would still compile both branches (and instantiate the recursive function templates) but would use the iterative code at run time. But given that the default -std option is gnu++17 now, I'm OK with the iterative version only being used for C++17.