From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jwakely@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [216.205.24.124])
 by sourceware.org (Postfix) with ESMTP id B4866385383F
 for <libstdc++@gcc.gnu.org>; Tue, 22 Jun 2021 15:20:55 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B4866385383F
Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com
 [209.85.128.70]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-243-V2bT3ezyNTO4OFmFcZ7q1w-1; Tue, 22 Jun 2021 11:20:53 -0400
X-MC-Unique: V2bT3ezyNTO4OFmFcZ7q1w-1
Received: by mail-wm1-f70.google.com with SMTP id
 j6-20020a05600c1906b029019e9c982271so861685wmq.0
 for <libstdc++@gcc.gnu.org>; Tue, 22 Jun 2021 08:20:53 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=sJuwmtXJeRjnT0lnHIxyp1Y1/2QlDNa/X02/kpvx2gk=;
 b=UhMeR/9CZ/iVWrkW+2b+vMkTOJ3KYqpp/baVpcygvPuQca7+bxRhfpfmqykuiJ/1TI
 Wl/1WEUKP66AVxA8F85JS3bQtfyTQHrPxRE4Hm6RMA6WWb8bcD+Ov+dApyewj0G++31q
 FE4OOMYhc26GgkRMQZ/vHGepXGCQFqREHmv5+R/VpAdcKOKqD7XiihDXjRbBXpV2rP16
 OtNWCCDCYNXqL9/7KQtpLCBlzUXwItqhabp45QS//Dk54LwLG8o1DMWQezt7dLu5yXMy
 0GVTss726R0cvI2aL/Gy5lse8C8wOfgIRYkvEgDvhYq+FlSL0QkNJd6nD7DmGO4ddAT/
 UPZw==
X-Gm-Message-State: AOAM531/vSsDn1sFMgKqHSfSlJ5lRa6HoYQCeFIlGMn/cz6mwq4r3hWa
 9udkG1G34GWxkDOCX1FgNtt196ufe0Bw1AnxVJ03JK6ensvJOHBY+8iwjZUfWakroSPzo/1AgaF
 bXEpuHvw409R1z7lCUXo2uKKpf5/RMd0=
X-Received: by 2002:a5d:45c5:: with SMTP id b5mr5458165wrs.221.1624375252562; 
 Tue, 22 Jun 2021 08:20:52 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJxBMxGWg8CBPeYqIYouV7brsyyDs3EbS2Vj/VKE3zaV2sIb4bkMLVr4JWsRDq39L3Ew+AotHc5jYJUK5wuMKEo=
X-Received: by 2002:a5d:45c5:: with SMTP id b5mr5458129wrs.221.1624375252213; 
 Tue, 22 Jun 2021 08:20:52 -0700 (PDT)
MIME-Version: 1.0
References: <YNDND7Y3Nqk5MpBN@redhat.com> <1879624.3VsfAaAtOV@minbar>
 <CACb0b4n1YOxxRbG9e81eECnvBTQwnG2ck9eBHnNc063+BFLjSw@mail.gmail.com>
 <14287489.dOz4zN4RiP@minbar>
In-Reply-To: <14287489.dOz4zN4RiP@minbar>
From: Jonathan Wakely <jwakely@redhat.com>
Date: Tue, 22 Jun 2021 16:20:41 +0100
Message-ID: <CACb0b4mEP=BX9uNECKuQDRhGiXJ=qnhGTPQaXmUnA5zRbcJgYA@mail.gmail.com>
Subject: Re: [PATCH v2] libstdc++: Improve std::lock algorithm
To: Matthias Kretz <m.kretz@gsi.de>
Cc: "libstdc++" <libstdc++@gcc.gnu.org>, gcc Patches <gcc-patches@gcc.gnu.org>
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, HTML_MESSAGE,
 RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: libstdc++@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libstdc++ mailing list <libstdc++.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/libstdc++>,
 <mailto:libstdc++-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/libstdc++/>
List-Post: <mailto:libstdc++@gcc.gnu.org>
List-Help: <mailto:libstdc++-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/libstdc++>,
 <mailto:libstdc++-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 22 Jun 2021 15:20:57 -0000

On Tue, 22 Jun 2021 at 14:21, Matthias Kretz wrote:

> On Tuesday, 22 June 2021 14:51:26 CEST Jonathan Wakely wrote:
> > With your suggestion
> > to also drop std::tuple the number of parameters decides which
> > function we call. And we don't instantiate std::tuple. And we can also
> > get rid of the __try_to_lock function, which was only used to deduce
> > the lock type rather than use tuple_element to get it. That's much
> > nicer.
>
> =F0=9F=91=8D
>
> > > How about optimizing a likely common case where all lockables have th=
e
> > > same
> > > type? In that case we don't require recursion and can manage stack
> usage
> > > much
> > > simpler:
> > The stack usage is bounded by the number of mutexes being locked,
> > which is unlikely to get large, but we can do that.
>
> Right. I meant simpler, because it takes a while of staring at the
> recursive
> implementation to understand how it works. :)
>
> > We can do it for try_lock too:
> >
> >   template<typename _L1, typename _L2, typename... _L3>
> >     int
> >     try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
> >     {
> > #if __cplusplus >=3D 201703L
> >       if constexpr (is_same_v<_L1, _L2>
> >                     && (is_same_v<_L1, _L3> && ...))
> >         {
> >           constexpr int _Np =3D 2 + sizeof...(_L3);
> >           unique_lock<_L1> __locks[_Np] =3D {
> >               {__l1, try_to_lock}, {__l2, try_to_lock}, {__l3,
> > try_to_lock}... };
>
> This does a try_lock on all lockabes even if any of them fails. I think
> that's
> not only more expensive but also non-conforming. I think you need to defe=
r
> locking and then loop from beginning to end to break the loop on the firs=
t
> unsuccessful try_lock.
>

Oops, good point. I'll add a test for that too. Here's the fixed code:

    template<typename _L0, typename... _Lockables>
      inline int
      __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
      {
#if __cplusplus >=3D 201703L
        if constexpr ((is_same_v<_L0, _Lockables> && ...))
          {
            constexpr int _Np =3D 1 + sizeof...(_Lockables);
            unique_lock<_L0> __locks[_Np] =3D {
                {__l0, defer_lock}, {__lockables, defer_lock}...
            };
            for (int __i =3D 0; __i < _Np; ++__i)
              if (!__locks[__i].try_lock())
                {
                  const int __failed =3D __i;
                  while (__i--)
                    __locks[__i].unlock();
                  return __i;
                }
            for (auto& __l : __locks)
              __l.release();
            return -1;
          }
        else
#endif


> >           for (int __i =3D 0; __i < _Np; ++__i)
> >             if (!__locks[__i])
> >               return __i;
> >           for (auto& __l : __locks)
> >             __l.release();
> >           return -1;
> >         }
> >       else
> > #endif
> >       return __detail::__try_lock_impl(__l1, __l2, __l3...);
> >     }
> >
> > > if constexpr ((is_same_v<_L0, _L1> && ...))
> > > {
> > > constexpr int _Np =3D 1 + sizeof...(_L1);
> > > std::array<unique_lock<_L0>, _Np> __locks =3D {
> > > {__l0, defer_lock}, {__l1, defer_lock}...
> > > };
> > > int __first =3D 0;
> > > do {
> > > __locks[__first].lock();
> > > for (int __j =3D 1; __j < _Np; ++__j)
> > > {
> > > const int __idx =3D (__first + __j) % _Np;
> > > if (!__locks[__idx].try_lock())
> > > {
> > > for (int __k =3D __idx; __k !=3D __first;
> > > __k =3D __k =3D=3D 1 ? _Np : __k - 1)
> > > __locks[__k - 1].unlock();
> >
> > This loop doesn't work if any try_lock fails when first=3D=3D0, because
> > the loop termination condition is never reached.
>
> Uh yes. Which is the same reason why the __j loop doesn't start from
> __first +
> 1.
>
> > I find this a bit easier to understand than the loop above, and
> > correct (I think):
> >
> >   for (int __k =3D __j; __k !=3D 0; --__k)
> >     __locks[(__first + __k - 1) % _Np].unlock();
>
> Yes, if only we had a wrapping integer type that wraps at an arbitrary N.
> Like
> unsigned int but with parameter, like:
>
>   for (__wrapping_uint<_Np> __k =3D __idx; __k !=3D __first; --__k)
>     __locks[__k - 1].unlock();
>
> This is the loop I wanted to write, except --__k is simpler to write and
> __k -
> 1 would also wrap around to _Np - 1 for __k =3D=3D 0. But if this is the =
only
> place it's not important enough to abstract.
>

We might be able to use __wrapping_uint in std::seed_seq::generate too, and
maybe some other places in <random>. But we can add that later if we decide
it's worth it.


> > [...]
> > @@ -620,15 +632,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >     *  @post All arguments are locked.
> >     *
> >     *  All arguments are locked via a sequence of calls to lock(),
> > try_lock() -   *  and unlock().  If the call exits via an exception any
> > locks that were -   *  obtained will be released.
> > +   *  and unlock().  If this function exits via an exception any locks
> that
> > +   *  were obtained will be released.
> >     */
> >    template<typename _L1, typename _L2, typename... _L3>
> >      void
> >      lock(_L1& __l1, _L2& __l2, _L3&... __l3)
> >      {
> > -      int __i =3D 0;
> > -      __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
> > +#if __cplusplus >=3D 201703L
>
> I also considered moving it down here. Makes sense unless you want to cal=
l
> __detail::__lock_impl from other functions. And if we want to make it wor=
k
> for
> pre-C++11 we could do
>
>   using __homogeneous
>     =3D __and_<is_same<_L1, _L2>, is_same<_L1, _L3>...>;
>   int __i =3D 0;
>   __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);
>
>
We don't need tag dispatching, we could just do:

if _GLIBCXX17_CONSTEXPR (homogeneous::value)
 ...
else
 ...

because both branches are valid for the homogeneous case, i.e. we aren't
using if-constexpr to avoid invalid instantiations. So for pre-C++17 it
would still compile both branches (and instantiate the recursive function
templates) but would use the iterative code at run time.

But given that the default -std option is gnu++17 now, I'm OK with the
iterative version only being used for C++17.