public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] libstdc++: Optimize std::gcd
@ 2024-06-07  5:29 Stephen Face
  2024-06-07  8:57 ` Sam James
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Face @ 2024-06-07  5:29 UTC (permalink / raw)
  To: libstdc++; +Cc: gcc-patches

This patch is to optimize the runtime execution of gcd. Mathematically,
it computes with the same algorithm as before, but subtractions and
branches are rearranged to encourage generation of code that can use
flags from the subtractions for conditional moves. Additionally, most
pairs of integers are coprime, so this patch also includes a check for
one of the integers to be equal to 1, and then it will exit the loop
early in this case.

libstdc++-v3/ChangeLog:

	* include/std/numeric(__gcd): Optimize.
---
I have tested this on x86_64-linux and aarch64-linux. I have tested the
timing with random distributions of small inputs and large inputs on a
couple of machines with -O2 and found decreases in execution time from
20% to 60% depending on the machine and distribution of inputs.

 libstdc++-v3/include/std/numeric | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
index c912db4a519..3c9e8387a0e 100644
--- a/libstdc++-v3/include/std/numeric
+++ b/libstdc++-v3/include/std/numeric
@@ -148,19 +148,20 @@ namespace __detail
 
       while (true)
 	{
-	  if (__m > __n)
-	    {
-	      _Tp __tmp = __m;
-	      __m = __n;
-	      __n = __tmp;
-	    }
+	  _Tp __m_minus_n = __m - __n;
+	  if (__m_minus_n == 0)
+	    return __m << __k;
 
-	  __n -= __m;
+	  _Tp __next_m = __m < __n ? __m : __n;
 
-	  if (__n == 0)
-	    return __m << __k;
+	  if (__next_m == 1)
+	    return __next_m << __k;
+
+	  _Tp __n_minus_m = __n - __m;
+	  __n = __n < __m ? __m_minus_n : __n_minus_m;
+	  __m = __next_m;
 
-	  __n >>= std::__countr_zero(__n);
+	  __n >>= std::__countr_zero(__m_minus_n);
 	}
     }
 } // namespace __detail
-- 
2.45.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] libstdc++: Optimize std::gcd
  2024-06-07  5:29 [PATCH] libstdc++: Optimize std::gcd Stephen Face
@ 2024-06-07  8:57 ` Sam James
  2024-06-07  9:30   ` Jonathan Wakely
  0 siblings, 1 reply; 5+ messages in thread
From: Sam James @ 2024-06-07  8:57 UTC (permalink / raw)
  To: Stephen Face; +Cc: libstdc++, gcc-patches

Stephen Face <shpface@gmail.com> writes:

> This patch is to optimize the runtime execution of gcd. Mathematically,
> it computes with the same algorithm as before, but subtractions and
> branches are rearranged to encourage generation of code that can use
> flags from the subtractions for conditional moves. Additionally, most
> pairs of integers are coprime, so this patch also includes a check for
> one of the integers to be equal to 1, and then it will exit the loop
> early in this case.

Is it worth filing a bug for the missed optimisation? You shouldn't have
to write things in a specific order. Thanks.

>
> libstdc++-v3/ChangeLog:
>
> 	* include/std/numeric(__gcd): Optimize.
> ---
> I have tested this on x86_64-linux and aarch64-linux. I have tested the
> timing with random distributions of small inputs and large inputs on a
> couple of machines with -O2 and found decreases in execution time from
> 20% to 60% depending on the machine and distribution of inputs.
>
>  libstdc++-v3/include/std/numeric | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
> index c912db4a519..3c9e8387a0e 100644
> --- a/libstdc++-v3/include/std/numeric
> +++ b/libstdc++-v3/include/std/numeric
> @@ -148,19 +148,20 @@ namespace __detail
>  
>        while (true)
>  	{
> -	  if (__m > __n)
> -	    {
> -	      _Tp __tmp = __m;
> -	      __m = __n;
> -	      __n = __tmp;
> -	    }
> +	  _Tp __m_minus_n = __m - __n;
> +	  if (__m_minus_n == 0)
> +	    return __m << __k;
>  
> -	  __n -= __m;
> +	  _Tp __next_m = __m < __n ? __m : __n;
>  
> -	  if (__n == 0)
> -	    return __m << __k;
> +	  if (__next_m == 1)
> +	    return __next_m << __k;
> +
> +	  _Tp __n_minus_m = __n - __m;
> +	  __n = __n < __m ? __m_minus_n : __n_minus_m;
> +	  __m = __next_m;
>  
> -	  __n >>= std::__countr_zero(__n);
> +	  __n >>= std::__countr_zero(__m_minus_n);
>  	}
>      }
>  } // namespace __detail

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] libstdc++: Optimize std::gcd
  2024-06-07  8:57 ` Sam James
@ 2024-06-07  9:30   ` Jonathan Wakely
  2024-06-07 18:42     ` Stephen Face
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Wakely @ 2024-06-07  9:30 UTC (permalink / raw)
  To: Sam James; +Cc: Stephen Face, libstdc++, gcc-patches

On Fri, 7 Jun 2024 at 09:57, Sam James <sam@gentoo.org> wrote:
>
> Stephen Face <shpface@gmail.com> writes:
>
> > This patch is to optimize the runtime execution of gcd. Mathematically,
> > it computes with the same algorithm as before, but subtractions and
> > branches are rearranged to encourage generation of code that can use
> > flags from the subtractions for conditional moves. Additionally, most
> > pairs of integers are coprime, so this patch also includes a check for
> > one of the integers to be equal to 1, and then it will exit the loop
> > early in this case.
>
> Is it worth filing a bug for the missed optimisation? You shouldn't have
> to write things in a specific order. Thanks.

Yes, I think a bug report would be good. But 20%-60% decreases in run
time seems significant enough that we should take the libstdc++ patch
now, rather than wait for a possible compiler fix to come later.

Stephen, could you please confirm whether you have a copyright
assignment in place for GCC, and if not whether you would be will to
complete one, or alternatively contribute this under the DCO terms:
https://gcc.gnu.org/dco.html
Thanks!


>
> >
> > libstdc++-v3/ChangeLog:
> >
> >       * include/std/numeric(__gcd): Optimize.
> > ---
> > I have tested this on x86_64-linux and aarch64-linux. I have tested the
> > timing with random distributions of small inputs and large inputs on a
> > couple of machines with -O2 and found decreases in execution time from
> > 20% to 60% depending on the machine and distribution of inputs.
> >
> >  libstdc++-v3/include/std/numeric | 21 +++++++++++----------
> >  1 file changed, 11 insertions(+), 10 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
> > index c912db4a519..3c9e8387a0e 100644
> > --- a/libstdc++-v3/include/std/numeric
> > +++ b/libstdc++-v3/include/std/numeric
> > @@ -148,19 +148,20 @@ namespace __detail
> >
> >        while (true)
> >       {
> > -       if (__m > __n)
> > -         {
> > -           _Tp __tmp = __m;
> > -           __m = __n;
> > -           __n = __tmp;
> > -         }
> > +       _Tp __m_minus_n = __m - __n;
> > +       if (__m_minus_n == 0)
> > +         return __m << __k;
> >
> > -       __n -= __m;
> > +       _Tp __next_m = __m < __n ? __m : __n;
> >
> > -       if (__n == 0)
> > -         return __m << __k;
> > +       if (__next_m == 1)
> > +         return __next_m << __k;
> > +
> > +       _Tp __n_minus_m = __n - __m;
> > +       __n = __n < __m ? __m_minus_n : __n_minus_m;
> > +       __m = __next_m;
> >
> > -       __n >>= std::__countr_zero(__n);
> > +       __n >>= std::__countr_zero(__m_minus_n);
> >       }
> >      }
> >  } // namespace __detail
>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] libstdc++: Optimize std::gcd
  2024-06-07  9:30   ` Jonathan Wakely
@ 2024-06-07 18:42     ` Stephen Face
  2024-06-07 18:48       ` Jonathan Wakely
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Face @ 2024-06-07 18:42 UTC (permalink / raw)
  To: Jonathan Wakely, Sam James; +Cc: libstdc++, gcc-patches

On 6/7/24 2:30 AM, Jonathan Wakely wrote:
> On Fri, 7 Jun 2024 at 09:57, Sam James <sam@gentoo.org> wrote:
>>
>> Stephen Face <shpface@gmail.com> writes:
>>
>>> This patch is to optimize the runtime execution of gcd. Mathematically,
>>> it computes with the same algorithm as before, but subtractions and
>>> branches are rearranged to encourage generation of code that can use
>>> flags from the subtractions for conditional moves. Additionally, most
>>> pairs of integers are coprime, so this patch also includes a check for
>>> one of the integers to be equal to 1, and then it will exit the loop
>>> early in this case.
>>
>> Is it worth filing a bug for the missed optimisation? You shouldn't have
>> to write things in a specific order. Thanks.

This patch is not a pure reordering. It has the added comparison with 1.
Also, the argument to the trailing zero count is now the result of the
subtraction before the comparison of m and n. This shortens the chain of
dependencies, which can help the loop run faster.

> 
> Yes, I think a bug report would be good. But 20%-60% decreases in run
> time seems significant enough that we should take the libstdc++ patch
> now, rather than wait for a possible compiler fix to come later.
> 
> Stephen, could you please confirm whether you have a copyright
> assignment in place for GCC, and if not whether you would be will to
> complete one, or alternatively contribute this under the DCO terms:
> https://gcc.gnu.org/dco.html
> Thanks!

I do not have a copyright assignment in place for GCC. I can certify to
the DCO terms for this contribution.

    Signed-off-by: Stephen Face <shpface@gmail.com>

> 
> 
>>
>>>
>>> libstdc++-v3/ChangeLog:
>>>
>>>       * include/std/numeric(__gcd): Optimize.
>>> ---
>>> I have tested this on x86_64-linux and aarch64-linux. I have tested the
>>> timing with random distributions of small inputs and large inputs on a
>>> couple of machines with -O2 and found decreases in execution time from
>>> 20% to 60% depending on the machine and distribution of inputs.
>>>
>>>  libstdc++-v3/include/std/numeric | 21 +++++++++++----------
>>>  1 file changed, 11 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
>>> index c912db4a519..3c9e8387a0e 100644
>>> --- a/libstdc++-v3/include/std/numeric
>>> +++ b/libstdc++-v3/include/std/numeric
>>> @@ -148,19 +148,20 @@ namespace __detail
>>>
>>>        while (true)
>>>       {
>>> -       if (__m > __n)
>>> -         {
>>> -           _Tp __tmp = __m;
>>> -           __m = __n;
>>> -           __n = __tmp;
>>> -         }
>>> +       _Tp __m_minus_n = __m - __n;
>>> +       if (__m_minus_n == 0)
>>> +         return __m << __k;
>>>
>>> -       __n -= __m;
>>> +       _Tp __next_m = __m < __n ? __m : __n;
>>>
>>> -       if (__n == 0)
>>> -         return __m << __k;
>>> +       if (__next_m == 1)
>>> +         return __next_m << __k;
>>> +
>>> +       _Tp __n_minus_m = __n - __m;
>>> +       __n = __n < __m ? __m_minus_n : __n_minus_m;
>>> +       __m = __next_m;
>>>
>>> -       __n >>= std::__countr_zero(__n);
>>> +       __n >>= std::__countr_zero(__m_minus_n);
>>>       }
>>>      }
>>>  } // namespace __detail
>>
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] libstdc++: Optimize std::gcd
  2024-06-07 18:42     ` Stephen Face
@ 2024-06-07 18:48       ` Jonathan Wakely
  0 siblings, 0 replies; 5+ messages in thread
From: Jonathan Wakely @ 2024-06-07 18:48 UTC (permalink / raw)
  To: Stephen Face; +Cc: Sam James, libstdc++, gcc-patches

On Fri, 7 Jun 2024 at 19:42, Stephen Face <shpface@gmail.com> wrote:
>
> On 6/7/24 2:30 AM, Jonathan Wakely wrote:
> > On Fri, 7 Jun 2024 at 09:57, Sam James <sam@gentoo.org> wrote:
> >>
> >> Stephen Face <shpface@gmail.com> writes:
> >>
> >>> This patch is to optimize the runtime execution of gcd. Mathematically,
> >>> it computes with the same algorithm as before, but subtractions and
> >>> branches are rearranged to encourage generation of code that can use
> >>> flags from the subtractions for conditional moves. Additionally, most
> >>> pairs of integers are coprime, so this patch also includes a check for
> >>> one of the integers to be equal to 1, and then it will exit the loop
> >>> early in this case.
> >>
> >> Is it worth filing a bug for the missed optimisation? You shouldn't have
> >> to write things in a specific order. Thanks.
>
> This patch is not a pure reordering. It has the added comparison with 1.
> Also, the argument to the trailing zero count is now the result of the
> subtraction before the comparison of m and n. This shortens the chain of
> dependencies, which can help the loop run faster.
>
> >
> > Yes, I think a bug report would be good. But 20%-60% decreases in run
> > time seems significant enough that we should take the libstdc++ patch
> > now, rather than wait for a possible compiler fix to come later.
> >
> > Stephen, could you please confirm whether you have a copyright
> > assignment in place for GCC, and if not whether you would be will to
> > complete one, or alternatively contribute this under the DCO terms:
> > https://gcc.gnu.org/dco.html
> > Thanks!
>
> I do not have a copyright assignment in place for GCC. I can certify to
> the DCO terms for this contribution.
>
>     Signed-off-by: Stephen Face <shpface@gmail.com>

Thank you! I'll take care of this patch next week.


>
> >
> >
> >>
> >>>
> >>> libstdc++-v3/ChangeLog:
> >>>
> >>>       * include/std/numeric(__gcd): Optimize.
> >>> ---
> >>> I have tested this on x86_64-linux and aarch64-linux. I have tested the
> >>> timing with random distributions of small inputs and large inputs on a
> >>> couple of machines with -O2 and found decreases in execution time from
> >>> 20% to 60% depending on the machine and distribution of inputs.
> >>>
> >>>  libstdc++-v3/include/std/numeric | 21 +++++++++++----------
> >>>  1 file changed, 11 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
> >>> index c912db4a519..3c9e8387a0e 100644
> >>> --- a/libstdc++-v3/include/std/numeric
> >>> +++ b/libstdc++-v3/include/std/numeric
> >>> @@ -148,19 +148,20 @@ namespace __detail
> >>>
> >>>        while (true)
> >>>       {
> >>> -       if (__m > __n)
> >>> -         {
> >>> -           _Tp __tmp = __m;
> >>> -           __m = __n;
> >>> -           __n = __tmp;
> >>> -         }
> >>> +       _Tp __m_minus_n = __m - __n;
> >>> +       if (__m_minus_n == 0)
> >>> +         return __m << __k;
> >>>
> >>> -       __n -= __m;
> >>> +       _Tp __next_m = __m < __n ? __m : __n;
> >>>
> >>> -       if (__n == 0)
> >>> -         return __m << __k;
> >>> +       if (__next_m == 1)
> >>> +         return __next_m << __k;
> >>> +
> >>> +       _Tp __n_minus_m = __n - __m;
> >>> +       __n = __n < __m ? __m_minus_n : __n_minus_m;
> >>> +       __m = __next_m;
> >>>
> >>> -       __n >>= std::__countr_zero(__n);
> >>> +       __n >>= std::__countr_zero(__m_minus_n);
> >>>       }
> >>>      }
> >>>  } // namespace __detail
> >>
> >
>


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-06-07 18:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-07  5:29 [PATCH] libstdc++: Optimize std::gcd Stephen Face
2024-06-07  8:57 ` Sam James
2024-06-07  9:30   ` Jonathan Wakely
2024-06-07 18:42     ` Stephen Face
2024-06-07 18:48       ` Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).