public inbox for gcc-bugs@sourceware.org
* [Bug libstdc++/101923] New: std::function's move ctor is slower than the copy one for empty source objects
From: dartdart26 at gmail dot com @ 2021-08-15 13:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

            Bug ID: 101923
           Summary: std::function's move ctor is slower than the copy one
                    for empty source objects
           Product: gcc
           Version: 9.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dartdart26 at gmail dot com
  Target Milestone: ---

std::function's move constructor calls swap() regardless of whether the source
object is empty. In contrast, the copy constructor first checks whether the
source object is empty and, if it is, does nothing more, because the `this`
object has already been constructed in an empty state by _Function_base().

Calling swap() on an empty source requires more work, because data still has to
be copied: for example, the POD storage cannot be moved, only copied.
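To make the difference concrete, here is a minimal sketch of a heavily simplified, hypothetical model of the pre-fix layout (the names `Storage`, `ToyFunction`, `manager`, `invoker` and the helper functions are illustrative assumptions, not libstdc++'s code): the copy constructor tests for an empty source and stops, while the move constructor unconditionally swaps the 16-byte POD storage and both pointers with its freshly constructed empty state.

```cpp
#include <cassert>
#include <cstring>
#include <utility>

// Hypothetical stand-in for _Any_data: 16 bytes of raw POD storage.
union Storage {
  void* ptr;
  unsigned char bytes[16];
};

// Hypothetical, heavily simplified model of the pre-fix std::function
// layout; the names are illustrative, not libstdc++'s.
struct ToyFunction {
  using Manager = void (*)(Storage& dest, const Storage& src);
  using Invoker = int (*)(const Storage&);

  Storage functor{};          // zero-initialized storage
  Manager manager = nullptr;  // a null manager means "empty"
  Invoker invoker = nullptr;

  ToyFunction() = default;

  explicit operator bool() const { return manager != nullptr; }

  // Copy: test for empty first; beyond default construction, an empty
  // source costs only that test.
  ToyFunction(const ToyFunction& other) {
    if (other) {
      other.manager(functor, other.functor);  // clone the target
      manager = other.manager;
      invoker = other.invoker;
    }
  }

  // Pre-fix move: always swap. Beyond default construction, the 16-byte
  // storage and both pointers are exchanged with `other`, even when
  // `other` is empty.
  ToyFunction(ToyFunction&& other) noexcept { swap(other); }

  void swap(ToyFunction& other) noexcept {
    std::swap(functor, other.functor);
    std::swap(manager, other.manager);
    std::swap(invoker, other.invoker);
  }

  int operator()() const { return invoker(functor); }
};

// Example target: an int stored directly in the storage bytes.
void copy_storage(Storage& dest, const Storage& src) {
  std::memcpy(&dest, &src, sizeof(Storage));
}

int read_int(const Storage& s) {
  int v;
  std::memcpy(&v, s.bytes, sizeof v);
  return v;
}

ToyFunction make_returning(int value) {
  ToyFunction f;
  std::memcpy(f.functor.bytes, &value, sizeof value);
  f.manager = &copy_storage;
  f.invoker = &read_int;
  return f;
}
```

In this model, `ToyFunction moved(std::move(empty));` still exchanges every member, whereas `ToyFunction copied(empty);` only evaluates `if (other)`.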

Could the move constructor also check whether the source is empty, as the copy
constructor does? Please let me know if I am missing a rule that prevents that.

I noticed this on version 9.3.0, but the code is the same in current master:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/bits/std_function.h;hb=c22bcfd2f7dc9bb5ad394720f4a612327dc898ba#l391

I tested on a MacBook M1, and for empty sources the copy ctor is almost 2x
faster than the move ctor:

-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
copy            0.945 ns        0.945 ns    555789159
move             1.83 ns         1.83 ns    382183169

I made a YouTube video describing my findings and the benchmark results:
https://www.youtube.com/watch?v=WA3mKab-tn8

* [Bug libstdc++/101923] std::function's move ctor is slower than the copy one for empty source objects
From: dartdart26 at gmail dot com @ 2021-08-15 13:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #1 from Petar Ivanov <dartdart26 at gmail dot com> ---
Benchmark code (using Google Benchmark):

#include <benchmark/benchmark.h>

#include <functional>
#include <utility>

struct Car {};

static void copy(benchmark::State& state) {
  for (auto _ : state) {
    const auto f = std::function<void(const Car&)>{};
    const auto copied = f;
    benchmark::DoNotOptimize(copied);
  }
}

static void move(benchmark::State& state) {
  for (auto _ : state) {
    auto f = std::function<void(const Car&)>{};
    const auto moved = std::move(f);
    benchmark::DoNotOptimize(moved);
  }
}

BENCHMARK(copy);
BENCHMARK(move);

BENCHMARK_MAIN();

* [Bug libstdc++/101923] std::function's move ctor is slower than the copy one for empty source objects
From: nok.raven at gmail dot com @ 2021-08-15 17:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

Nikita Kniazev <nok.raven at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nok.raven at gmail dot com

--- Comment #2 from Nikita Kniazev <nok.raven at gmail dot com> ---
There is no difference in the generated code on trunk (apart from the order of the move operations):
https://godbolt.org/z/esfjhr9ae

* [Bug libstdc++/101923] std::function's move ctor is slower than the copy one for empty source objects
From: dartdart26 at gmail dot com @ 2021-08-16  7:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #3 from Petar Ivanov <dartdart26 at gmail dot com> ---
Thank you for pointing out the output on x86!

Following that, I checked O2 and O3 on ARM64 and I see differences, though I
cannot say what their actual impact is:

O2: https://godbolt.org/z/P9Garznef

O3: https://godbolt.org/z/Yb1q33YP3

For x86, I ran the benchmark in Quick Bench (I assume it runs on x86, since
that is what the disassembly shows), and the results match my findings on
ARM64, with move being slower:
https://quick-bench.com/q/vK9eSYngutKGo4QSPcdra9gUOI0

The benchmark code seems correct to me, but I might be missing something: I
could be misusing DoNotOptimize(), or there could be side effects I have not
accounted for.

I am sure this is not a big deal. I was just wondering whether adding an if
statement is feasible; if so, it seems like a quick and easy win.

* [Bug tree-optimization/101923] std::function's move ctor is slower than the copy one for empty source objects
From: pinskia at gcc dot gnu.org @ 2021-08-16  7:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|libstdc++                   |tree-optimization

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Hmm

  __tmp = MEM[(union _Any_data & {ref-all})&f];
  MEM[(union _Any_data * {ref-all})&f] = MEM[(union _Any_data & {ref-all})&moved];
  MEM[(union _Any_data * {ref-all})&moved] = __tmp;
  __tmp ={v} {CLOBBER};
  _13 = MEM[(void (*type) (const union _Any_data & {ref-all}, const struct Car &) &)&f + 24];
  _14 = MEM[(void (*type) (const union _Any_data & {ref-all}, const struct Car &) &)&moved + 24];
  MEM[(void (*<Te9f8>) (const union _Any_data & {ref-all}, const struct Car &) &)&f + 24] = _14;
  MEM[(void (*<Te9f8>) (const union _Any_data & {ref-all}, const struct Car &) &)&moved + 24] = _13;

So a missed optimization at the gimple level.
But note the arm64 compiler on godbolt is a few months old, 20210528.  There
might have been some fixes which improve this already.

* [Bug tree-optimization/101923] std::function's move ctor is slower than the copy one for empty source objects
From: dartdart26 at gmail dot com @ 2021-08-17  6:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #5 from Petar Ivanov <dartdart26 at gmail dot com> ---
(In reply to Andrew Pinski from comment #4)
> Hmm
> 
>   __tmp = MEM[(union _Any_data & {ref-all})&f];
>   MEM[(union _Any_data * {ref-all})&f] = MEM[(union _Any_data & {ref-all})&moved];
>   MEM[(union _Any_data * {ref-all})&moved] = __tmp;
>   __tmp ={v} {CLOBBER};
>   _13 = MEM[(void (*type) (const union _Any_data & {ref-all}, const struct Car &) &)&f + 24];
>   _14 = MEM[(void (*type) (const union _Any_data & {ref-all}, const struct Car &) &)&moved + 24];
>   MEM[(void (*<Te9f8>) (const union _Any_data & {ref-all}, const struct Car &) &)&f + 24] = _14;
>   MEM[(void (*<Te9f8>) (const union _Any_data & {ref-all}, const struct Car &) &)&moved + 24] = _13;
> 
> So a missed optimization at the gimple level.
> But note the arm64 compiler on godbolt is a few months old, 20210528.  There
> might have been some fixes which improve this already.

I see, thank you.

Do you think the x86 results on Quick Bench are worth improving? From a user's
perspective, I would expect moves to be at least as fast as copies.

Could you please advise on how I can proceed with this report? Can a change be
made in libstdc++ or should it be considered a compiler issue?

Thank you!

* [Bug tree-optimization/101923] std::function's move ctor is slower than the copy one for empty source objects
From: redi at gcc dot gnu.org @ 2021-08-17 10:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #6 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Petar Ivanov from comment #5)
> Could you please advise on how I can proceed with this report? Can a change
> be made in libstdc++ or should it be considered a compiler issue?

Both, I think.

* [Bug tree-optimization/101923] std::function's move ctor is slower than the copy one for empty source objects
From: redi at gcc dot gnu.org @ 2021-08-17 10:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #7 from Jonathan Wakely <redi at gcc dot gnu.org> ---
We can do better than just making the swap conditional:

      function(function&& __x) noexcept
      : _Function_base(), _M_invoker(__x._M_invoker)
      {
        if (static_cast<bool>(__x))
          {
            _M_functor = __x._M_functor;
            _M_manager = __x._M_manager;
            __x._M_manager = nullptr;
            __x._M_invoker = nullptr;
          }
      }
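A minimal sketch of the same check-first pattern on a hypothetical simplified type (not libstdc++ code; `Storage`, `ToyFunction`, `dummy_manager` and `read_value` are made up for illustration), showing that an empty source is left untouched and that a non-empty source is emptied after its members are copied:

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for _Any_data; only `value` is used here.
union Storage {
  int value;
  unsigned char bytes[16];
};

// Hypothetical simplified model mirroring the members named in the
// snippet (_M_functor, _M_manager, _M_invoker); not libstdc++'s code.
struct ToyFunction {
  using Manager = void (*)();  // in this sketch only its null-ness matters
  using Invoker = int (*)(const Storage&);

  Storage functor{};
  Manager manager = nullptr;  // a null manager means "empty"
  Invoker invoker = nullptr;

  ToyFunction() = default;

  explicit operator bool() const { return manager != nullptr; }

  // Check-first move with the effects of swap() inlined: for an empty
  // source the only extra work is one test and copying a null pointer.
  ToyFunction(ToyFunction&& x) noexcept : invoker(x.invoker) {
    if (x) {
      functor = x.functor;  // copy the POD storage
      manager = x.manager;
      x.manager = nullptr;  // leave the source empty
      x.invoker = nullptr;
    }
  }

  int operator()() const { return invoker(functor); }
};

// Example target pieces for the usage check.
void dummy_manager() {}
int read_value(const Storage& s) { return s.value; }
```

Copying `invoker` in the member-initializer list is harmless for an empty source, because in this model an empty object's `invoker` is always null.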

* [Bug tree-optimization/101923] std::function's move ctor is slower than the copy one for empty source objects
From: cvs-commit at gcc dot gnu.org @ 2021-08-17 13:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:0808b0df9c4d31f4c362b9c85fb538b6aafcb517

commit r12-2959-g0808b0df9c4d31f4c362b9c85fb538b6aafcb517
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Aug 17 11:30:56 2021 +0100

    libstdc++: Optimize std::function move constructor [PR101923]

    PR 101923 points out that the unconditional swap in the std::function
    move constructor makes it slower than copying an empty std::function.
    The copy constructor has to check for the empty case before doing
    anything, and that makes it very fast for the empty case.

    Adding the same check to the move constructor avoids copying the
    _Any_data POD when we don't need to. We can also inline the effects of
    swap, by copying each member and then zeroing the pointer members.

    This makes moving an empty object at least as fast as copying an empty
    object.

    Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

    libstdc++-v3/ChangeLog:

            PR libstdc++/101923
            * include/bits/std_function.h (function(function&&)): Check for
            non-empty parameter before doing any work.
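Whatever the implementation, the optimized constructor has to preserve the standard postconditions of std::function's move constructor: an empty source yields an empty result, and a non-empty source hands its target over (the source is then left "valid but unspecified", so this sketch does not inspect it). A small check of both, with helper names made up for illustration:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Moving from an empty std::function must produce an empty one.
bool move_of_empty_is_empty() {
  std::function<int(int)> src;  // empty: no target
  std::function<int(int)> dst(std::move(src));
  return !dst;
}

// Moving from a non-empty std::function must transfer the target.
// The moved-from source has an unspecified value, so it is not checked.
int call_after_move(int arg) {
  std::function<int(int)> src = [](int x) { return x * 2; };
  std::function<int(int)> dst(std::move(src));
  return dst(arg);
}
```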

* [Bug tree-optimization/101923] std::function's move ctor is slower than the copy one for empty source objects
From: dartdart26 at gmail dot com @ 2021-08-18  6:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

Petar Ivanov <dartdart26 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Petar Ivanov <dartdart26 at gmail dot com> ---
(In reply to CVS Commits from comment #8)
> The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:
> 
> https://gcc.gnu.org/g:0808b0df9c4d31f4c362b9c85fb538b6aafcb517
> 
> commit r12-2959-g0808b0df9c4d31f4c362b9c85fb538b6aafcb517
> Author: Jonathan Wakely <jwakely@redhat.com>
> Date:   Tue Aug 17 11:30:56 2021 +0100
> 
>     libstdc++: Optimize std::function move constructor [PR101923]
>     

Thank you!

On ARM64, move is now as fast as copy:

-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
copy            0.948 ns        0.948 ns    558822565
move            0.952 ns        0.952 ns    729210032

* [Bug tree-optimization/101923] std::function's move ctor is slower than the copy one for empty source objects
From: cvs-commit at gcc dot gnu.org @ 2021-10-12 10:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:73b0f810a17a5f529fc8342a2df31276d3538851

commit r11-9111-g73b0f810a17a5f529fc8342a2df31276d3538851
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Aug 17 11:30:56 2021 +0100

    libstdc++: Optimize std::function move constructor [PR101923]

    PR 101923 points out that the unconditional swap in the std::function
    move constructor makes it slower than copying an empty std::function.
    The copy constructor has to check for the empty case before doing
    anything, and that makes it very fast for the empty case.

    Adding the same check to the move constructor avoids copying the
    _Any_data POD when we don't need to. We can also inline the effects of
    swap, by copying each member and then zeroing the pointer members.

    This makes moving an empty object at least as fast as copying an empty
    object.

    Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

    libstdc++-v3/ChangeLog:

            PR libstdc++/101923
            * include/bits/std_function.h (function(function&&)): Check for
            non-empty parameter before doing any work.

    (cherry picked from commit 0808b0df9c4d31f4c362b9c85fb538b6aafcb517)

* [Bug tree-optimization/101923] std::function's move ctor is slower than the copy one for empty source objects
From: pinskia at gcc dot gnu.org @ 2022-12-29 22:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0
