public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible
@ 2024-04-23  8:14 hubicka at gcc dot gnu.org
  2024-04-23  8:30 ` [Bug libstdc++/114821] " redi at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: hubicka at gcc dot gnu.org @ 2024-04-23  8:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

            Bug ID: 114821
           Summary: _M_realloc_append should use memcpy instead of loop to
                    copy data when possible
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

In thestcase

#include <vector>
typedef unsigned int uint32_t;
std::pair<uint32_t, uint32_t> pair;
void
test()
{
        std::vector<std::pair<uint32_t, uint32_t>> stack;
        stack.push_back (pair);
        while (!stack.empty()) {
                std::pair<uint32_t, uint32_t> cur = stack.back();
                stack.pop_back();
                if (!cur.first)
                {
                        cur.second++;
                        stack.push_back (cur);
                        stack.push_back (cur);
                }
                if (cur.second > 10000)
                        break;
        }
}
int
main()
{
        for (int i = 0; i < 10000; i++)
          test();
}

We produce _M_reallloc_append which uses loop to copy data instead of memcpy.
This is bigger and slower.  The reason why __relocate_a does not use memcpy
seems to be fact that pair has copy constructor. It still can be pattern
matched by ldist but it fails with:

(compute_affine_dependence
  ref_a: *__first_1, stmt_a: *__cur_37 = *__first_1;
  ref_b: *__cur_37, stmt_b: *__cur_37 = *__first_1;
) -> dependence analysis failed

So we can not disambiguate old and new vector memory and prove that loop is
indeed memcpy loop. I think this is valid since operator new is not required to
return new memory, but I think adding __restrict should solve this.

Problem is that I got lost on where to add them, since relocate_a uses
iterators instead of pointers

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
@ 2024-04-23  8:30 ` redi at gcc dot gnu.org
  2024-04-23  8:52 ` hubicka at gcc dot gnu.org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-04-23  8:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Using memcpy on any std::pair is undefined behaviour because it's not trivially
copyable. 

That's not because it has a copy constructor, its copy constructor is defaulted
and so trivial if the data members are trivially copy constructible:
      constexpr pair(const pair&) = default;    ///< Copy constructor

It's because it has a non-trivial assignment operator:

      /// Copy assignment operator
      constexpr pair&
      operator=(const pair& __p)
      noexcept(_S_nothrow_assignable<const _T1&, const _T2&>())
      requires (_S_assignable<const _T1&, const _T2&>())
      {
        first = __p.first;
        second = __p.second;
        return *this;
      }


I think this exact point was discussed when Marc introduced the relocate
optimizations. We could maybe cheat and say that we know it's safe to memcpy
std::pair<int, int> even though the language says it's undefined, because we
know what our std::pair implementation looks like.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
  2024-04-23  8:30 ` [Bug libstdc++/114821] " redi at gcc dot gnu.org
@ 2024-04-23  8:52 ` hubicka at gcc dot gnu.org
  2024-04-23 10:41 ` redi at gcc dot gnu.org
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: hubicka at gcc dot gnu.org @ 2024-04-23  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
What I am shooting for is to optimize it later in loop distribution. We can
recognize memcpy loop if we can figure out that source and destination memory
are different.

We can help here with restrict, but I was bit lost in how to get them done.

This seems to do the trick, but for some reason I get memmove

diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h
b/libstdc++-v3/include/bits/stl_uninitialized.h
index 7f84da31578..1a6223ea892 100644
--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -1130,7 +1130,58 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
        }
       return __result + __count;
     }
+
+  template <typename _Tp, typename _Allocator>
+    _GLIBCXX20_CONSTEXPR
+    inline __enable_if_t<std::__is_bitwise_relocatable<_Tp>::value, _Tp*>
+    __relocate_a(_Tp * __restrict __first, _Tp *__last,
+                _Tp * __restrict __result, _Allocator& __alloc) noexcept
+    {
+      ptrdiff_t __count = __last - __first;
+      if (__count > 0)
+       {
+#ifdef __cpp_lib_is_constant_evaluated
+         if (std::is_constant_evaluated())
+           {
+             for (; __first != __last; ++__first, (void)++__result)
+               {
+                 // manually inline relocate_object_a to not lose restrict
qualifiers
+                 typedef std::allocator_traits<_Allocator> __traits;
+                 __traits::construct(__alloc, __result, std::move(*__first));
+                 __traits::destroy(__alloc, std::__addressof(*__first));
+               }
+             return __result;
+           }
 #endif
+         __builtin_memcpy(__result, __first, __count * sizeof(_Tp));
+       }
+      return __result + __count;
+    }
+#endif
+
+  template <typename _Tp, typename _Allocator>
+    _GLIBCXX20_CONSTEXPR
+#if _GLIBCXX_HOSTED
+    inline __enable_if_t<!std::__is_bitwise_relocatable<_Tp>::value, _Tp*>
+#else
+    inline _Tp *
+#endif
+    __relocate_a(_Tp * __restrict __first, _Tp *__last,
+                _Tp * __restrict __result, _Allocator& __alloc)
+    noexcept(noexcept(std::allocator_traits<_Allocator>::construct(__alloc,
+                        __result, std::move(*__first)))
+            && noexcept(std::allocator_traits<_Allocator>::destroy(
+                           __alloc, std::__addressof(*__first))))
+    {
+      for (; __first != __last; ++__first, (void)++__result)
+       {
+         // manually inline relocate_object_a to not lose restrict qualifiers
+         typedef std::allocator_traits<_Allocator> __traits;
+         __traits::construct(__alloc, __result, std::move(*__first));
+         __traits::destroy(__alloc, std::__addressof(*__first));
+       }
+      return __result;
+    }

   template <typename _InputIterator, typename _ForwardIterator,
            typename _Allocator>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
  2024-04-23  8:30 ` [Bug libstdc++/114821] " redi at gcc dot gnu.org
  2024-04-23  8:52 ` hubicka at gcc dot gnu.org
@ 2024-04-23 10:41 ` redi at gcc dot gnu.org
  2024-04-23 10:51 ` redi at gcc dot gnu.org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-04-23 10:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #2)
> This seems to do the trick, but for some reason I get memmove

What's the second overload for, and why does it depend on _GLIBCXX_HOSTED?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-04-23 10:41 ` redi at gcc dot gnu.org
@ 2024-04-23 10:51 ` redi at gcc dot gnu.org
  2024-04-23 11:01 ` redi at gcc dot gnu.org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-04-23 10:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #2)
> --- a/libstdc++-v3/include/bits/stl_uninitialized.h
> +++ b/libstdc++-v3/include/bits/stl_uninitialized.h
> @@ -1130,7 +1130,58 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>         }
>        return __result + __count;
>      }
> +
> +  template <typename _Tp, typename _Allocator>
> +    _GLIBCXX20_CONSTEXPR
> +    inline __enable_if_t<std::__is_bitwise_relocatable<_Tp>::value, _Tp*>
> +    __relocate_a(_Tp * __restrict __first, _Tp *__last,
> +                _Tp * __restrict __result, _Allocator& __alloc) noexcept

This is wrong, we can't optimize arbitrary allocators, only when the allocator
is std::allocator<_Tp>. That's what the existing code is doing with the
indirection from __relocate_a to __relocate_a_1.

> +    {
> +      ptrdiff_t __count = __last - __first;
> +      if (__count > 0)
> +       {
> +#ifdef __cpp_lib_is_constant_evaluated
> +         if (std::is_constant_evaluated())
> +           {
> +             for (; __first != __last; ++__first, (void)++__result)

You don't need the (void) case here because __first and __result are both
pointers. That's only needed for the generic __relocate_a that deals with
arbitrary iterator types.

> +               {
> +                 // manually inline relocate_object_a to not lose restrict
> qualifiers

We don't care about restrict when is_constant_evaluated is true, we're not
optimizing this code. It just gets interpreted at compile time. There is no
reason to inline __relocate_object_a for the is_constant_evaluated case.


> +                 typedef std::allocator_traits<_Allocator> __traits;
> +                 __traits::construct(__alloc, __result,
> std::move(*__first));
> +                 __traits::destroy(__alloc, std::__addressof(*__first));
> +               }
> +             return __result;
> +           }
>  #endif
> +         __builtin_memcpy(__result, __first, __count * sizeof(_Tp));
> +       }
> +      return __result + __count;
> +    }
> +#endif
> +
> +  template <typename _Tp, typename _Allocator>
> +    _GLIBCXX20_CONSTEXPR
> +#if _GLIBCXX_HOSTED
> +    inline __enable_if_t<!std::__is_bitwise_relocatable<_Tp>::value, _Tp*>
> +#else
> +    inline _Tp *
> +#endif
> +    __relocate_a(_Tp * __restrict __first, _Tp *__last,
> +                _Tp * __restrict __result, _Allocator& __alloc)
> +    noexcept(noexcept(std::allocator_traits<_Allocator>::construct(__alloc,
> +                        __result, std::move(*__first)))
> +            && noexcept(std::allocator_traits<_Allocator>::destroy(
> +                           __alloc, std::__addressof(*__first))))
> +    {
> +      for (; __first != __last; ++__first, (void)++__result)
> +       {
> +         // manually inline relocate_object_a to not lose restrict
> qualifiers
> +         typedef std::allocator_traits<_Allocator> __traits;
> +         __traits::construct(__alloc, __result, std::move(*__first));
> +         __traits::destroy(__alloc, std::__addressof(*__first));
> +       }

I don't understand this overload at all. Is this overload the one that actually
gets used for your testcase? I think it should be, because std::pair<int, int>
is not bitwise_relocatable.

Why does the restrict qualifier get lost if we don't inline relocate_object_a?
That function is restrict-qualified as well.


> +      return __result;
> +    }
>  
>    template <typename _InputIterator, typename _ForwardIterator,
>             typename _Allocator>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-04-23 10:51 ` redi at gcc dot gnu.org
@ 2024-04-23 11:01 ` redi at gcc dot gnu.org
  2024-04-23 12:08 ` hubicka at gcc dot gnu.org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-04-23 11:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> ---
If the problem is simply that the __restrict qualifiers are not present because
we go through the generic function taking iterators, then we can just add:

--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -1109,8 +1109,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template <typename _Tp, typename _Up>
     _GLIBCXX20_CONSTEXPR
     inline __enable_if_t<std::__is_bitwise_relocatable<_Tp>::value, _Tp*>
-    __relocate_a_1(_Tp* __first, _Tp* __last,
-                  _Tp* __result,
+    __relocate_a_1(_Tp* __restrict __first, _Tp* __last,
+                  _Tp* __restrict __result,
                   [[__maybe_unused__]] allocator<_Up>& __alloc) noexcept
     {
       ptrdiff_t __count = __last - __first;
@@ -1147,6 +1147,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
                                 std::__niter_base(__result), __alloc);
     }

+  template <typename _Tp, typename _Up>
+    _GLIBCXX20_CONSTEXPR
+    inline _ForwardIterator
+    __relocate_a(_Tp* __restrict __first, _Tp* __last,
+                _Tp* __restrict __result,
+                [[__maybe_unused__]] allocator<_Up>& __alloc) noexcept
+    noexcept(std::__is_bitwise_relocatable<_Tp>::value)
+    {
+      return std::__relocate_a_1(__first, __last, __result, __alloc);
+    }
+
   /// @endcond
 #endif // C++11


If the problem is that std::pair<int, int> is not bitwise_relocatable, then we
could change that (as Marc suggested as a possible future enhancement):

--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -1082,6 +1082,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     struct __is_bitwise_relocatable
     : is_trivial<_Tp> { };

+  template<typename _Tp, typename _Up>
+    struct __is_bitwise_relocatable<pair<_Tp, _Up>, void>
+    : __and_<is_trivial<_Tp>, is_trivial<_Up>>
+    { };
+
   template <typename _InputIterator, typename _ForwardIterator,
            typename _Allocator>
     _GLIBCXX20_CONSTEXPR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-04-23 11:01 ` redi at gcc dot gnu.org
@ 2024-04-23 12:08 ` hubicka at gcc dot gnu.org
  2024-04-23 12:33 ` redi at gcc dot gnu.org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: hubicka at gcc dot gnu.org @ 2024-04-23 12:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Thanks. I though the relocate_a only cares about the fact if the pointed-to
type can be bitwise copied.  It would be nice to early produce memcpy from
libstdc++ for std::pair, so the second patch makes sense to me (I did not test
if it works)

I think it would be still nice to tell GCC that the copy loop never gets
overlapping memory locations so the cases which are not early optimized to
memcpy can still be optimized later (or vectorized if it does really something
non-trivial).

So i tried your second patch fixed so it compiles:
diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h
b/libstdc++-v3/include/bits/stl_uninitialized.h
index 7f84da31578..0d2e588ae5e 100644
--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -1109,8 +1109,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template <typename _Tp, typename _Up>
     _GLIBCXX20_CONSTEXPR
     inline __enable_if_t<std::__is_bitwise_relocatable<_Tp>::value, _Tp*>
-    __relocate_a_1(_Tp* __first, _Tp* __last,
-                  _Tp* __result,
+    __relocate_a_1(_Tp* __restrict __first, _Tp* __last,
+                  _Tp* __restrict __result,
                   [[__maybe_unused__]] allocator<_Up>& __alloc) noexcept
     {
       ptrdiff_t __count = __last - __first;
@@ -1147,6 +1147,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
                                 std::__niter_base(__result), __alloc);
     }

+  template <typename _Tp, typename _Up>
+    _GLIBCXX20_CONSTEXPR
+    inline _Tp*
+    __relocate_a(_Tp* __restrict __first, _Tp* __last,
+                _Tp* __restrict __result,
+                allocator<_Up>& __alloc)
+    noexcept(std::__is_bitwise_relocatable<_Tp>::value)
+    {
+      return std::__relocate_a_1(__first, __last, __result, __alloc);
+    }
+
   /// @endcond
 #endif // C++11

it does not make ldist to hit, so the restrict info is still lost.  I think the
problem is that if you call relocate_object the restrict reduces scope, so we
only know that the elements are pairwise disjoint, not that the vectors are.
This is because restrict is interpreted early pre-inlining, but it is really
Richard's area.

It seems that the patch makes us to go through __uninitialized_copy_a instead
of __uninit_copy. I am not even sure how these are different, so I need to
stare at the code bit more to make sense of it :)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-04-23 12:08 ` hubicka at gcc dot gnu.org
@ 2024-04-23 12:33 ` redi at gcc dot gnu.org
  2024-04-23 12:38 ` hubicka at gcc dot gnu.org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-04-23 12:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

--- Comment #7 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Created attachment 58014
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58014&action=edit
Make std::pair relocatable and simplify __relocate_a

Does this help?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2024-04-23 12:33 ` redi at gcc dot gnu.org
@ 2024-04-23 12:38 ` hubicka at gcc dot gnu.org
  2024-04-23 12:41 ` hubicka at gcc dot gnu.org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: hubicka at gcc dot gnu.org @ 2024-04-23 12:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I had wrong noexcept specifier.  This version works, but I still need to inline
relocate_object_a into the loop

diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h
b/libstdc++-v3/include/bits/stl_uninitialized.h
index 7f84da31578..f02d4fb878f 100644
--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -1100,8 +1100,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
          "relocation is only possible for values of the same type");
       _ForwardIterator __cur = __result;
       for (; __first != __last; ++__first, (void)++__cur)
-       std::__relocate_object_a(std::__addressof(*__cur),
-                                std::__addressof(*__first), __alloc);
+       {
+         typedef std::allocator_traits<_Allocator> __traits;
+         __traits::construct(__alloc, std::__addressof(*__cur),
std::move(*std::__addressof(*__first)));
+         __traits::destroy(__alloc,
std::__addressof(*std::__addressof(*__first)));
+       }
       return __cur;
     }

@@ -1109,8 +1112,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template <typename _Tp, typename _Up>
     _GLIBCXX20_CONSTEXPR
     inline __enable_if_t<std::__is_bitwise_relocatable<_Tp>::value, _Tp*>
-    __relocate_a_1(_Tp* __first, _Tp* __last,
-                  _Tp* __result,
+    __relocate_a_1(_Tp* __restrict __first, _Tp* __last,
+                  _Tp* __restrict __result,
                   [[__maybe_unused__]] allocator<_Up>& __alloc) noexcept
     {
       ptrdiff_t __count = __last - __first;
@@ -1147,6 +1150,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
                                 std::__niter_base(__result), __alloc);
     }

+  template <typename _Tp, typename _Up>
+    _GLIBCXX20_CONSTEXPR
+    inline _Tp*
+    __relocate_a(_Tp* __restrict __first, _Tp* __last,
+                _Tp* __restrict __result,
+                allocator<_Up>& __alloc)
+    noexcept(noexcept(__relocate_a_1(__first, __last, __result, __alloc)))
+    {
+      return std::__relocate_a_1(__first, __last, __result, __alloc);
+    }
+
   /// @endcond
 #endif // C++11

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2024-04-23 12:38 ` hubicka at gcc dot gnu.org
@ 2024-04-23 12:41 ` hubicka at gcc dot gnu.org
  2024-04-23 12:42 ` redi at gcc dot gnu.org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: hubicka at gcc dot gnu.org @ 2024-04-23 12:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

--- Comment #9 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Your patch gives me error compiling testcase

jh@ryzen3:/tmp> ~/trunk-install/bin/g++ -O3 ~/t.C 
In file included from /home/jh/trunk-install/include/c++/14.0.1/vector:65,
                 from /home/jh/t.C:1:
/home/jh/trunk-install/include/c++/14.0.1/bits/stl_uninitialized.h: In
instantiation of ‘_ForwardIterator std::__relocate_a(_InputIterator,
_InputIterator, _ForwardIterator, _Allocator&) [with _InputIterator = const
pair<unsigned int, unsigned int>*; _ForwardIterator = pair<unsigned int,
unsigned int>*; _Allocator = allocator<pair<unsigned int, unsigned int> >;
_Traits = allocator_traits<allocator<pair<unsigned int, unsigned int> > >]’:
/home/jh/trunk-install/include/c++/14.0.1/bits/stl_uninitialized.h:1127:31:  
required from ‘_Tp* std::__relocate_a(_Tp*, _Tp*, _Tp*, allocator<_T2>&) [with
_Tp = pair<unsigned int, unsigned int>; _Up = pair<unsigned int, unsigned
int>]’
 1127 |       return std::__relocate_a(__cfirst, __clast, __result, __alloc);
      |              ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/jh/trunk-install/include/c++/14.0.1/bits/stl_vector.h:509:26:   required
from ‘static std::vector<_Tp, _Alloc>::pointer std::vector<_Tp,
_Alloc>::_S_relocate(pointer, pointer, pointer, _Tp_alloc_type&) [with _Tp =
std::pair<unsigned int, unsigned int>; _Alloc =
std::allocator<std::pair<unsigned int, unsigned int> >; pointer =
std::pair<unsigned int, unsigned int>*; _Tp_alloc_type =
std::vector<std::pair<unsigned int, unsigned int> >::_Tp_alloc_type]’
  509 |         return std::__relocate_a(__first, __last, __result, __alloc);
      |                ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/jh/trunk-install/include/c++/14.0.1/bits/vector.tcc:647:32:   required
from ‘void std::vector<_Tp, _Alloc>::_M_realloc_append(_Args&& ...) [with _Args
= {const std::pair<unsigned int, unsigned int>&}; _Tp = std::pair<unsigned int,
unsigned int>; _Alloc = std::allocator<std::pair<unsigned int, unsigned int>
>]’
  647 |             __new_finish = _S_relocate(__old_start, __old_finish,
      |                            ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
  648 |                                        __new_start,
_M_get_Tp_allocator());
      |                                       
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/jh/trunk-install/include/c++/14.0.1/bits/stl_vector.h:1294:21:   required
from ‘void std::vector<_Tp, _Alloc>::push_back(const value_type&) [with _Tp =
std::pair<unsigned int, unsigned int>; _Alloc =
std::allocator<std::pair<unsigned int, unsigned int> >; value_type =
std::pair<unsigned int, unsigned int>]’
 1294 |           _M_realloc_append(__x);
      |           ~~~~~~~~~~~~~~~~~^~~~~
/home/jh/t.C:8:25:   required from here
    8 |         stack.push_back (pair);
      |         ~~~~~~~~~~~~~~~~^~~~~~
/home/jh/trunk-install/include/c++/14.0.1/bits/stl_uninitialized.h:1084:56:
error: use of deleted function ‘const _Tp* std::addressof(const _Tp&&) [with
_Tp = pair<unsigned int, unsigned int>]’
 1084 |                                         
std::addressof(std::move(*__first))))
      |                                         
~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
In file included from
/home/jh/trunk-install/include/c++/14.0.1/bits/stl_pair.h:61,
                 from
/home/jh/trunk-install/include/c++/14.0.1/bits/stl_algobase.h:64,
                 from /home/jh/trunk-install/include/c++/14.0.1/vector:62:
/home/jh/trunk-install/include/c++/14.0.1/bits/move.h:168:16: note: declared
here
  168 |     const _Tp* addressof(const _Tp&&) = delete;
      |                ^~~~~~~~~
/home/jh/trunk-install/include/c++/14.0.1/bits/stl_uninitialized.h:1084:56:
note: use ‘-fdiagnostics-all-candidates’ to display considered candidates
 1084 |                                         
std::addressof(std::move(*__first))))
      |                                         
~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~


It is easy to check if conversion happens - just compile it and see if there is
memcpy or memmove in the optimized dump file (or final assembly)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2024-04-23 12:41 ` hubicka at gcc dot gnu.org
@ 2024-04-23 12:42 ` redi at gcc dot gnu.org
  2024-04-23 13:11 ` redi at gcc dot gnu.org
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-04-23 12:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

--- Comment #10 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #7)
> Created attachment 58014 [details]
> Make std::pair relocatable and simplify __relocate_a
> 
> Does this help?

Oh hang on, that patch is wrong. I'll fix it and check the results myself ...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2024-04-23 12:42 ` redi at gcc dot gnu.org
@ 2024-04-23 13:11 ` redi at gcc dot gnu.org
  2024-04-23 15:53 ` redi at gcc dot gnu.org
  2024-04-24 14:23 ` hubicka at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-04-23 13:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #58014|0                           |1
        is obsolete|                            |

--- Comment #11 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Created attachment 58015
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58015&action=edit
Make std::pair relocatable and simplify __relocate_a

Fixed patch.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2024-04-23 13:11 ` redi at gcc dot gnu.org
@ 2024-04-23 15:53 ` redi at gcc dot gnu.org
  2024-04-24 14:23 ` hubicka at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-04-23 15:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #58015|0                           |1
        is obsolete|                            |

--- Comment #12 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Created attachment 58019
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58019&action=edit
Make std::pair relocatable and simplify __relocate_a

More comprehensive patch.

With this, I see memcpy in the -fdump-tree-optimized dump.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible
  2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2024-04-23 15:53 ` redi at gcc dot gnu.org
@ 2024-04-24 14:23 ` hubicka at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: hubicka at gcc dot gnu.org @ 2024-04-24 14:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821

--- Comment #13 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Thanks a lot, looks great!
Do we still auto-detect memmove when the copy constructor turns out to be
memcpy equivalent after optimization?

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-04-24 14:23 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-23  8:14 [Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible hubicka at gcc dot gnu.org
2024-04-23  8:30 ` [Bug libstdc++/114821] " redi at gcc dot gnu.org
2024-04-23  8:52 ` hubicka at gcc dot gnu.org
2024-04-23 10:41 ` redi at gcc dot gnu.org
2024-04-23 10:51 ` redi at gcc dot gnu.org
2024-04-23 11:01 ` redi at gcc dot gnu.org
2024-04-23 12:08 ` hubicka at gcc dot gnu.org
2024-04-23 12:33 ` redi at gcc dot gnu.org
2024-04-23 12:38 ` hubicka at gcc dot gnu.org
2024-04-23 12:41 ` hubicka at gcc dot gnu.org
2024-04-23 12:42 ` redi at gcc dot gnu.org
2024-04-23 13:11 ` redi at gcc dot gnu.org
2024-04-23 15:53 ` redi at gcc dot gnu.org
2024-04-24 14:23 ` hubicka at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).