public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/110287] New: _M_check_len is expensive
@ 2023-06-16 14:43 hubicka at gcc dot gnu.org
  2023-06-16 14:48 ` [Bug libstdc++/110287] " hubicka at gcc dot gnu.org
                   ` (14 more replies)
  0 siblings, 15 replies; 17+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-06-16 14:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

            Bug ID: 110287
           Summary: _M_check_len is expensive
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

I am looking into ineffective codegen for loops controlled by std::vec based
stack (see testcase in PR109849). 

The problem is that we fail to inline enough of implementation of
std::push_back to fully SRA the stack which makes it very slow. The reason is
that _M_realloc_insert currently does not fit in inlining limits for -O2. It
seems to be inlined by clang.

The first issue is rather large _M_check_len that currently does:

size_type std::vector<std::pair<unsigned int, unsigned int> >::_M_check_len
(const struct vector * const this, size_type __n, const char * __s)
{
  const size_type __len;
  const long unsigned int D.27747;
  long unsigned int _1;
  long unsigned int __n.3_2;
  long unsigned int _3;
  size_type iftmp.4_4;
  long unsigned int _5;
  long unsigned int _8;
  long int _10;
  long int _13;
  struct pair * _14;
  struct pair * _15;
  const long unsigned int & _18;

  <bb 2> [local count: 1073741824]:
  _15 = this_7(D)->D.26707._M_impl.D.26014._M_finish;
  _14 = this_7(D)->D.26707._M_impl.D.26014._M_start;
  _13 = _15 - _14;
  _10 = _13 /[ex] 8;
  _8 = (long unsigned int) _10;
  _1 = 1152921504606846975 - _8;
  __n.3_2 = __n;
  if (_1 < __n.3_2)
    goto <bb 3>; [0.04%]
  else
    goto <bb 4>; [99.96%]

  <bb 3> [local count: 429496]:
  std::__throw_length_error (__s_16(D));

  <bb 4> [local count: 1073312329]:
  D.27747 = _8;
  if (__n.3_2 > _8)
    goto <bb 5>; [34.00%]
  else
    goto <bb 6>; [66.00%]

  <bb 5> [local count: 364926196]:

  <bb 6> [local count: 1073312330]:
  # _18 = PHI <&D.27747(4), &__n(5)>
  _3 = *_18;
  __len_11 = _3 + _8;
  D.27747 ={v} {CLOBBER(eol)};
  if (_8 > __len_11)
    goto <bb 8>; [35.00%]
  else
    goto <bb 7>; [65.00%]

  <bb 7> [local count: 697653013]:
  _5 = MIN_EXPR <__len_11, 1152921504606846975>;

  <bb 8> [local count: 1073312330]:
  # iftmp.4_4 = PHI <1152921504606846975(6), _5(7)>
  return iftmp.4_4;

}

Whis is used by _M_realloc_insert:

_20 = std::vector<std::pair<unsigned int, unsigned int> >::_M_check_len
(this_18(D), 1, "vector::_M_realloc_insert");

So _n is 1. The test:

  _1 = 1152921504606846975 - _8;
  __n.3_2 = __n;
  if (_1 < __n.3_2)
    goto <bb 3>; [0.04%]
  else
    goto <bb 4>; [99.96%]

  <bb 3> [local count: 429496]:
  std::__throw_length_error (__s_16(D));

Can IMO be never true, since we would need to have already allocated vector of
1152921504606846975 elements which will not fit in memory anyway.
This brings in the EH handling and error message.

Perhaps for constantly sized _M_check_len and constantly sizes vector elements
we can use __builtin_constant_p to avoid calling _M_check_len from
_M_realloc_insert and invent _M_safe_check_len that avoids this test.

(for IPA optimizations to work out that there will be no EH and call is cheaper
we need to have the test at caller side instead in _M_check_len).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
@ 2023-06-16 14:48 ` hubicka at gcc dot gnu.org
  2023-06-16 15:37 ` hubicka at gcc dot gnu.org
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-06-16 14:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

--- Comment #1 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Another problem is:

  D.27747 = _8;
  if (__n.3_2 > _8)
    goto <bb 5>; [34.00%]
  else
    goto <bb 6>; [66.00%]

  <bb 5> [local count: 364926196]:

  <bb 6> [local count: 1073312330]:
  # _18 = PHI <&D.27747(4), &__n(5)>
  _3 = *_18;

which is simply computation of maximum but done in a way not understood until
phiprop

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
  2023-06-16 14:48 ` [Bug libstdc++/110287] " hubicka at gcc dot gnu.org
@ 2023-06-16 15:37 ` hubicka at gcc dot gnu.org
  2023-06-17 12:55 ` redi at gcc dot gnu.org
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-06-16 15:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
With patch in PR110289 to optimize the std::max int MAX_EXPR and the throw
commented out I get:

size_type std::vector<std::pair<unsigned int, unsigned int> >::_M_check_len
(const struct vector * const this, size_type __n, const char * __s)
{
  const size_type __len;
  size_type iftmp.2_1;
  long unsigned int _2;
  long unsigned int _5;
  long unsigned int _7;
  struct pair * _8;
  struct pair * _9;
  long int _10;
  long int _11;
  long unsigned int _12;

  <bb 2> [local count: 1073741824]:
  _8 = this_4(D)->D.26707._M_impl.D.26014._M_finish;
  _9 = this_4(D)->D.26707._M_impl.D.26014._M_start;
  _10 = _8 - _9;
  _11 = _10 /[ex] 8;
  _12 = (long unsigned int) _11;
  _7 = __n;
  _5 = MAX_EXPR <_7, _12>;
  __len_6 = _5 + _12;
  if (__len_6 < _12)
    goto <bb 4>; [35.00%]
  else
    goto <bb 3>; [65.00%]

  <bb 3> [local count: 697932185]:
  _2 = MIN_EXPR <__len_6, 1152921504606846975>;

  <bb 4> [local count: 1073741824]:
  # iftmp.2_1 = PHI <1152921504606846975(2), _2(3)>
  return iftmp.2_1;

}

which fits early inlining limits. So modifying libstdc++ to avoid the throw
would be great.

Again optimized _M_check_len could avoid:
  <bb 3> [local count: 697932185]:
  _2 = MIN_EXPR <__len_6, 1152921504606846975>;

  <bb 4> [local count: 1073741824]:
  # iftmp.2_1 = PHI <1152921504606846975(2), _2(3)>
  return iftmp.2_1;

since __len can not get that large without running out of memory. Which would
help further inlining.

Testcase from PR109849 now runs in 0.39s while without these two changes it
runs in 0.528s, clang's build runs in 0.057s, so still a long way to go.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
  2023-06-16 14:48 ` [Bug libstdc++/110287] " hubicka at gcc dot gnu.org
  2023-06-16 15:37 ` hubicka at gcc dot gnu.org
@ 2023-06-17 12:55 ` redi at gcc dot gnu.org
  2023-06-17 13:01 ` redi at gcc dot gnu.org
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: redi at gcc dot gnu.org @ 2023-06-17 12:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Do you mean something like this?

diff --git a/libstdc++-v3/include/bits/stl_vector.h
b/libstdc++-v3/include/bits/stl_vector.h
index 70ced3d101f..a4dbfeb8b5b 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1902,6 +1902,21 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
        return (__len < size() || __len > max_size()) ? max_size() : __len;
       }

+      // Called by _M_insert_aux etc.
+      _GLIBCXX20_CONSTEXPR
+      size_type
+      _M_check_len_1(const char* __s) const
+      {
+       if (__builtin_constant_p(size()))
+         {
+           if (size() == 0)
+             return 1;
+           else if (size() < max_size() / 2)
+             return size() * 2;
+         }
+       return _M_check_len(1, __s);
+      }
+
       // Called by constructors to check initial size.
       static _GLIBCXX20_CONSTEXPR size_type
       _S_check_init_len(size_type __n, const allocator_type& __a)
diff --git a/libstdc++-v3/include/bits/vector.tcc
b/libstdc++-v3/include/bits/vector.tcc
index acd11e2dc68..1f0b3123c3b 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -458,8 +458,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _M_realloc_insert(iterator __position, const _Tp& __x)
 #endif
     {
-      const size_type __len =
-       _M_check_len(size_type(1), "vector::_M_realloc_insert");
+      const size_type __len = _M_check_len_1("vector::_M_realloc_insert");
       pointer __old_start = this->_M_impl._M_start;
       pointer __old_finish = this->_M_impl._M_finish;
       const size_type __elems_before = __position - begin();
@@ -946,7 +945,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
       else
        {
          const size_type __len =
-           _M_check_len(size_type(1), "vector<bool>::_M_insert_aux");
+           _M_check_len_1("vector<bool>::_M_insert_aux");
          _Bit_pointer __q = this->_M_allocate(__len);
          iterator __start(std::__addressof(*__q), 0);
          iterator __i = _M_copy_aligned(begin(), __position, __start);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-06-17 12:55 ` redi at gcc dot gnu.org
@ 2023-06-17 13:01 ` redi at gcc dot gnu.org
  2023-06-18 22:41 ` hubicka at ucw dot cz
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: redi at gcc dot gnu.org @ 2023-06-17 13:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #3)
> @@ -946,7 +945,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>        else
>         {
>           const size_type __len =
> -           _M_check_len(size_type(1), "vector<bool>::_M_insert_aux");
> +           _M_check_len_1("vector<bool>::_M_insert_aux");
>           _Bit_pointer __q = this->_M_allocate(__len);
>           iterator __start(std::__addressof(*__q), 0);
>           iterator __i = _M_copy_aligned(begin(), __position, __start);

Oops, this hunk is for vector<bool> which is different. Ignore this part.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-06-17 13:01 ` redi at gcc dot gnu.org
@ 2023-06-18 22:41 ` hubicka at ucw dot cz
  2023-06-19 10:11 ` redi at gcc dot gnu.org
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: hubicka at ucw dot cz @ 2023-06-18 22:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

--- Comment #5 from Jan Hubicka <hubicka at ucw dot cz> ---
> Do you mean something like this?
I sent my own version, but yours looks nicer.
> 
> diff --git a/libstdc++-v3/include/bits/stl_vector.h
> b/libstdc++-v3/include/bits/stl_vector.h
> index 70ced3d101f..a4dbfeb8b5b 100644
> --- a/libstdc++-v3/include/bits/stl_vector.h
> +++ b/libstdc++-v3/include/bits/stl_vector.h
> @@ -1902,6 +1902,21 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>         return (__len < size() || __len > max_size()) ? max_size() : __len;
>        }
> 
> +      // Called by _M_insert_aux etc.
> +      _GLIBCXX20_CONSTEXPR
> +      size_type
> +      _M_check_len_1(const char* __s) const
> +      {
> +       if (__builtin_constant_p(size()))
Perhaps ruling this out for 32bit ports? Or can we assume that half of
address space can never be allocated in single block?
> +         {
> +           if (size() == 0)
> +             return 1;
> +           else if (size() < max_size() / 2)
I think even this conditional is safe to be assumed to be true,
since we can not allocate half of 64bit address space.  
In general it is importnat to not fall through to _M_check_len.

As I noticed, we may want to use assume attribute to make clear that
retval <= max_size ()
to avoid other unnecesary throws downstream.

Honza

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-06-18 22:41 ` hubicka at ucw dot cz
@ 2023-06-19 10:11 ` redi at gcc dot gnu.org
  2023-06-19 10:23   ` Jan Hubicka
  2023-06-19 10:24 ` hubicka at ucw dot cz
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 17+ messages in thread
From: redi at gcc dot gnu.org @ 2023-06-19 10:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

--- Comment #6 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #5)
> > Do you mean something like this?
> I sent my own version, but yours looks nicer.
> > 
> > diff --git a/libstdc++-v3/include/bits/stl_vector.h
> > b/libstdc++-v3/include/bits/stl_vector.h
> > index 70ced3d101f..a4dbfeb8b5b 100644
> > --- a/libstdc++-v3/include/bits/stl_vector.h
> > +++ b/libstdc++-v3/include/bits/stl_vector.h
> > @@ -1902,6 +1902,21 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >         return (__len < size() || __len > max_size()) ? max_size() : __len;
> >        }
> > 
> > +      // Called by _M_insert_aux etc.
> > +      _GLIBCXX20_CONSTEXPR
> > +      size_type
> > +      _M_check_len_1(const char* __s) const
> > +      {
> > +       if (__builtin_constant_p(size()))
> Perhaps ruling this out for 32bit ports? Or can we assume that half of
> address space can never be allocated in single block?

No, I don't think we can assume that.

> > +         {
> > +           if (size() == 0)
> > +             return 1;
> > +           else if (size() < max_size() / 2)
> I think even this conditional is safe to be assumed to be true,
> since we can not allocate half of 64bit address space.  

There is no guarantee that std::vector<T, A>::max_size() is PTRDIFF_MAX. It
depends on the Allocator type, A. A user-defined allocator could have
max_size() == 100.


> In general it is importnat to not fall through to _M_check_len.
> 
> As I noticed, we may want to use assume attribute to make clear that
> retval <= max_size ()
> to avoid other unnecesary throws downstream.
> 
> Honza

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-19 10:11 ` redi at gcc dot gnu.org
@ 2023-06-19 10:23   ` Jan Hubicka
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Hubicka @ 2023-06-19 10:23 UTC (permalink / raw)
  To: redi at gcc dot gnu.org; +Cc: gcc-bugs

> 
> There is no guarantee that std::vector<T, A>::max_size() is PTRDIFF_MAX. It
> depends on the Allocator type, A. A user-defined allocator could have
> max_size() == 100.

If inliner we see path to the throw functions, it will not determine
_M_check_len as early inlinable.
Perhaps we can __builtin_constant_p it as well and check that
max_size () * sizeof ()
is close to ptrdiff_max.  

Thanks for the comments on the patches.  I will try to update the patch.

I was wondering about the allocators. As shown in the mail, optimiznig
_M_check_len still leaves two independent throws for insanely large
ops.  Since allocator is user replaceable, I guess we can not add new
member function for safe_allocate or so.

We can use __builtin_unreachable to set the value range on the return
value.  For that to work during early optimizations we need 

 1) extend early VRP to retrofit the value determined by
    __builtin_unreachable to the SSA name defned earlier based on fact
    that the execution can not legally terminate in between
 2) teaching inliner to ignore conditionals guaring __builtin_unreacable
 3) add support for return functions to propagate the value range from
    _M_check_len to _M_reallocate_insert.
    so it is correctly propagated to allocator call.

This is not very easy, but can be generally useful elsewhere.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-06-19 10:11 ` redi at gcc dot gnu.org
@ 2023-06-19 10:24 ` hubicka at ucw dot cz
  2023-06-23 16:09 ` hubicka at gcc dot gnu.org
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: hubicka at ucw dot cz @ 2023-06-19 10:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

--- Comment #7 from Jan Hubicka <hubicka at ucw dot cz> ---
> 
> There is no guarantee that std::vector<T, A>::max_size() is PTRDIFF_MAX. It
> depends on the Allocator type, A. A user-defined allocator could have
> max_size() == 100.

If inliner we see path to the throw functions, it will not determine
_M_check_len as early inlinable.
Perhaps we can __builtin_constant_p it as well and check that
max_size () * sizeof ()
is close to ptrdiff_max.  

Thanks for the comments on the patches.  I will try to update the patch.

I was wondering about the allocators. As shown in the mail, optimiznig
_M_check_len still leaves two independent throws for insanely large
ops.  Since allocator is user replaceable, I guess we can not add new
member function for safe_allocate or so.

We can use __builtin_unreachable to set the value range on the return
value.  For that to work during early optimizations we need 

 1) extend early VRP to retrofit the value determined by
    __builtin_unreachable to the SSA name defned earlier based on fact
    that the execution can not legally terminate in between
 2) teaching inliner to ignore conditionals guaring __builtin_unreacable
 3) add support for return functions to propagate the value range from
    _M_check_len to _M_reallocate_insert.
    so it is correctly propagated to allocator call.

This is not very easy, but can be generally useful elsewhere.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-06-19 10:24 ` hubicka at ucw dot cz
@ 2023-06-23 16:09 ` hubicka at gcc dot gnu.org
  2023-11-19 15:14 ` hubicka at gcc dot gnu.org
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-06-23 16:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
Bug 110287 depends on bug 110289, which changed state.

Bug 110289 Summary: Phiprop may be good idea in early opts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110289

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-06-23 16:09 ` hubicka at gcc dot gnu.org
@ 2023-11-19 15:14 ` hubicka at gcc dot gnu.org
  2023-11-19 15:46 ` hubicka at gcc dot gnu.org
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-11-19 15:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
With return value range propagation
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637265.html
reduces --param max-inline-insns-auto needed for _M_realloc_insert to be
inlined on my testcase from 39 to 35.

This is done by eliminating two unnecesary trow calls by propagating fact that
check_len does not return incredibly large values.

Default inline limit at -O3 is 30, so we are not that far and I think we really
ought to solve this for next release since push_back is such a common case.

Is it known that check_len can not return 0 in this situation? Adding 
if (ret <= 0)
  __builtin_unreachable
saves another 2 instructions because _M_realloc_insert otherwise contain a code
path for case that vector gets increased to 0 elements.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2023-11-19 15:14 ` hubicka at gcc dot gnu.org
@ 2023-11-19 15:46 ` hubicka at gcc dot gnu.org
  2023-11-21 13:36 ` hubicka at gcc dot gnu.org
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-11-19 15:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

--- Comment #9 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
This is _M_realloc insert at release_ssa time:

eleased 63 names, 165.79%, removed 63 holes
void std::vector<pair_t>::_M_realloc_insert<const pair_t&> (struct vector *
const this, struct iterator __position, const struct pair_t & __args#0)
{ 
  struct pair_t * const __position;
  struct pair_t * __new_finish;
  struct pair_t * __old_finish;
  struct pair_t * __old_start;
  long unsigned int _1; 
  struct pair_t * _2;
  struct pair_t * _3;
  long int _4;
  long unsigned int _5;
  struct pair_t * _6;
  const size_type _10;
  long int _13;
  struct pair_t * iftmp.5_15;
  struct pair_t * _17; 
  struct _Vector_impl * _18;
  long unsigned int _22;
  long int _23;
  long unsigned int _24;
  long unsigned int _25;
  struct pair_t * _26;
  long unsigned int _36;

  <bb 2> [local count: 1073741824]:
  __position_27 = MEM[(struct __normal_iterator *)&__position];
  _10 = std::vector<pair_t>::_M_check_len (this_8(D), 1,
"vector::_M_realloc_insert");
  __old_start_11 = this_8(D)->D.25975._M_impl.D.25282._M_start;
  __old_finish_12 = this_8(D)->D.25975._M_impl.D.25282._M_finish;
  _13 = __position_27 - __old_start_11;
  if (_10 != 0)
    goto <bb 3>; [54.67%]
  else
    goto <bb 4>; [45.33%]

  <bb 3> [local count: 587014656]:
  _18 = &MEM[(struct _Vector_base *)this_8(D)]._M_impl;
  _17 = std::__new_allocator<pair_t>::allocate (_18, _10, 0B);

  <bb 4> [local count: 1073741824]:
  # iftmp.5_15 = PHI <0B(2), _17(3)>
  _1 = (long unsigned int) _13;
  _2 = iftmp.5_15 + _1;
  *_2 = *__args#0_14(D);
  if (_13 > 0)
    goto <bb 5>; [41.48%]
  else
    goto <bb 6>; [58.52%]

  <bb 5> [local count: 445388112]:
  __builtin_memmove (iftmp.5_15, __old_start_11, _1);

  <bb 6> [local count: 1073741824]:
  _36 = _1 + 8;
  __new_finish_16 = iftmp.5_15 + _36;
  _23 = __old_finish_12 - __position_27;
  if (_23 > 0)
    goto <bb 7>; [41.48%]
  else
    goto <bb 8>; [58.52%]

  <bb 7> [local count: 445388112]:
  _24 = (long unsigned int) _23;
  __builtin_memcpy (__new_finish_16, __position_27, _24);

  <bb 8> [local count: 1073741824]:
  _25 = (long unsigned int) _23;
  _26 = __new_finish_16 + _25;
  _3 = this_8(D)->D.25975._M_impl.D.25282._M_end_of_storage;
  _4 = _3 - __old_start_11;
  if (__old_start_11 != 0B)
    goto <bb 9>; [53.47%]
  else
    goto <bb 10>; [46.53%]

  <bb 9> [local count: 574129752]:
  _22 = (long unsigned int) _4;
  operator delete (__old_start_11, _22);

  <bb 10> [local count: 1073741824]:
  this_8(D)->D.25975._M_impl.D.25282._M_start = iftmp.5_15;
  this_8(D)->D.25975._M_impl.D.25282._M_finish = _26;
  _5 = _10 * 8;
  _6 = iftmp.5_15 + _5;
  this_8(D)->D.25975._M_impl.D.25282._M_end_of_storage = _6;
  return;

}

First it is not clear to me why we need memmove at all?

So first issue is:
  <bb 2> [local count: 1073741824]:
  __position_27 = MEM[(struct __normal_iterator *)&__position];
  _10 = std::vector<pair_t>::_M_check_len (this_8(D), 1,
"vector::_M_realloc_insert");
  __old_start_11 = this_8(D)->D.25975._M_impl.D.25282._M_start;
  __old_finish_12 = this_8(D)->D.25975._M_impl.D.25282._M_finish;
  _13 = __position_27 - __old_start_11;
  if (_10 != 0)
    goto <bb 3>; [54.67%]
  else
    goto <bb 4>; [45.33%]

Without inlining _M_check_len early we can not work out return value range,
since we need to know that paramter 2 is 1 and not 0.
Adding __builtin_unreachable check after helps to reduce

if (_10 != 0)

but I need to do something about inliner accounting the conditional to function
body size.

  <bb 4> [local count: 1073741824]:
  # iftmp.5_15 = PHI <0B(2), _17(3)>
  _1 = (long unsigned int) _13;
  _2 = iftmp.5_15 + _1;
  *_2 = *__args#0_14(D);
  if (_13 > 0)
    goto <bb 5>; [41.48%]
  else
    goto <bb 6>; [58.52%]

  <bb 5> [local count: 445388112]:
  __builtin_memmove (iftmp.5_15, __old_start_11, _1);

Is this code about inserting value to the middle?  Since push_back always
initializes iterator to point to the end, this seems quite sily to do.
Can't we do somehting like _M_realloc_append?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2023-11-19 15:46 ` hubicka at gcc dot gnu.org
@ 2023-11-21 13:36 ` hubicka at gcc dot gnu.org
  2023-11-21 14:17 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-11-21 13:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-11-21

--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
We now produce reasonable code for _M_check_len and propagate value range of
the return value. This helps us to notice that later allocator call will not
throw exception on invalid size, so we are down from 3 throw calls to one.

Current code is:

size_type std::vector<pair_t>::_M_check_len (const struct vector * const this,
size_type __n, const char * __s)
{
  const size_type __len;
  long unsigned int _1;
  long unsigned int __n.3_2;
  size_type iftmp.4_3;
  long unsigned int _4;
  long unsigned int _7;
  long unsigned int _8;
  long int _9;
  long int _11;
  struct pair_t * _12;
  struct pair_t * _13;

  <bb 2> [local count: 1073741824]:
  _13 = this_6(D)->D.26060._M_impl.D.25361._M_finish;
  _12 = this_6(D)->D.26060._M_impl.D.25361._M_start;
  _11 = _13 - _12;
  _9 = _11 /[ex] 8;
  _7 = (long unsigned int) _9;
  _1 = 1152921504606846975 - _7;
  __n.3_2 = __n;
  if (_1 < __n.3_2)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 3> [count: 0]:
  std::__throw_length_error (__s_14(D));

  <bb 4> [local count: 1073741824]:
  _8 = MAX_EXPR <__n.3_2, _7>;
  __len_10 = _7 + _8;
  if (_7 > __len_10)
    goto <bb 6>; [35.00%]
  else
    goto <bb 5>; [65.00%]

  <bb 5> [local count: 697932184]:
  _4 = MIN_EXPR <__len_10, 1152921504606846975>;

  <bb 6> [local count: 1073741824]:
  # iftmp.4_3 = PHI <1152921504606846975(4), _4(5)>
  return iftmp.4_3;

}
I still think we could play games with the 2^63 being too large for the
standard allocator and turn __throw_length_error into __builtin_unreachable for
that case. This would help early inliner to inline this function and save some
throw calls in real code.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2023-11-21 13:36 ` hubicka at gcc dot gnu.org
@ 2023-11-21 14:17 ` cvs-commit at gcc dot gnu.org
  2023-11-21 15:12 ` hubicka at gcc dot gnu.org
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-21 14:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:1d82fc2e6824bf83159389729c31a942f7b91b04

commit r14-5679-g1d82fc2e6824bf83159389729c31a942f7b91b04
Author: Jan Hubicka <jh@suse.cz>
Date:   Tue Nov 21 15:17:16 2023 +0100

    optimize std::vector::push_back

    this patch speeds up the push_back at -O3 significantly by making the
    reallocation to be inlined by default.  _M_realloc_insert is general
    insertion that takes iterator pointing to location where the value
    should be inserted.  As such it contains code to move other entries around
    that is quite large.

    Since appending to the end of array is common operation, I think we should
    have specialized code for that.  Sadly it is really hard to work out this
    from IPA passes, since we basically care whether the iterator points to
    the same place as the end pointer, which are both passed by reference.
    This is inter-procedural value numbering that is quite out of reach.

    I also added extra check making it clear that the new length of the vector
    is non-zero.  This saves extra conditionals.  Again it is quite hard case
    since _M_check_len seem to be able to return 0 if its parameter is 0.
    This never happens here, but we are not able to propagate this early nor
    at IPA stage.

    libstdc++-v3/ChangeLog:

            PR libstdc++/110287
            PR middle-end/109811
            PR middle-end/109849
            * include/bits/stl_vector.h (_M_realloc_append): New member
function.
            (push_back): Use it.
            * include/bits/vector.tcc: (emplace_back): Use it.
            (_M_realloc_insert): Let compiler know that new vector size is
non-zero.
            (_M_realloc_append): New member function.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2023-11-21 14:17 ` cvs-commit at gcc dot gnu.org
@ 2023-11-21 15:12 ` hubicka at gcc dot gnu.org
  2023-11-27  9:44 ` rguenth at gcc dot gnu.org
  2023-11-27 14:39 ` rguenth at gcc dot gnu.org
  14 siblings, 0 replies; 17+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-11-21 15:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
Bug 110287 depends on bug 110377, which changed state.

Bug 110377 Summary: Early VRP and IPA-PROP should work out value ranges from __builtin_unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110377

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2023-11-21 15:12 ` hubicka at gcc dot gnu.org
@ 2023-11-27  9:44 ` rguenth at gcc dot gnu.org
  2023-11-27 14:39 ` rguenth at gcc dot gnu.org
  14 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-27  9:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
Bug 110287 depends on bug 112706, which changed state.

Bug 112706 Summary: missed simplification in FRE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112706

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/110287] _M_check_len is expensive
  2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2023-11-27  9:44 ` rguenth at gcc dot gnu.org
@ 2023-11-27 14:39 ` rguenth at gcc dot gnu.org
  14 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-27 14:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
Bug 110287 depends on bug 112653, which changed state.

Bug 112653 Summary: PTA should handle correctly escape information of values returned by a function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-11-27 14:39 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-16 14:43 [Bug libstdc++/110287] New: _M_check_len is expensive hubicka at gcc dot gnu.org
2023-06-16 14:48 ` [Bug libstdc++/110287] " hubicka at gcc dot gnu.org
2023-06-16 15:37 ` hubicka at gcc dot gnu.org
2023-06-17 12:55 ` redi at gcc dot gnu.org
2023-06-17 13:01 ` redi at gcc dot gnu.org
2023-06-18 22:41 ` hubicka at ucw dot cz
2023-06-19 10:11 ` redi at gcc dot gnu.org
2023-06-19 10:23   ` Jan Hubicka
2023-06-19 10:24 ` hubicka at ucw dot cz
2023-06-23 16:09 ` hubicka at gcc dot gnu.org
2023-11-19 15:14 ` hubicka at gcc dot gnu.org
2023-11-19 15:46 ` hubicka at gcc dot gnu.org
2023-11-21 13:36 ` hubicka at gcc dot gnu.org
2023-11-21 14:17 ` cvs-commit at gcc dot gnu.org
2023-11-21 15:12 ` hubicka at gcc dot gnu.org
2023-11-27  9:44 ` rguenth at gcc dot gnu.org
2023-11-27 14:39 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).