public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view
@ 2023-01-20 22:20 Mark_B53 at yahoo dot com
  2023-01-21 11:48 ` [Bug tree-optimization/108487] [10/11/12/13 Regression] " amonakov at gcc dot gnu.org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Mark_B53 at yahoo dot com @ 2023-01-20 22:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

            Bug ID: 108487
           Summary: ~20-30x slowdown in populating std::vector from
                    std::ranges::iota_view
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Mark_B53 at yahoo dot com
  Target Milestone: ---

Using -std=c++20 -O3, comparing gcc 12.2 vs. gcc 10.3:
 * fn2 is 20-30x slower on gcc 12.2 (i.e. 2000-3000% more) 
 * fn1 is ~20% slower on gcc 12.2 

This test was run on an 52 core Intel Xeon Gold 6278C CPU.  Tests on
www.godbolt.org directionally align with these findings.  It seems the slowdown
was introduced in 10.4 & 11.1.  The trunk has identical performance to 12.2.

#include <vector>
#include <ranges>
#include <ctime>
#include <iostream>

__attribute__((noinline)) std::vector<int> fn1(int n)
{
    auto v = std::vector<int>(n);
    for(int i = 0; i != n; ++i)
        v[i] = i;
    return v;
}

__attribute__((noinline)) std::vector<int> fn2(int n)
{
    auto rng = std::ranges::iota_view{0, n};
    return std::vector<int>{rng.begin(), rng.end()};
}

int main() {
    int n = 100'000;
    int times = 100'000;

    auto t0 = std::clock();
    for (int i = 0; i < times; ++i)
        fn1(n);            
    auto t1 = std::clock();
    for (int i = 0; i < times; ++i)
        fn2(n);            
    auto t2 = std::clock();
    std::cout << t1 - t0 << '\n';
    std::cout << t2 - t1 << '\n';
    return 0;
}

P.S. 20% slowdown for a common vector population is still significant IMO.  I
am not sure that qualifies as a bug.  I did not file one on account of the
'fn1' slowdown.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
  2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
@ 2023-01-21 11:48 ` amonakov at gcc dot gnu.org
  2023-01-21 13:32 ` Mark_B53 at yahoo dot com
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: amonakov at gcc dot gnu.org @ 2023-01-21 11:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |tree-optimization
           Keywords|                            |needs-bisection
            Summary|~20-30x slowdown in         |[10/11/12/13 Regression]
                   |populating std::vector from |~20-30x slowdown in
                   |std::ranges::iota_view      |populating std::vector from
                   |                            |std::ranges::iota_view
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Regarding fn1, would you mind re-running the test on your Xeon CPU with fn2
removed from the source code and -falign-loops=32 added to gcc command line?
For fn1, assembly of the inner loop should be identical, so I think the 20% you
were seeing may result from different loop alignment with respect to 32b fetch
boundary.

Also please note that cloud instances backing godbolt.org have different CPUs,
so timing results from different runs are not directly comparable.

Regarding fn2, this may partially be a library issue, compiling preprocessed
source from gcc-10.4 using gcc-10.2 also exhibits the problem. Inner loop
becomes significantly more complicated. Bisecting should be helpful.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
  2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
  2023-01-21 11:48 ` [Bug tree-optimization/108487] [10/11/12/13 Regression] " amonakov at gcc dot gnu.org
@ 2023-01-21 13:32 ` Mark_B53 at yahoo dot com
  2023-01-21 15:38 ` [Bug libstdc++/108487] " amonakov at gcc dot gnu.org
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Mark_B53 at yahoo dot com @ 2023-01-21 13:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

--- Comment #2 from MARK BOURGEAULT <Mark_B53 at yahoo dot com> ---
>> For fn1, assembly of the inner loop should be identical, so I think the 20% you were seeing may result from different loop alignment with respect to 32b fetch boundary
Yes, it does appear that this is the explanation for the difference.  Here are
the full results:

original code
 * gcc 10.3 -std=c++20 -O3 => fn1 = ~2000ms, fn2 = ~1000ms
 * gcc 10.3 -std=c++20 -O3 -falign-loops=32 => fn1 = ~2000ms, fn2 = ~1000ms
 * gcc 12.2 -std=c++20 -O3 => fn1 = ~2500ms, fn2 = ~32000ms
 * gcc 12.2 -std=c++20 -O3 -falign-loops=32 => fn1 = ~2000ms, fn2 = ~32000ms

fn1 only
 * gcc 10.3 -std=c++20 -O3 => fn1 = ~2500ms
 * gcc 10.3 -std=c++20 -O3 -falign-loops=32 => fn1 = ~2000ms
 * gcc 12.2 -std=c++20 -O3 => fn1 = ~2000ms
 * gcc 12.2 -std=c++20 -O3 -falign-loops=32 => fn1 = ~2000ms

>> Also please note that cloud instances backing godbolt.org have different CPUs, so timing results from different runs are not directly comparable.
Yes, I know.  I really only used godbolt to reach the conclusion that the issue
still exists on trunk.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
  2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
  2023-01-21 11:48 ` [Bug tree-optimization/108487] [10/11/12/13 Regression] " amonakov at gcc dot gnu.org
  2023-01-21 13:32 ` Mark_B53 at yahoo dot com
@ 2023-01-21 15:38 ` amonakov at gcc dot gnu.org
  2023-01-21 23:45 ` redi at gcc dot gnu.org
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: amonakov at gcc dot gnu.org @ 2023-01-21 15:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |libstdc++

--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
libstdc++ uses a less efficient specialization of _M_range_initialize.

With 10.2 headers, we use

      template<typename _ForwardIterator>
 void
 _M_range_initialize(_ForwardIterator __first, _ForwardIterator __last,
       std::forward_iterator_tag)
 {
   const size_type __n = std::distance(__first, __last);
   this->_M_impl._M_start
     = this->_M_allocate(_S_check_init_len(__n, _M_get_Tp_allocator()));
   this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
   this->_M_impl._M_finish =
     std::__uninitialized_copy_a(__first, __last,
     this->_M_impl._M_start,
     _M_get_Tp_allocator());
 }

but with 10.4 headers, we use

      template<typename _InputIterator>
 void
 _M_range_initialize(_InputIterator __first, _InputIterator __last,
       std::input_iterator_tag)
 {
   try {
     for (; __first != __last; ++__first)

       emplace_back(*__first);

   } catch(...) {
     clear();
     throw;
   }
 }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
  2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
                   ` (2 preceding siblings ...)
  2023-01-21 15:38 ` [Bug libstdc++/108487] " amonakov at gcc dot gnu.org
@ 2023-01-21 23:45 ` redi at gcc dot gnu.org
  2023-01-21 23:50 ` redi at gcc dot gnu.org
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: redi at gcc dot gnu.org @ 2023-01-21 23:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
That's because iota_view's iterator is only an input iterator in gcc-10, as a
result of r10-9796-g1cb39945993c89

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
  2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
                   ` (3 preceding siblings ...)
  2023-01-21 23:45 ` redi at gcc dot gnu.org
@ 2023-01-21 23:50 ` redi at gcc dot gnu.org
  2023-01-21 23:50 ` redi at gcc dot gnu.org
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: redi at gcc dot gnu.org @ 2023-01-21 23:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=100070
           Keywords|needs-bisection             |

--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> ---
See https://cplusplus.github.io/LWG/issue3291 for the rationale.

See PR 100070 for suggestions to deal with such iterators better.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
  2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
                   ` (4 preceding siblings ...)
  2023-01-21 23:50 ` redi at gcc dot gnu.org
@ 2023-01-21 23:50 ` redi at gcc dot gnu.org
  2023-01-22  3:07 ` Mark_B53 at yahoo dot com
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: redi at gcc dot gnu.org @ 2023-01-21 23:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

--- Comment #6 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #4)
> That's because iota_view's iterator is only an input iterator in gcc-10,

I meant to say gcc-10.4 there.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
  2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
                   ` (5 preceding siblings ...)
  2023-01-21 23:50 ` redi at gcc dot gnu.org
@ 2023-01-22  3:07 ` Mark_B53 at yahoo dot com
  2023-01-22 10:33 ` Mark_B53 at yahoo dot com
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Mark_B53 at yahoo dot com @ 2023-01-22  3:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

--- Comment #7 from MARK BOURGEAULT <Mark_B53 at yahoo dot com> ---
>> See PR 100070 for suggestions to deal with such iterators better.

Unless I'm missing something, there's nothing in that PR that a *user* can do
to achieve the gcc 10.3 performance w/ std::iota_view.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
  2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
                   ` (6 preceding siblings ...)
  2023-01-22  3:07 ` Mark_B53 at yahoo dot com
@ 2023-01-22 10:33 ` Mark_B53 at yahoo dot com
  2023-01-22 17:14 ` redi at gcc dot gnu.org
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Mark_B53 at yahoo dot com @ 2023-01-22 10:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

--- Comment #8 from Mark Bourgeault <Mark_B53 at yahoo dot com> ---
What about something like this?

#if __cplusplus >= 201709L
      template<typename _InputIterator,
               typename = std::_RequireInputIter<_InputIterator>>
        vector(_InputIterator __first, _InputIterator __last,
               const allocator_type& __a = allocator_type())
        : _Base(__a)
        {
          struct PseudoRange { _InputIterator begin(); _InputIterator end(); };
          if constexpr (std::ranges::random_access_range<PseudoRange>) {
            _M_range_initialize(__first, __last,
                              std::random_access_iterator_tag());
          }
          else if constexpr (std::ranges::forward_range<PseudoRange>) {
            _M_range_initialize(__first, __last, std::forward_iterator_tag());
          }
          else {
            _M_range_initialize(__first, __last,
                              std::__iterator_category(__first));
        }
#else
  ...
#endif

-----

Here is a PoC:

#include <vector>
#include <ranges>
#include <iostream>

template<typename I>
void test(I first, I last) {
    struct PseudoRange { I begin(); I end(); };
    if constexpr (std::ranges::random_access_range<PseudoRange>) {
        std::cout << "RA\n"; 
    }
    else if constexpr (std::ranges::forward_range<PseudoRange>) {
        std::cout << "F\n"; 
    }
    else  {
        std::cout << "I\n"; 
    }
}

int main() {
    auto rng = std::ranges::iota_view{0, 10} | std::views::transform([](int i)
{ return i*i; });
    test(rng.begin(), rng.end());
    auto rng2 = std::ranges::iota_view{0, 10} | std::views::filter([](int i) {
return i%2; });
    test(rng2.begin(), rng2.end());
    return 0;
}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
  2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
                   ` (7 preceding siblings ...)
  2023-01-22 10:33 ` Mark_B53 at yahoo dot com
@ 2023-01-22 17:14 ` redi at gcc dot gnu.org
  2023-01-23  7:45 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: redi at gcc dot gnu.org @ 2023-01-22 17:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

--- Comment #9 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I think we just want to dispatch on iterator concept not iterator category.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
  2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
                   ` (8 preceding siblings ...)
  2023-01-22 17:14 ` redi at gcc dot gnu.org
@ 2023-01-23  7:45 ` rguenth at gcc dot gnu.org
  2023-03-27 11:03 ` rguenth at gcc dot gnu.org
  2023-07-07 10:44 ` [Bug libstdc++/108487] [11/12/13/14 " rguenth at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-01-23  7:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |10.5
           Keywords|                            |missed-optimization

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
  2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
                   ` (9 preceding siblings ...)
  2023-01-23  7:45 ` rguenth at gcc dot gnu.org
@ 2023-03-27 11:03 ` rguenth at gcc dot gnu.org
  2023-07-07 10:44 ` [Bug libstdc++/108487] [11/12/13/14 " rguenth at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-03-27 11:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-03-27
           Priority|P3                          |P2

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/108487] [11/12/13/14 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
  2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
                   ` (10 preceding siblings ...)
  2023-03-27 11:03 ` rguenth at gcc dot gnu.org
@ 2023-07-07 10:44 ` rguenth at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-07 10:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.5                        |11.5

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 10 branch is being closed.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-07-07 10:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-20 22:20 [Bug rtl-optimization/108487] New: ~20-30x slowdown in populating std::vector from std::ranges::iota_view Mark_B53 at yahoo dot com
2023-01-21 11:48 ` [Bug tree-optimization/108487] [10/11/12/13 Regression] " amonakov at gcc dot gnu.org
2023-01-21 13:32 ` Mark_B53 at yahoo dot com
2023-01-21 15:38 ` [Bug libstdc++/108487] " amonakov at gcc dot gnu.org
2023-01-21 23:45 ` redi at gcc dot gnu.org
2023-01-21 23:50 ` redi at gcc dot gnu.org
2023-01-21 23:50 ` redi at gcc dot gnu.org
2023-01-22  3:07 ` Mark_B53 at yahoo dot com
2023-01-22 10:33 ` Mark_B53 at yahoo dot com
2023-01-22 17:14 ` redi at gcc dot gnu.org
2023-01-23  7:45 ` rguenth at gcc dot gnu.org
2023-03-27 11:03 ` rguenth at gcc dot gnu.org
2023-07-07 10:44 ` [Bug libstdc++/108487] [11/12/13/14 " rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).