public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list
@ 2011-09-01  0:42 justin at fathomdb dot com
  2011-09-01  9:07 ` [Bug libstdc++/50257] [C++0x] " paolo.carlini at oracle dot com
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: justin at fathomdb dot com @ 2011-09-01  0:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

             Bug #: 50257
           Summary: unordered_map slow initialization due to huge
                    __prime_list
    Classification: Unclassified
           Product: gcc
           Version: 4.6.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: justin@fathomdb.com


The constructor for std::unordered_map calls _Prime_rehash_policy::_M_next_bkt
to determine the initial size of the hashtable.  That method computes the size
by looking for the next largest number in a large list of primes (in
src/hashtable-aux.cc), which it does using lower_bound.  However, this list is
very big - I counted it as 310 elements of 8 bytes each on a 64 bit machine, or
~ 2.4KB, accessed in a way (lower_bound) that probably causes lots of cache
misses.  This lower_bound call comes up as a performance hotspot on my
profiles.

I think that:

1) the initial granularity is probably overkill.  The inital entries are:
 2ul, 3ul, 5ul, 7ul, 11ul, 13ul, 17ul, 19ul, 23ul, 29ul, 31ul,
    37ul, 41ul, 43ul, 47ul, 53ul, 59ul, 61ul, 67ul, 71ul, 73ul, 79ul,
    83ul, 89ul, 97ul, 103ul, 109ul, 113ul, 127ul, 137ul, 139ul, 149ul,
    157ul, 167ul, 179ul, 193ul, 199ul, 211ul, 227ul, 241ul, 257ul,...
Overall performance would probably be better/comparable with e.g. 11ul, 31ul,
127ul, 241ul.

2) we should special-case the first values, or (at the very least) the value
when size is specified in the unordered_map constructor (I think that's 10).

I also found this prime list, which looks a lot more reasonable in terms of the
density: http://gcc.gnu.org/ml/gcc-patches/2009-05/msg01293.html

One of the problems is that this list of primes serves three purposes:
1) Initial allocation when no size is specified (for which it is slow)
2) Initial allocation when the user is specifying the size (when we probably
want a size as close as possible to the target number)
3) Resizing (when again we want a size as close as possible to the size we've
computed)

I think that this is probably a good list for purpose #2 and #3, but it is
pretty painful for #1.  Unfortunately #1 is the probably the most common use
case, I would think.  An alternative fix would be to have two different
_Prime_rehash_policy strategies: one for people that know the size of their
hashtable, and one for people that just want sensible defaults.  I don't think
_Prime_rehash_policy is a good match for the second group, and I suspect the
second group is the majority.

Is this the right place to raise this, or would the mailing list be better?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
@ 2011-09-01  9:07 ` paolo.carlini at oracle dot com
  2011-09-01  9:24 ` paolo.carlini at oracle dot com
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: paolo.carlini at oracle dot com @ 2011-09-01  9:07 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

Paolo Carlini <paolo.carlini at oracle dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|unordered_map slow          |[C++0x] unordered_map slow
                   |initialization due to huge  |initialization due to huge
                   |__prime_list                |__prime_list

--- Comment #1 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-09-01 09:06:47 UTC ---
Indeed, this is the right place to ask, but also consider the mailing list for
anything not strictly speaking a bug but an enhancement. Anyway, to begin with,
we can definitely special case the default value of 10 and avoid the
lower_bound call, seems a sensible optimization anyway, but to be honest I'm a
bit surprised that you can see that in the profile vs, eg, the time spent
allocating memory in _M_allocate_buckets immediately afterward. Can you clarify
this?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
  2011-09-01  9:07 ` [Bug libstdc++/50257] [C++0x] " paolo.carlini at oracle dot com
@ 2011-09-01  9:24 ` paolo.carlini at oracle dot com
  2011-09-01  9:27 ` paolo.carlini at oracle dot com
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: paolo.carlini at oracle dot com @ 2011-09-01  9:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

--- Comment #2 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-09-01 09:23:59 UTC ---
Created attachment 25156
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25156
Mainline patchlet optimizing for 10


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
  2011-09-01  9:07 ` [Bug libstdc++/50257] [C++0x] " paolo.carlini at oracle dot com
  2011-09-01  9:24 ` paolo.carlini at oracle dot com
@ 2011-09-01  9:27 ` paolo.carlini at oracle dot com
  2011-09-01 11:32 ` paolo.carlini at oracle dot com
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: paolo.carlini at oracle dot com @ 2011-09-01  9:27 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

--- Comment #3 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-09-01 09:27:04 UTC ---
I just attached a patchlet (vs mainline, 4.6.x may require a bit of tweaking,
the patch is trivial anyway) implementing the straightforward optimization for
the value of 10. Please let me know how it does on your code.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (2 preceding siblings ...)
  2011-09-01  9:27 ` paolo.carlini at oracle dot com
@ 2011-09-01 11:32 ` paolo.carlini at oracle dot com
  2011-09-01 17:06 ` justin at fathomdb dot com
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: paolo.carlini at oracle dot com @ 2011-09-01 11:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

--- Comment #4 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-09-01 11:31:42 UTC ---
Created attachment 25158
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25158
Alternate patchlet

Or we can do this, which optimizes 0 too, should be also useful.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (3 preceding siblings ...)
  2011-09-01 11:32 ` paolo.carlini at oracle dot com
@ 2011-09-01 17:06 ` justin at fathomdb dot com
  2011-09-01 17:33 ` justin at fathomdb dot com
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: justin at fathomdb dot com @ 2011-09-01 17:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

--- Comment #5 from Justin SB <justin at fathomdb dot com> 2011-09-01 17:06:04 UTC ---
Created attachment 25167
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25167
Test case

Test case that stresses unordered_map construction


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (4 preceding siblings ...)
  2011-09-01 17:06 ` justin at fathomdb dot com
@ 2011-09-01 17:33 ` justin at fathomdb dot com
  2011-09-01 17:54 ` paolo.carlini at oracle dot com
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: justin at fathomdb dot com @ 2011-09-01 17:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

--- Comment #6 from Justin SB <justin at fathomdb dot com> 2011-09-01 17:33:32 UTC ---
Thanks so much for the help.  I created a test case (attached) and tried out
the patch (the second version); it is a big improvement.

Here are the timings:
g++ -g -O3 --std=c++0x test.cc

Without patch:
user    0m13.201s

With patch:
user    0m10.237s

---

The special casing is obviously a bit of a hack, so I wondered if we could do
better by doing a linear scan if the value is less than a threshold (which
would have a much nicer memory pattern).  It was a small improvement, but
nowhere near what the special-casing of the value 10 does:

With linear scan for small n:
user    0m12.561s

I therefore believe the real win is because the special case allows the
compiler to optimize away the entire lookup in __prime_list, because the
function is inlined and the value of n is known.

---

There is a comment in hashtable_policy:

  // XXX This is a hack.  There's no good reason for any of
  // _Prime_rehash_policy's member functions to be inline.

However, I think that (with the patch) this isn't true any more.  There's a big
performance win by avoiding touching __prime_list, so at least the special case
should remain inline.

---

Based on your malloc comment, I'm using tcmalloc which is a bit faster than the
default allocator, which is probably why it showed up; the patch still knocks
the same ~3 seconds off the time:
g++ -g -O3 --std=c++0x test.cc -ltcmalloc

Without patch, with -ltcmalloc
user    0m10.173s

With patch, with -ltcmalloc
user    0m7.288s


---

In terms of the (non cumulative) cpu cycles spent in the lookup vs new/delete:

With -ltcmalloc:
Without patch: 44% iter() / 48% memory.
With patch: 34% iter() / 63% memory

With stock malloc:
Without patch: 38% iter() / 58% memory
With patch: 18% iter() / 79% memory

So memory is indeed a big cost, but as I was using tcmalloc I was able to see
that the constructor was surprisingly expensive (spending about as much time in
the constructor as I was in allocation or deallocation)

---

TLDR: The patch works for me, and I think it's a significant win, because it
allows the compiler to optimize __prime_list away entirely.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (5 preceding siblings ...)
  2011-09-01 17:33 ` justin at fathomdb dot com
@ 2011-09-01 17:54 ` paolo.carlini at oracle dot com
  2011-09-05 15:36 ` paolo.carlini at oracle dot com
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: paolo.carlini at oracle dot com @ 2011-09-01 17:54 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

Paolo Carlini <paolo.carlini at oracle dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2011-09-01
         AssignedTo|unassigned at gcc dot       |paolo.carlini at oracle dot
                   |gnu.org                     |com
   Target Milestone|---                         |4.7.0
     Ever Confirmed|0                           |1

--- Comment #7 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-09-01 17:54:14 UTC ---
Ah, very good. Thanks a lot for all your help. Indeed, I'm seeing similar
numbers on my machine. Thus, I would say, let's go with this kind of tweak for
now: indeed, it looks like an hack, but having default construction, move
constructor, faster is important, and as you noticed, it's also about having
the common case inline, vs the out of line lower_bound call. Also, I should
add, it's an hack, but I mild one, in my opinion, because we are doing it at an
abstraction level, _Prime_rehash_policy, which is exactly the same at which the
__prime_list is managed, thus we know for sure it begins by 2, etc. Only 10
it's a bit out of the blue at this level. Anyway, with another contributor, we
are also in the process of improving, streamlining, these _Prime_rehash_policy
functions for other reasons, I'll see later on if we can (re-)implement this
kind of tweak in a better way.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (6 preceding siblings ...)
  2011-09-01 17:54 ` paolo.carlini at oracle dot com
@ 2011-09-05 15:36 ` paolo.carlini at oracle dot com
  2011-09-06  4:04 ` justin at fathomdb dot com
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: paolo.carlini at oracle dot com @ 2011-09-05 15:36 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

--- Comment #8 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-09-05 15:35:49 UTC ---
Created attachment 25198
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25198
Yet another try

I'm also considering this variant: I guess overall I like it a tad better, in
particular for large number we add a comparison instead of two. And indeed an
uniform treatment for small numbers which also leads to __prime_list + 5.

What do you think?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (7 preceding siblings ...)
  2011-09-05 15:36 ` paolo.carlini at oracle dot com
@ 2011-09-06  4:04 ` justin at fathomdb dot com
  2011-09-06  9:16 ` paolo.carlini at oracle dot com
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: justin at fathomdb dot com @ 2011-09-06  4:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

--- Comment #9 from Justin SB <justin at fathomdb dot com> 2011-09-06 04:04:12 UTC ---
I think I personally prefer the original patches for readability, but I guess
it all depends on what gets optimized away.  But that caused me to think: how
about relying on constexpr?  We should be able to precompute the lookup for any
constant, while having the original runtime behaviour (no extra checks) for any
non-constant.

I tried the code below, and while it seems to be correct and satisfy the
constexpr rules, it doesn't get optimized away to a constant with gcc 4.6 (and
I'm ashamed to say I can't get the git version of gcc to build at the moment)

If this could be made to work, I think this could be a great approach, in that
we're then not trading off the constant/non-constant cases against each other.

(Optimizing small non-constexpr values as you've done in this latest patch
might also be helpful for performance, but I don't know if that really happens
much in practice; I suspect if a caller goes to the trouble of specifying a
small size they know that the hash is very unlikely to grow beyond that size)

---

#include <iostream>

using namespace std;

constexpr int primes[] = { 2, 3, 5, 7, 11, 13}; 

constexpr int primes_lower_bound(int __first, int __last, int __val)
{
  return (__first == __last)
    ? primes[__first]
    : (primes[__first + ((__last - __first) >> 1)] < __val)
      ? primes_lower_bound(__first + ((__last - __first) >> 1) + 1, __last,
__val) 
      : primes_lower_bound(__first, __first + ((__last - __first) >> 1),
__val);
}

static_assert( primes_lower_bound( 0, 6, 10) == 11, "Sanity check error");

constexpr int next_prime(int k) {
return primes_lower_bound( 0, 6, k);
}

int main() {
cout << next_prime(10) << endl;
return 0;
}


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (8 preceding siblings ...)
  2011-09-06  4:04 ` justin at fathomdb dot com
@ 2011-09-06  9:16 ` paolo.carlini at oracle dot com
  2011-09-06  9:24 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: paolo.carlini at oracle dot com @ 2011-09-06  9:16 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

--- Comment #10 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-09-06 09:14:40 UTC ---
Unfortunately constexpr functions are not going to help here, because of course
a call of such a function is never optimized to a constant unless *the
arguments* are all compile-time constants. In the meanwhile I timed my last
patchlet and definitely isn't worse than the other for the simple testcase we
have, otherwise I think it's much superior from any other point of view
(besides maybe readability but actually I'm using a rather common pattern),
thus for now I guess I'm going ahead with it. Thanks again.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (9 preceding siblings ...)
  2011-09-06  9:16 ` paolo.carlini at oracle dot com
@ 2011-09-06  9:24 ` jakub at gcc dot gnu.org
  2011-09-06  9:40 ` paolo.carlini at oracle dot com
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-09-06  9:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-09-06 09:23:59 UTC ---
You could perhaps do
  if (__builtin_constant_p (n))
    {
      if (n < X1)
        r = P1;
      else if (n < X2)
        r = P2;
      else if (n < X3)
        r = P3;
      ...
    }
  else
    what_you_do_currently.
Then if the argument is seen to be a constant (can be even through
optimizations), it will be optimized at compile time, otherwise it will be done
at runtime.  The drawbacks might be that you have the table in two places and
if inlining estimations don't work correctly, the inline function might be
considered too large.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (10 preceding siblings ...)
  2011-09-06  9:24 ` jakub at gcc dot gnu.org
@ 2011-09-06  9:40 ` paolo.carlini at oracle dot com
  2011-09-06 10:23 ` paolo at gcc dot gnu.org
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: paolo.carlini at oracle dot com @ 2011-09-06  9:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

--- Comment #12 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-09-06 09:39:47 UTC ---
Thanks Jakub, for now we'd rather keep the library __builtin_constant_p-free (a
couple of weeks ago we had fun with Marc Glisse recollecting all the flames on
the mailing list the last time somebody tried to add one elsewhere ;) Really, I
think here most of the improvement performance-wise comes from not calling the
out-of-line lower_bound on the whole __prime_list (normally in the profile the
malloc call is much higher) thus I say let's delay for now further optimization
for constant values (would be 0 and 10, I guess),


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (11 preceding siblings ...)
  2011-09-06  9:40 ` paolo.carlini at oracle dot com
@ 2011-09-06 10:23 ` paolo at gcc dot gnu.org
  2011-09-06 10:27 ` paolo.carlini at oracle dot com
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: paolo at gcc dot gnu.org @ 2011-09-06 10:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

--- Comment #13 from paolo at gcc dot gnu.org <paolo at gcc dot gnu.org> 2011-09-06 10:22:27 UTC ---
Author: paolo
Date: Tue Sep  6 10:22:21 2011
New Revision: 178581

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=178581
Log:
2011-09-06  Paolo Carlini  <paolo.carlini@oracle.com>

    PR libstdc++/50257
    * include/bits/hashtable_policy.h (_Prime_rehash_policy::
       _M_next_bkt): Optimize for small argument.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/hashtable_policy.h


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (12 preceding siblings ...)
  2011-09-06 10:23 ` paolo at gcc dot gnu.org
@ 2011-09-06 10:27 ` paolo.carlini at oracle dot com
  2011-09-06 15:57 ` justin at fathomdb dot com
  2011-09-06 16:45 ` paolo.carlini at oracle dot com
  15 siblings, 0 replies; 17+ messages in thread
From: paolo.carlini at oracle dot com @ 2011-09-06 10:27 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

Paolo Carlini <paolo.carlini at oracle dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #14 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-09-06 10:26:59 UTC ---
I'm closing this, further improvements as enhancements.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (13 preceding siblings ...)
  2011-09-06 10:27 ` paolo.carlini at oracle dot com
@ 2011-09-06 15:57 ` justin at fathomdb dot com
  2011-09-06 16:45 ` paolo.carlini at oracle dot com
  15 siblings, 0 replies; 17+ messages in thread
From: justin at fathomdb dot com @ 2011-09-06 15:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

--- Comment #15 from Justin SB <justin at fathomdb dot com> 2011-09-06 15:55:31 UTC ---
That patch works for me - thanks;  gcc optimizes away the array lookup.

I do still think that constexpr could be helpful here: every lookup for an
explicit (or default) unordered_map size would then be optimized away (not just
those <= 11), but constexpr is still new to me so I'm not sure of the
behavioural nuances.  I do think it should definitely be optimized away in my
test case, but that's a separate issue.

Thanks again for fixing.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug libstdc++/50257] [C++0x] unordered_map slow initialization due to huge __prime_list
  2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
                   ` (14 preceding siblings ...)
  2011-09-06 15:57 ` justin at fathomdb dot com
@ 2011-09-06 16:45 ` paolo.carlini at oracle dot com
  15 siblings, 0 replies; 17+ messages in thread
From: paolo.carlini at oracle dot com @ 2011-09-06 16:45 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50257

--- Comment #16 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-09-06 16:37:55 UTC ---
If you could have available a complete constexpr implementation of lower_bound,
it would make in principle possible to optimize to a constant all the cases
when it's called from a _M_next_bkt in turn called with constant argument, like
the default constructor or the move constructor, or the constructor taking a
size known at compile-time (Note the *in principle*, see c++/50087). All in
all, seems a bit overkilling to me, considering the memory allocation which
immediately follows. Also, the optimizers may well at some point become able to
do a lot of this anyway via interprocedural constant propagation and
versioning, because the information is all there: as soon as something is
constant at compile-time it "just" needs to be thoroughly propagated.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2011-09-06 16:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-01  0:42 [Bug libstdc++/50257] New: unordered_map slow initialization due to huge __prime_list justin at fathomdb dot com
2011-09-01  9:07 ` [Bug libstdc++/50257] [C++0x] " paolo.carlini at oracle dot com
2011-09-01  9:24 ` paolo.carlini at oracle dot com
2011-09-01  9:27 ` paolo.carlini at oracle dot com
2011-09-01 11:32 ` paolo.carlini at oracle dot com
2011-09-01 17:06 ` justin at fathomdb dot com
2011-09-01 17:33 ` justin at fathomdb dot com
2011-09-01 17:54 ` paolo.carlini at oracle dot com
2011-09-05 15:36 ` paolo.carlini at oracle dot com
2011-09-06  4:04 ` justin at fathomdb dot com
2011-09-06  9:16 ` paolo.carlini at oracle dot com
2011-09-06  9:24 ` jakub at gcc dot gnu.org
2011-09-06  9:40 ` paolo.carlini at oracle dot com
2011-09-06 10:23 ` paolo at gcc dot gnu.org
2011-09-06 10:27 ` paolo.carlini at oracle dot com
2011-09-06 15:57 ` justin at fathomdb dot com
2011-09-06 16:45 ` paolo.carlini at oracle dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).