[Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9

public inbox for gcc-bugs@sourceware.org

* [Bug libgomp/40852]  New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
@ 2009-07-24 20:15 jaffe at broad dot mit dot edu
  2009-07-24 20:29 ` [Bug libgomp/40852] " rguenth at gcc dot gnu dot org
                   ` (26 more replies)
  0 siblings, 27 replies; 28+ messages in thread
From: jaffe at broad dot mit dot edu @ 2009-07-24 20:15 UTC (permalink / raw)
  To: gcc-bugs

Parallel sorts get ~10 times slower as one increases the vector size from
4*10^9 to 5*10^9, perhaps at exactly 2^32, but this wasn't checked.  The
example below is for a vector of ints but the same phenomenon is observed on a
vector of long longs.

To reproduce (sort_test.cc is below):

0. Adjust 'processors' in sort_test.cc.
1. g++ -O3 -fopenmp sort_test.cc -lgomp
2. ./a.out

output:

58 seconds used in sort [for vector of size 4,000,000,000]
667 seconds used in sort [for vector of size 5,000,000,000]

gcc version information:

crd4% gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.4.1/configure
--with-gmp=/broad/tools/Linux/x86_64/pkgs/gcc_4.4.1
--with-mpfr=/broad/tools/Linux/x86_64/pkgs/gcc_4.4.1
--prefix=/broad/tools/Linux/x86_64/pkgs/gcc_4.4.1
Thread model: posix
gcc version 4.4.1 (GCC) 
We first observed the problem under gcc 4.3.3.

hardware info:

crd4% uname -a
Linux crd4 2.6.16.54-0.2.5-smp #1 SMP Mon Jan 21 13:29:51 UTC 2008 x86_64
x86_64 x86_64 GNU/Linux
This is a 32-processor machine with 256 GB of memory, but I don't think the
problem is specific to this architecture.

sort_test.cc:

#include <iostream>
#include <omp.h>
#include <time.h>
#include <vector>
using namespace std;
int main( )
{    for ( long long  m = 4; m <= 5; m++ )
     {    const long long entries = m * (long long) 1000000000;
          const int processors = 32;
          vector<int> x(entries);
          for ( long long i = 0; i < entries; i++ )
               x[i] = (i*i) % 123456789;
          time_t clock1, clock2; time( &clock1 );
          omp_set_num_threads(processors);
          sort( x.begin( ), x.end( ) );
          time( &clock2 );           
          cout << clock2 - clock1 << " seconds used in sort" << endl;    }    }


-- 
           Summary: parallel sort run time increases ~10 fold when vector
                    size gets over ~4*10^9
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jaffe at broad dot mit dot edu
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libgomp/40852] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
@ 2009-07-24 20:29 ` rguenth at gcc dot gnu dot org
  2009-07-24 20:44 ` [Bug libstdc++/40852] " jaffe at broadinstitute dot org
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-07-24 20:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2009-07-24 20:29 -------
I suppose you are running into cache effects.  Why do you think this is a GCC
bug?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
  2009-07-24 20:29 ` [Bug libgomp/40852] " rguenth at gcc dot gnu dot org
@ 2009-07-24 20:44 ` jaffe at broadinstitute dot org
  2009-07-24 21:15 ` paolo dot carlini at oracle dot com
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: jaffe at broadinstitute dot org @ 2009-07-24 20:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from jaffe at broadinstitute dot org  2009-07-24 20:43 -------
Subject: Re:  parallel sort run time increases ~10 fold
 when vector size gets over ~4*10^9

If instead of sorting a vec<int>, one sorts a vec<long long>, there is still a
ten-fold slowdown as one increases the vector size from 4 to 5 billion.  So
it's not the total amount of memory that matters, but rather the number of
entries in the vector.  I don't think this is about cache effects.

Best,

David

============================================================================================

rguenth at gcc dot gnu dot org wrote:
> ------- Comment #1 from rguenth at gcc dot gnu dot org  2009-07-24 20:29 -------
> I suppose you are running into cache effects.  Why do you think this is a GCC
> bug?
> 
> 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
  2009-07-24 20:29 ` [Bug libgomp/40852] " rguenth at gcc dot gnu dot org
  2009-07-24 20:44 ` [Bug libstdc++/40852] " jaffe at broadinstitute dot org
@ 2009-07-24 21:15 ` paolo dot carlini at oracle dot com
  2009-07-24 21:20 ` jaffe at broadinstitute dot org
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: paolo dot carlini at oracle dot com @ 2009-07-24 21:15 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from paolo dot carlini at oracle dot com  2009-07-24 21:15 -------
Out of curiosity, did you try parallel-mode on that machine? Basically, just
add -D_GLIBCXX_PARALLEL, but refer to the documentation of course:

http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html#manual.ext.parallel_mode.intro

I'm also adding Johannes, in CC...

Note, I don't think we have any specific issue with the normal, serial,
std::sort...


-- 

paolo dot carlini at oracle dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |singler at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (2 preceding siblings ...)
  2009-07-24 21:15 ` paolo dot carlini at oracle dot com
@ 2009-07-24 21:20 ` jaffe at broadinstitute dot org
  2009-07-24 21:24 ` [Bug libstdc++/40852] [parallel-mode] " paolo dot carlini at oracle dot com
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: jaffe at broadinstitute dot org @ 2009-07-24 21:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from jaffe at broadinstitute dot org  2009-07-24 21:20 -------
Subject: Re:  parallel sort run time increases ~10 fold
 when vector size gets over ~4*10^9

Oh crap, yes I did, and now I see that I accidentally left off the first three
lines of sort_test.cc.  They are:

#define _GLIBCXX_PARALLEL
#include <algorithm>
#include <iomanip>

David

=======================================================================================================

paolo dot carlini at oracle dot com wrote:
> ------- Comment #3 from paolo dot carlini at oracle dot com  2009-07-24 21:15 -------
> Out of curiosity, did you try parallel-mode on that machine? Basically, just
> add -D_GLIBCXX_PARALLEL, but refer to the documentation of course:
> 
> http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html#manual.ext.parallel_mode.intro
> 
> I'm also adding Johannes, in CC...
> 
> Note, I don't think we have any specific issue with the normal, serial,
> std::sort...
> 
> 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (3 preceding siblings ...)
  2009-07-24 21:20 ` jaffe at broadinstitute dot org
@ 2009-07-24 21:24 ` paolo dot carlini at oracle dot com
  2009-10-19 18:08 ` jason at gcc dot gnu dot org
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: paolo dot carlini at oracle dot com @ 2009-07-24 21:24 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from paolo dot carlini at oracle dot com  2009-07-24 21:23 -------
So this issue is just that you are not completely happy with the behavior of
parallel-mode. Ok... Let's see what Johannes thinks.


-- 

paolo dot carlini at oracle dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|parallel sort run time      |[parallel-mode] parallel
                   |increases ~10 fold when     |sort run time increases ~10
                   |vector size gets over       |fold when vector size gets
                   |~4*10^9                     |over ~4*10^9


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (4 preceding siblings ...)
  2009-07-24 21:24 ` [Bug libstdc++/40852] [parallel-mode] " paolo dot carlini at oracle dot com
@ 2009-10-19 18:08 ` jason at gcc dot gnu dot org
  2009-10-20  7:46 ` singler at gcc dot gnu dot org
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: jason at gcc dot gnu dot org @ 2009-10-19 18:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from jason at gcc dot gnu dot org  2009-10-19 18:07 -------
Have you tried selecting a different sort algorithm?  The default seems to be
the multi-way mergesort, but there are two quicksort options as well.


-- 

jason at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (5 preceding siblings ...)
  2009-10-19 18:08 ` jason at gcc dot gnu dot org
@ 2009-10-20  7:46 ` singler at gcc dot gnu dot org
  2009-10-20 10:55 ` jaffe at broadinstitute dot org
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: singler at gcc dot gnu dot org @ 2009-10-20  7:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from singler at gcc dot gnu dot org  2009-10-20 07:46 -------
Sorry, the CC has never reached me.  
So concerning comment #4:  Was the parallel mode actually activated?
The multiway mergesort needs temporary space equal to the size of the input,
in addition to the input itself.  Are you sure the program was not swapping?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (6 preceding siblings ...)
  2009-10-20  7:46 ` singler at gcc dot gnu dot org
@ 2009-10-20 10:55 ` jaffe at broadinstitute dot org
  2009-10-22  6:57 ` singler at gcc dot gnu dot org
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: jaffe at broadinstitute dot org @ 2009-10-20 10:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from jaffe at broadinstitute dot org  2009-10-20 10:55 -------
Subject: Re:  [parallel-mode] parallel sort run time
 increases ~10 fold when vector size gets over ~4*10^9

Regarding comment #7, I just ran this now on a machine with 32 processors and
512 GB memory.

(a) Sorting 4 x 10^9 ints took 0.9 minutes.
(b) Sorting 5 x 10^9 ints took 16 minutes.

The second test used about 40 GB, which is a small fraction of the available
memory.

(c) Sorting 2.5 x 10^9 structures having 2 ints each took 1.1 minutes.

Regarding comment #6, repeating (a) and (b) with
__gnu_parallel::balanced_quicksort_tag( ):

(a') 6.3 minutes
(b') 8.1 minutes,

so the algorithm is slower on these data but does not exhibit the same jump in
runtime.
I also tried __gnu_parallel::quicksort_tag( ) which was about the same for (b)
[(a) not tested].


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (7 preceding siblings ...)
  2009-10-20 10:55 ` jaffe at broadinstitute dot org
@ 2009-10-22  6:57 ` singler at gcc dot gnu dot org
  2009-10-22  7:16 ` singler at gcc dot gnu dot org
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: singler at gcc dot gnu dot org @ 2009-10-22  6:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from singler at gcc dot gnu dot org  2009-10-22 06:57 -------
I can reproduce the bug on my machine (2 Quadcore Nehalems, 48GB RAM)

4 x 10^9 ints: 65 seconds used in sort
5 x 10^9 ints: 193 seconds used in sort


-- 

singler at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |singler at gcc dot gnu dot
                   |dot org                     |org
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-10-22 06:57:14
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (8 preceding siblings ...)
  2009-10-22  6:57 ` singler at gcc dot gnu dot org
@ 2009-10-22  7:16 ` singler at gcc dot gnu dot org
  2009-10-22  7:17 ` singler at gcc dot gnu dot org
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: singler at gcc dot gnu dot org @ 2009-10-22  7:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from singler at gcc dot gnu dot org  2009-10-22 07:15 -------
The problem is in multiseq_selection.h, where this line has an overflow

(static_cast<uint64_t>(__total) * __rank / __N - __leftsize)

if (__total * __rank) exceeds 64 bits.  The quick fix is to use a temporary
double, which solves the original test case:

4 x 10^9 ints: 64 seconds used in sort
5 x 10^9 ints: 80 seconds used in sort

Find patches for branch (4.4) and trunk (4.5) attached.

However, I do not fully trust the double arithmetics yet, although some test
cases work.  Does anybody else know a better way to avoid an overflow in ((a *
b) / c) with only integer arithmetics and normal rounding?

Maybe I can find a way to avoid this calculation altogether.
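
The wraparound described above can be reproduced in isolation.  The sketch
below is not the library code: the helper names are hypothetical, and it
assumes that both factors can approach the input size.  Under that assumption
4*10^9 squared (1.6*10^19) still fits below 2^64 (about 1.8*10^19), while
5*10^9 squared (2.5*10^19) does not, which would be consistent with the
reported threshold.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper with the same shape as the expression quoted above:
// total * rank / N in 64-bit unsigned arithmetic.  The product wraps
// modulo 2^64 when it does not fit, giving a meaningless quotient.
inline std::uint64_t scaled_rank_u64(std::uint64_t total, std::uint64_t rank,
                                     std::uint64_t N)
{
  assert(N != 0);
  return total * rank / N;
}

// Same computation through a temporary double (the "quick fix" idea).
// There is no wraparound, but the product is rounded to 53 significant
// bits, so the result is not guaranteed to be exact in general.
inline std::uint64_t scaled_rank_double(std::uint64_t total,
                                        std::uint64_t rank, std::uint64_t N)
{
  assert(N != 0);
  const double product = static_cast<double>(total) * static_cast<double>(rank);
  return static_cast<std::uint64_t>(product / static_cast<double>(N));
}
```

With total == rank == N the expected result is simply that common value; the
integer version returns it for 4*10^9 but not for 5*10^9, while the double
version returns it in both cases.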


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (10 preceding siblings ...)
  2009-10-22  7:17 ` singler at gcc dot gnu dot org
@ 2009-10-22  7:17 ` singler at gcc dot gnu dot org
  2009-10-22  7:42 ` singler at gcc dot gnu dot org
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: singler at gcc dot gnu dot org @ 2009-10-22  7:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from singler at gcc dot gnu dot org  2009-10-22 07:16 -------
Created an attachment (id=18862)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18862&action=view)
Patch replacing uint64_t by double to avoid overflow, for trunk.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (9 preceding siblings ...)
  2009-10-22  7:16 ` singler at gcc dot gnu dot org
@ 2009-10-22  7:17 ` singler at gcc dot gnu dot org
  2009-10-22  7:17 ` singler at gcc dot gnu dot org
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: singler at gcc dot gnu dot org @ 2009-10-22  7:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from singler at gcc dot gnu dot org  2009-10-22 07:17 -------
Created an attachment (id=18863)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18863&action=view)
Patch replacing uint64_t by double to avoid overflow, for branch 4.4.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (11 preceding siblings ...)
  2009-10-22  7:17 ` singler at gcc dot gnu dot org
@ 2009-10-22  7:42 ` singler at gcc dot gnu dot org
  2009-10-22  9:01 ` pluto at agmk dot net
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: singler at gcc dot gnu dot org @ 2009-10-22  7:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from singler at gcc dot gnu dot org  2009-10-22 07:42 -------
(In reply to comment #10)

> However, I do not fully trust the double arithmetics yet, although some test
> cases work.

Er, this sounded a bit pessimistic, all sort tests I have tried so far work
with the patch.

And some more explanation:
The overflow resulted in erratic and thus very poor load balancing in the
merge step, causing the huge running times.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (12 preceding siblings ...)
  2009-10-22  7:42 ` singler at gcc dot gnu dot org
@ 2009-10-22  9:01 ` pluto at agmk dot net
  2009-10-22 10:23 ` jaffe at broadinstitute dot org
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pluto at agmk dot net @ 2009-10-22  9:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from pluto at agmk dot net  2009-10-22 09:01 -------
(In reply to comment #10)

> However, I do not fully trust the double arithmetics yet, although some test
> cases work.  Does anybody else know a better way to avoid an overflow in ((a *
> b) / c) with only integer arithmetics and normal rounding?

you can use a 128-bit integer type on x86-64.
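
A minimal sketch of that suggestion, for illustration only.  The helper name
is hypothetical, and the availability check via __SIZEOF_INT128__ is an
assumption: that macro is defined by GCC releases newer than the ones
discussed in this thread.  Where the type is missing, the sketch falls back
to long double, which rounds instead of wrapping.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: (a * b) / c with an intermediate wide enough to
// hold any product of two 64-bit values.  The quotient is assumed to fit
// in 64 bits.
inline std::uint64_t muldiv_wide(std::uint64_t a, std::uint64_t b,
                                 std::uint64_t c)
{
  assert(c != 0);
#ifdef __SIZEOF_INT128__
  // Exact: the 128-bit product cannot overflow.
  const unsigned __int128 product = static_cast<unsigned __int128>(a) * b;
  return static_cast<std::uint64_t>(product / c);
#else
  // Approximate fallback: no wraparound, but the product is rounded to
  // the mantissa width of long double on the target.
  const long double product =
      static_cast<long double>(a) * static_cast<long double>(b);
  return static_cast<std::uint64_t>(product / static_cast<long double>(c));
#endif
}
```

For example, muldiv_wide(6000000000, 5000000000, 4000000000) yields
7500000000, although the intermediate product 3*10^19 exceeds 2^64.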


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (13 preceding siblings ...)
  2009-10-22  9:01 ` pluto at agmk dot net
@ 2009-10-22 10:23 ` jaffe at broadinstitute dot org
  2009-10-22 16:41 ` singler at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: jaffe at broadinstitute dot org @ 2009-10-22 10:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from jaffe at broadinstitute dot org  2009-10-22 10:22 -------
Subject: Re:  [parallel-mode] parallel sort run time
 increases ~10 fold when vector size gets over ~4*10^9

Wonderful!  Thank you very much for fixing this problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (14 preceding siblings ...)
  2009-10-22 10:23 ` jaffe at broadinstitute dot org
@ 2009-10-22 16:41 ` singler at gcc dot gnu dot org
  2009-10-22 17:46 ` paolo dot carlini at oracle dot com
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: singler at gcc dot gnu dot org @ 2009-10-22 16:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from singler at gcc dot gnu dot org  2009-10-22 16:41 -------
(In reply to comment #14)
> (In reply to comment #10)
> 
> > However, I do not fully trust the double arithmetics yet, although some test
> > cases work.  Does anybody else know a better way to avoid an overflow in ((a *
> > b) / c) with only integer arithmetics and normal rounding?
> 
> you can use a 128-bit integer type on x86-64.

Very good idea.
Do you know a good #ifdef clause to check its availability?  Is it really just
x86-64?
Also, I probably want to use it only when really needed, because I assume it to
be implemented in software, in particular the division.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (15 preceding siblings ...)
  2009-10-22 16:41 ` singler at gcc dot gnu dot org
@ 2009-10-22 17:46 ` paolo dot carlini at oracle dot com
  2009-10-23 10:00 ` singler at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: paolo dot carlini at oracle dot com @ 2009-10-22 17:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from paolo dot carlini at oracle dot com  2009-10-22 17:46 -------
Is something known about the actual size of a, b, and c? Also, I don't know
which is the required precision for the result: must be exact if representable?
I suppose not, otherwise the suggestion of using double would not make sense.
Depending on the answer to the above, there are various options, maybe checking
for a * b overflowing (if the quantities are all positive, then checking for
wraparound is easy) and then taking the appropriate actions.

Anyway, barring more sophisticated solutions, using long double seems a better
idea to me, because on most widespread targets a long double is at least 80
bits, with a mantissa of at least 64 bits, thus able to exactly represent any
long long integer.
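
Both ideas can be sketched together, assuming all quantities are unsigned
64-bit values.  The helper names are hypothetical.  Note that long double can
hold each 64-bit operand exactly only where its mantissa has at least 64 bits
(this is target dependent), and that the product of two such operands is
still rounded.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if a * b does not fit in 64 unsigned bits.
// Unsigned multiplication wraps modulo 2^64, so dividing the wrapped
// product by a gives back b only when no wraparound happened.
inline bool mul_overflows_u64(std::uint64_t a, std::uint64_t b)
{
  return a != 0 && (a * b) / a != b;
}

// Mantissa width of long double on the current target (64 for the x87
// extended format, possibly less or more elsewhere).
inline int long_double_mantissa_bits()
{
  return std::numeric_limits<long double>::digits;
}

// Hypothetical helper: exact integer path when the product fits,
// rounded long double path otherwise.  The quotient is assumed to fit
// in 64 bits.
inline std::uint64_t muldiv_checked(std::uint64_t a, std::uint64_t b,
                                    std::uint64_t c)
{
  assert(c != 0);
  if (!mul_overflows_u64(a, b))
    return a * b / c;
  const long double product =
      static_cast<long double>(a) * static_cast<long double>(b);
  return static_cast<std::uint64_t>(product / static_cast<long double>(c));
}
```

The check costs one extra division, so the floating-point path is taken only
for inputs large enough to wrap.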


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (16 preceding siblings ...)
  2009-10-22 17:46 ` paolo dot carlini at oracle dot com
@ 2009-10-23 10:00 ` singler at gcc dot gnu dot org
  2009-10-23 10:01 ` singler at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: singler at gcc dot gnu dot org @ 2009-10-23 10:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from singler at gcc dot gnu dot org  2009-10-23 10:00 -------
(In reply to comment #17)
> Is something known about the actual size of a, b, and c? 

They can be as large as the input size.

> Also, I don't know which is the required precision for the result: must be 
> exact if representable?

In the last iteration, __n == 0 => __total == __N, and then, the result must
absolutely be __rank, according to the specification.

Anyway, I think I have found a solution that is easier, faster, and avoids the
large intermediate altogether (see attached patch).  It also fixes similar
problems in two other locations.  However, this patch needs further thorough
testing.

Also, __n == 2 ^ __r - 1, so __n + 1 == 2 ^ __r, and the divisions could be
replaced by shifts.
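
The last remark can be illustrated as follows.  This is a sketch with
hypothetical names, not code from the patch: for a nonnegative value x and
n == 2^r - 1, dividing by n + 1 is the same as shifting right by r bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: x / (n + 1) written as a division, where
// n == 2^r - 1 and therefore n + 1 == 2^r.
inline std::uint64_t div_by_n_plus_1(std::uint64_t x, unsigned r)
{
  assert(r < 64);
  const std::uint64_t n = (std::uint64_t(1) << r) - 1;
  return x / (n + 1);
}

// The same quotient computed with a shift; valid because the divisor
// is exactly 2^r and x is unsigned.
inline std::uint64_t shift_by_r(std::uint64_t x, unsigned r)
{
  assert(r < 64);
  return x >> r;
}
```

Neither form needs an intermediate larger than x itself, so the overflow
discussed earlier in the thread cannot occur here.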


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (17 preceding siblings ...)
  2009-10-23 10:00 ` singler at gcc dot gnu dot org
@ 2009-10-23 10:01 ` singler at gcc dot gnu dot org
  2009-10-23 16:00 ` paolo dot carlini at oracle dot com
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: singler at gcc dot gnu dot org @ 2009-10-23 10:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from singler at gcc dot gnu dot org  2009-10-23 10:01 -------
Created an attachment (id=18878)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18878&action=view)
Patch avoiding large intermediates to prevent overflow, for trunk.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (18 preceding siblings ...)
  2009-10-23 10:01 ` singler at gcc dot gnu dot org
@ 2009-10-23 16:00 ` paolo dot carlini at oracle dot com
  2009-10-27  9:45 ` jaffe at broadinstitute dot org
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: paolo dot carlini at oracle dot com @ 2009-10-23 16:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from paolo dot carlini at oracle dot com  2009-10-23 16:00 -------
Excellent. Let's wait a bit for feedback from people experiencing this issue
and then commit the patch, first mainline and then probably 4_4-branch too.
Make sure to also regression test the fix on a "normal" ;) machine...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (19 preceding siblings ...)
  2009-10-23 16:00 ` paolo dot carlini at oracle dot com
@ 2009-10-27  9:45 ` jaffe at broadinstitute dot org
  2009-10-27 15:54 ` paolo dot carlini at oracle dot com
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: jaffe at broadinstitute dot org @ 2009-10-27  9:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from jaffe at broadinstitute dot org  2009-10-27 09:45 -------
Subject: Re:  [parallel-mode] parallel sort run time
 increases ~10 fold when vector size gets over ~4*10^9

I tested the patch from comment #19, sorting X billion integers on a machine
having 32 processors and 256 GB memory, X = 4, 6, ..., 26.  The overall
behavior is very close to linear.  For example, X = 4 took 1.02 minutes,
whereas X = 20 took 5.22 minutes.  Very nice!


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (20 preceding siblings ...)
  2009-10-27  9:45 ` jaffe at broadinstitute dot org
@ 2009-10-27 15:54 ` paolo dot carlini at oracle dot com
  2009-10-28 10:04 ` singler at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: paolo dot carlini at oracle dot com @ 2009-10-27 15:54 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from paolo dot carlini at oracle dot com  2009-10-27 15:53 -------
Patch regtests fine on x86_64-linux. Johannes, can you prepare a ChangeLog
entry, post and commit both? Thanks!


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (21 preceding siblings ...)
  2009-10-27 15:54 ` paolo dot carlini at oracle dot com
@ 2009-10-28 10:04 ` singler at gcc dot gnu dot org
  2009-10-28 10:05 ` singler at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: singler at gcc dot gnu dot org @ 2009-10-28 10:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #23 from singler at gcc dot gnu dot org  2009-10-28 10:04 -------
Subject: Bug 40852

Author: singler
Date: Wed Oct 28 10:04:03 2009
New Revision: 153648

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=153648
Log:
2009-10-28  Johannes Singler  <singler@kit.edu>

        PR libstdc++/40852
        * include/parallel/multiseq_selection.h
        (multiseq_partition, multiseq_selection):  Avoid intermediate
        values exceeding the integer type range for very large inputs.


Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/parallel/multiseq_selection.h
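
For readers unfamiliar with this class of bug, the following is a minimal
sketch of how an intermediate value can exceed the range of its integer type
even though the final result fits.  It is NOT the code from
multiseq_selection.h: the helper names scaled_rank_narrow and scaled_rank_wide
are hypothetical, and 32-bit unsigned arithmetic is used only because its
wraparound is well defined and easy to reproduce (this assumes a platform
where int is 32 bits, so that no promotion to a wider type takes place).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from libstdc++: computes rank * n / total.
// The product rank * n is formed in 32 bits, so it wraps modulo 2^32 before
// the division whenever it exceeds the range of the type.
std::uint32_t scaled_rank_narrow(std::uint32_t rank, std::uint32_t n,
                                 std::uint32_t total)
{
    assert(total != 0);
    return rank * n / total;
}

// Same computation, but the intermediate product is formed in 64 bits, so it
// cannot wrap for any pair of 32-bit operands.  The narrowing cast is safe as
// long as the caller guarantees that the quotient fits in 32 bits.
std::uint32_t scaled_rank_wide(std::uint32_t rank, std::uint32_t n,
                               std::uint32_t total)
{
    assert(total != 0);
    const std::uint64_t product = static_cast<std::uint64_t>(rank) * n;
    return static_cast<std::uint32_t>(product / total);
}
```

With rank = 3000000000, n = 2 and total = 4 the exact answer is 1500000000,
which the widened version returns; the narrow version returns a much smaller
value because the product 6000000000 does not fit in 32 bits.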


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (22 preceding siblings ...)
  2009-10-28 10:04 ` singler at gcc dot gnu dot org
@ 2009-10-28 10:05 ` singler at gcc dot gnu dot org
  2009-10-28 10:11 ` singler at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: singler at gcc dot gnu dot org @ 2009-10-28 10:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #24 from singler at gcc dot gnu dot org  2009-10-28 10:04 -------
Subject: Bug 40852

Author: singler
Date: Wed Oct 28 10:04:35 2009
New Revision: 153649

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=153649
Log:
2009-10-28  Johannes Singler  <singler@kit.edu>

        PR libstdc++/40852
        * include/parallel/multiseq_selection.h
        (multiseq_partition, multiseq_selection):  Avoid intermediate
        values exceeding the integer type range for very large inputs.


Modified:
    branches/gcc-4_4-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_4-branch/libstdc++-v3/include/parallel/multiseq_selection.h


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (23 preceding siblings ...)
  2009-10-28 10:05 ` singler at gcc dot gnu dot org
@ 2009-10-28 10:11 ` singler at gcc dot gnu dot org
  2009-10-28 10:44 ` paolo dot carlini at oracle dot com
  2009-10-29 16:53 ` law at gcc dot gnu dot org
  26 siblings, 0 replies; 28+ messages in thread
From: singler at gcc dot gnu dot org @ 2009-10-28 10:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #25 from singler at gcc dot gnu dot org  2009-10-28 10:11 -------
Closing this bug.


-- 

singler at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (24 preceding siblings ...)
  2009-10-28 10:11 ` singler at gcc dot gnu dot org
@ 2009-10-28 10:44 ` paolo dot carlini at oracle dot com
  2009-10-29 16:53 ` law at gcc dot gnu dot org
  26 siblings, 0 replies; 28+ messages in thread
From: paolo dot carlini at oracle dot com @ 2009-10-28 10:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #26 from paolo dot carlini at oracle dot com  2009-10-28 10:44 -------
Fixed for 4.4.3 and mainline.


-- 

paolo dot carlini at oracle dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.4.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



* [Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
  2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
                   ` (25 preceding siblings ...)
  2009-10-28 10:44 ` paolo dot carlini at oracle dot com
@ 2009-10-29 16:53 ` law at gcc dot gnu dot org
  26 siblings, 0 replies; 28+ messages in thread
From: law at gcc dot gnu dot org @ 2009-10-29 16:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #27 from law at gcc dot gnu dot org  2009-10-29 16:49 -------
Subject: Bug 40852

Author: law
Date: Thu Oct 29 16:48:00 2009
New Revision: 153715

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=153715
Log:
Recorded merge of revisions
153580-153581,153584,153586-153600,153604,153606,153610,153613,153615-153618,153621,153643,153646-153648,153650-153652,153654-153667,153669-153671
via svnmerge from 
svn+ssh://law@gcc.gnu.org/svn/gcc/trunk

........
  r153580 | gccadmin | 2009-10-26 18:17:26 -0600 (Mon, 26 Oct 2009) | 1 line

  Daily bump.
........
  r153581 | paolo | 2009-10-26 19:18:10 -0600 (Mon, 26 Oct 2009) | 6 lines

  2009-10-26  Paolo Carlini  <paolo.carlini@oracle.com>

        * include/std/chrono (duration<>::duration(const duration<>&)): Fix
        per the straightforward resolution of DR 974.
        * testsuite/20_util/duration/cons/dr974.cc: Add.
........
  r153584 | carrot | 2009-10-27 03:06:36 -0600 (Tue, 27 Oct 2009) | 16 lines

        * target.h (have_conditional_execution): Add a new target hook
        function.
        * target-def.h (TARGET_HAVE_CONDITIONAL_EXECUTION): Likewise.
        * targhooks.h (default_have_conditional_execution): Likewise.
        * targhooks.c (default_have_conditional_execution): Likewise.
        * doc/tm.texi (TARGET_HAVE_CONDITIONAL_EXECUTION): Document it.
        * config/arm/arm.c (TARGET_HAVE_CONDITIONAL_EXECUTION): Define it.
        (arm_have_conditional_execution): New function.
        * ifcvt.c (noce_process_if_block, find_if_header,
        cond_exec_find_if_block, dead_or_predicable): Change the usage of macro
        HAVE_conditional_execution to a target hook call.
        * recog.c (peephole2_optimize): Likewise.
        * sched-rgn.c (add_branch_dependences): Likewise.
        * final.c (asm_insn_count, final_scan_insn): Likewise.
        * bb-reorder.c (HAVE_conditional_execution): Remove it.
........
  r153586 | ebotcazou | 2009-10-27 04:09:04 -0600 (Tue, 27 Oct 2009) | 1 line

  Fix nits
........
  r153587 | jakub | 2009-10-27 04:28:48 -0600 (Tue, 27 Oct 2009) | 3 lines

        PR c++/41020
        * g++.dg/lookup/extern-c-redecl5.C: Fix up regexp.
........
  r153588 | aldyh | 2009-10-27 05:18:12 -0600 (Tue, 27 Oct 2009) | 5 lines

        PR bootstrap/41451
        * fold-const.c (fold_binary_loc): Do not call
        protected_set_expr_location.
........
  r153589 | rguenth | 2009-10-27 05:30:59 -0600 (Tue, 27 Oct 2009) | 5 lines

  2009-10-27  Richard Guenther  <rguenther@suse.de>

        PR lto/41821
        * gimple.c (gimple_types_compatible_p): Handle OFFSET_TYPE.
........
  r153590 | revitale | 2009-10-27 05:46:07 -0600 (Tue, 27 Oct 2009) | 1 line

  Fix PR40648 -- Fix misaligned store vectorizer patch
........
  r153591 | charlet | 2009-10-27 07:06:06 -0600 (Tue, 27 Oct 2009) | 16 lines

  2009-10-27  Arnaud Charlet  <charlet@adacore.com>

        * exp_aggr.adb: Fix comment.

  2009-10-27  Emmanuel Briot  <briot@adacore.com>

        * prj-err.adb (Error_Msg): take into account continuation lines when
        computing whether we have a warning.

  2009-10-27  Vasiliy Fofanov  <fofanov@adacore.com>

        * make.adb, s-os_lib.adb, s-os_lib.ads (Create_Temp_Output_File): New
        routine that is designed to create temp file descriptor specifically
        for redirecting an output stream.
........
  r153592 | charlet | 2009-10-27 07:16:48 -0600 (Tue, 27 Oct 2009) | 45 lines

  2009-10-27  Vincent Celier  <celier@adacore.com>

        * makeutl.adb (Check_Source_Info_In_ALI): Do not recompile if a subunit
        from the runtime is found, except if gnatmake switch -a is used and
        this subunit cannot be found.

  2009-10-27  Ed Schonberg  <schonberg@adacore.com>

        * gnatbind.adb (gnatbind): When the -R option is selected, list
        subunits as well, for tools that need the complete closure of the
        main program.

  2009-10-27  Sergey Rybin  <rybin@adacore.com>

        * gnat_ugn.texi: Minor updates.

  2009-10-27  Emmanuel Briot  <briot@adacore.com>

        * prj-tree.adb (Free): Fix memory leak.

  2009-10-27  Vasiliy Fofanov  <fofanov@adacore.com>

        * adaint.c, s-os_lib.adb (__gnat_create_output_file_new): New function
        that ensures the file that is created is new. Use this function to make
        sure there is no race condition if several processes are creating temp
        files concurrently.

        * s-os_lib.ads: Update comment.

  2009-10-27  Thomas Quinot  <quinot@adacore.com>

        * sem_ch12.adb: Minor reformatting

  2009-10-27  Javier Miranda  <miranda@adacore.com>

        * exp_ch4.ads (Integer_Promotion_Possible): New subprogram.
        * exp_ch4.adb (Integer_Promotion_Possible): New subprogram.
        (Expand_N_Type_Conversion): Replace code that checks if the integer
        promotion of the operands is possible by a call to the new function
        Integer_Promotion_Possible. Minor reformatting because an enclosing
        block is now not needed.
        * checks.adb (Apply_Arithmetic_Overflow_Check): Add missing check to
        see if the integer promotion is possible; in such case the runtime
        checks are not generated.
........
  r153593 | charlet | 2009-10-27 07:22:25 -0600 (Tue, 27 Oct 2009) | 17 lines

  2009-10-27  Thomas Quinot  <quinot@adacore.com>

        * sem_ch12.adb (Install_Formal_Packages): Do not omit installation of
        visible entities when the formal package doesn't have a box.

        * checks.adb: Minor reformatting.

  2009-10-27  Vincent Celier  <celier@adacore.com>

        * prj-part.adb (Parse): Catch exception Types.Unrecoverable_Error and
        set Project to Empty_Node.

  2009-10-27  Robert Dewar  <dewar@adacore.com>

        * gnatbind.adb: Minor reformatting
........
  r153594 | charlet | 2009-10-27 07:51:46 -0600 (Tue, 27 Oct 2009) | 18 lines

  2009-10-27  Robert Dewar  <dewar@adacore.com>

        * s-os_lib.ads, s-os_lib.adb, prj-err.adb, makeutl.adb: Minor
        reformatting.

  2009-10-27  Ed Schonberg  <schonberg@adacore.com>

        * sem_util.ads, sem_util.adb (Denotes_Same_Object,
        Denotes_Same_Prefix): New functions to detect overlap between actuals
        that are not by-copy in a call, when one of them is in-out.
        * sem_warn.ads, sem_warn.adb (Warn_On_Overlapping_Actuals): New
        procedure, called on a subprogram call to warn when an in-out actual
        that is not by-copy overlaps with another actual, thus leading to
        potentially dangerous aliasing in the body of the called subprogram.
        Currently the warning is under control of the -gnatX switch.
        * sem_res.adb (resolve_call): call Warn_On_Overlapping_Actuals.
........
  r153595 | charlet | 2009-10-27 08:02:58 -0600 (Tue, 27 Oct 2009) | 6 lines

  2009-10-27  Robert Dewar  <dewar@adacore.com>

        * sem_warn.adb, sem_util.adb, sem_util.ads: Minor reformatting. Add
        comments.
........
  r153596 | charlet | 2009-10-27 08:07:19 -0600 (Tue, 27 Oct 2009) | 2 lines

  Minor doc updates.
........
  r153597 | charlet | 2009-10-27 08:14:44 -0600 (Tue, 27 Oct 2009) | 6 lines

  2009-10-27  Robert Dewar  <dewar@adacore.com>

        * s-fileio.adb, s-fileio.ads, sem_util.adb, sem_warn.adb,
        sem_warn.ads: Minor reformatting
........
  r153598 | rguenth | 2009-10-27 09:16:35 -0600 (Tue, 27 Oct 2009) | 5 lines

  2009-10-27  Richard Guenther  <rguenther@suse.de>

        * tree-complex.c (expand_complex_div_wide): Check for
        INTEGER_CST, not TREE_CONSTANT on comparison folding result.
........
  r153599 | jakub | 2009-10-27 09:50:50 -0600 (Tue, 27 Oct 2009) | 6 lines

        PR c/41842
        * c-typeck.c (convert_arguments): Return -1 if any of the arguments is
        error_mark_node.

        * gcc.dg/pr41842.c: New test.
........
  r153600 | rguenth | 2009-10-27 09:52:44 -0600 (Tue, 27 Oct 2009) | 14 lines

  2009-10-27  Richard Guenther  <rguenther@suse.de>

        * tree-ssa-structalias.c (find_func_aliases): In IPA mode
        handle calls to externally visible functions like in regular mode.
        (create_variable_info_for): Do not create function infos here.
        (have_alias_info): Remove write-only variable.
        (solve_constraints): New function split out from common code
        in compute_points_to_sets and ipa_pta_execute.
        (compute_points_to_sets): Adjust.
        (ipa_pta_execute): Likewise.  Handle clones and externally visible
        functions like in non-IPA mode.

        * gcc.dg/torture/ipa-pta-1.c: Adjust testcase.
........
  r153604 | uros | 2009-10-27 11:03:47 -0600 (Tue, 27 Oct 2009) | 3 lines

        * ChangeLog: Fix formatting.
        * testsuite/ChangeLog: Ditto.
........
  r153606 | ktietz | 2009-10-27 11:14:47 -0600 (Tue, 27 Oct 2009) | 11 lines

  2009-10-27  Kai Tietz <kai.tietz@onevision.com>

          PR/41799
          * config/i386/mingw32.h (CHECK_EXECUTE_STACK_ENABLED): New macro.
          * config/i386/mingw.opt: Add fset-stack-executable.
          * config/i386/i386.c (ix86_trampoline_init): Make call to
          emit_library_call conditional, if CHECK_EXECUTE_STACK_ENABLED is
          defined and its value is not zero.
          * doc/invoke.texi
........
  r153610 | espindola | 2009-10-27 12:17:13 -0600 (Tue, 27 Oct 2009) | 7 lines

  2009-10-27  Dmitry Gorbachev  <d.g.gorbachev@gmail.com>

        PR lto/41652
        * configure.ac: Call AC_SYS_LARGEFILE before AC_OUTPUT.
        * configure: Regenerate.
........
  r153613 | ebotcazou | 2009-10-27 13:41:13 -0600 (Tue, 27 Oct 2009) | 4 lines

        * raise-gcc (db_region_for): Use _Unwind_GetIPInfo instead of
        _Unwind_GetIP if HAVE_GETIPINFO is defined.
        (db_action_for): Likewise.
........
  r153615 | rth | 2009-10-27 14:09:07 -0600 (Tue, 27 Oct 2009) | 7 lines

          PR c++/41819
          * tree-eh.c (eh_region_may_contain_throw_map): Rename from
          eh_region_may_contain_throw; update users.
          (eh_region_may_contain_throw): New function.
          (lower_catch): Check flag_exceptions before creating exception
          region.
          (lower_eh_filter, lower_eh_must_not_throw): Likewise.
          (lower_cleanup): Tidy existing flag_exceptions check to match.
........
  r153616 | ebotcazou | 2009-10-27 14:24:31 -0600 (Tue, 27 Oct 2009) | 3 lines

        * gcc-interface/decl.c (purpose_member_field): New static function.
        (annotate_rep): Use it instead of purpose_member.
........
  r153617 | jason | 2009-10-27 15:58:09 -0600 (Tue, 27 Oct 2009) | 10 lines

        Allow no-capture lambdas to convert to function pointer.
        * semantics.c (maybe_add_lambda_conv_op): New.
        * parser.c (cp_parser_lambda_expression): Call it.
        (cp_parser_lambda_declarator_opt): Make op() static if
        no captures.
        * mangle.c (write_closure_type_name): Adjust.
        * semantics.c (finish_this_expr): Adjust.
        * decl.c (grok_op_properties): Allow it.
        * call.c (build_user_type_conversion_1): Handle static conversion op.
        (build_op_call): And op().
........
  r153618 | rth | 2009-10-27 17:25:54 -0600 (Tue, 27 Oct 2009) | 1 line

          * cgraphunit.c (cgraph_optimize): Maintain timevar stack properly.
........
  r153621 | gccadmin | 2009-10-27 18:16:59 -0600 (Tue, 27 Oct 2009) | 1 line

  Daily bump.
........
  r153643 | kkojima | 2009-10-27 22:22:21 -0600 (Tue, 27 Oct 2009) | 4 lines

        * config/sh/sh.md (stuff_delay_slot): Move const_int pattern
        inside the unspec vector.
........
  r153646 | bonzini | 2009-10-28 03:49:58 -0600 (Wed, 28 Oct 2009) | 6 lines

  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>

        * config/sh/sh.md (cbranchfp4_media): Remove hack extending
        cstore result to DImode.
........
  r153647 | bonzini | 2009-10-28 03:54:01 -0600 (Wed, 28 Oct 2009) | 6 lines

  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>

        * expmed.c (emit_store_flag): Check costs before
        transforming to the opposite representation.
........
  r153648 | singler | 2009-10-28 04:04:03 -0600 (Wed, 28 Oct 2009) | 8 lines

  2009-10-28  Johannes Singler  <singler@kit.edu>

          PR libstdc++/40852
          * include/parallel/multiseq_selection.h
          (multiseq_partition, multiseq_selection):  Avoid intermediate
          values exceeding the integer type range for very large inputs.
........
  r153650 | bonzini | 2009-10-28 04:17:29 -0600 (Wed, 28 Oct 2009) | 15 lines

  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>

        PR rtl-optimization/40741
        * config/arm/arm.c (thumb1_rtx_costs): IOR or XOR with
        a small constant is cheap.
        * config/arm/arm.md (andsi3, iorsi3): Try to place the result of
        force_reg on the LHS.
        (xorsi3): Likewise, and split the XOR if the constant is complex
        and not in Thumb mode.

  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>

        PR rtl-optimization/40741
        * gcc.target/arm/thumb-branch1.c: New.
........
  r153651 | bonzini | 2009-10-28 04:27:15 -0600 (Wed, 28 Oct 2009) | 13 lines

  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>

        PR rtl-optimization/39715
        * combine.c (simplify_comparison): Use extensions to
        widen comparisons.  Try an ANDing first.

  testsuite:
  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>

        PR rtl-optimization/39715
        * gcc.target/arm/thumb-bitfld1.c: New.
........
  r153652 | bonzini | 2009-10-28 06:37:30 -0600 (Wed, 28 Oct 2009) | 13 lines

  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>

        PR rtl-optimization/41812

        Revert:
        2009-06-27  Paolo Bonzini  <bonzini@gnu.org>

        * df-problems.c (df_md_scratch): New.
        (df_md_alloc, df_md_free): Allocate/free it.
        (df_md_local_compute): Only include live registers in init.
        (df_md_transfer_function): Prune the in-set computed by
        the confluence function, and the gen-set too.
........
  r153654 | paolo | 2009-10-28 07:07:00 -0600 (Wed, 28 Oct 2009) | 6 lines

  2009-10-28  Paolo Carlini  <paolo.carlini@oracle.com>

        * include/bits/stl_iterator_base_funcs.h: (next): Change
        template parameter name consistently with the resolution
        of DR 1011 ([Ready] in Santa Cruz).
........
  r153655 | rguenth | 2009-10-28 07:28:32 -0600 (Wed, 28 Oct 2009) | 14 lines

  2009-10-28  Richard Guenther  <rguenther@suse.de>

        PR middle-end/41855
        * tree-ssa-alias.c (refs_may_alias_p_1): Deal with CONST_DECLs
        (ref_maybe_used_by_call_p_1): Fix bcopy handling.
        (call_may_clobber_ref_p_1): Likewise.
        * tree-ssa-structalias.c (find_func_aliases): Likewise.
        * alias.c (nonoverlapping_memrefs_p): Deal with CONST_DECLs.

        * gfortran.dg/lto/20091028-1_0.f90: New testcase.
        * gfortran.dg/lto/20091028-1_1.c: Likewise.
        * gfortran.dg/lto/20091028-2_0.f90: Likewise.
        * gfortran.dg/lto/20091028-2_1.c: Likewise.
........
  r153656 | charlet | 2009-10-28 07:31:51 -0600 (Wed, 28 Oct 2009) | 25 lines

  2009-10-28  Robert Dewar  <dewar@adacore.com>

        * a-ztexio.adb, a-ztexio.ads, a-witeio.ads, a-witeio.adb,
        a-textio.ads, a-textio.adb: Reorganize (moving specs from private part
        to body).
        (Initialize_Standard_Files): New procedure.
        * a-tienau.adb: Minor change to make EOF directly visible
        * a-tirsfi.ads, a-wrstfi.adb, a-wrstfi.ads, a-zrstfi.adb,
        a-zrstfi.ads, a-tirsfi.adb: New unit, initial version.
        * gnat_rm.texi: Add documentation for
        Ada.[Wide_[Wide_]]Text_IO.Reset_Standard_Files.
        * Makefile.rtl: Add entries for
        Ada.[Wide_[Wide_]]Text_IO.Reset_Standard_Files

  2009-10-28  Thomas Quinot  <quinot@adacore.com>

        * exp_ch9.ads: Minor reformatting
        * sem_ch3.adb: Minor reformatting
        * sem_aggr.adb: Minor reformatting.
        * sem_attr.adb: Minor reformatting
        * tbuild.adb, tbuild.ads, par-ch4.adb, exp_ch4.adb
        (Tbuild.New_Op_Node): New subprogram.
        Minor code reorganization/factoring.
........
  r153657 | charlet | 2009-10-28 07:41:05 -0600 (Wed, 28 Oct 2009) | 29 lines

  2009-10-28  Thomas Quinot  <quinot@adacore.com>

        * exp_ch4.adb (Expand_N_Type_Conversion): Perform Integer promotion for
        the operand of the unary minus and ABS operators.

        * sem_type.adb (Covers): A concurrent type and its corresponding record
        type are compatible.
        * exp_attr.adb (Expand_N_Attribute_Reference): Do not rewrite a 'Access
        attribute reference for the current instance of a protected type while
        analyzing an access discriminant constraint in a component definition.
        Such a reference is handled in the corresponding record's init proc,
        while initializing the constrained component.
        * exp_ch9.adb (Expand_N_Protected_Type_Declaration): When creating the
        corresponding record type, propagate components'
        Has_Per_Object_Constraint flag.
        * exp_ch3.adb (Build_Init_Procedure.Build_Init_Statements):
        For a concurrent type, set up concurrent aspects before initializing
        components with a per object constraint, because they may be controlled,
        and their initialization may call entries or protected subprograms of
        the enclosing concurrent object.

  2009-10-28  Emmanuel Briot  <briot@adacore.com>

        * prj-nmsc.adb (Add_If_Not_In_List): New subprogram, for better sharing
        of code.
        (Find_Source_Dirs): resolve links if Opt.Follow_Links_For_Dirs when
        processing the directories specified explicitly in the project file.
........
  r153658 | charlet | 2009-10-28 07:50:10 -0600 (Wed, 28 Oct 2009) | 10 lines

  2009-10-28  Robert Dewar  <dewar@adacore.com>

        * exp_attr.adb, exp_ch9.adb, prj-nmsc.adb, tbuild.adb, ali.adb,
        types.ads: Minor reformatting

  2009-10-28  Tristan Gingold  <gingold@adacore.com>

        * init.c: Fix __gnat_error_handler for Darwin10 (Snow Leopard)
........
  r153659 | rguenth | 2009-10-28 07:52:20 -0600 (Wed, 28 Oct 2009) | 11 lines

  2009-10-28  Richard Guenther  <rguenther@suse.de>

        * tree.c (free_lang_data_in_type): Do not call get_alias_set.
        (free_lang_data): Unconditionally compute alias sets for all
        standard integer types.  Bail out if gate bailed out previously.
        Do not reset the types_compatible_p langhook.
        (gate_free_lang_data): Remove.
        (struct pass_ipa_free_lang_data): Enable unconditionally.
        * gimple.c (gimple_get_alias_set): Use the same alias-set for
        all pointer types.
........
  r153660 | charlet | 2009-10-28 08:07:16 -0600 (Wed, 28 Oct 2009) | 2 lines

        * gcc-interface/Make-lang.in: Update dependencies.
........
  r153661 | charlet | 2009-10-28 08:09:12 -0600 (Wed, 28 Oct 2009) | 22 lines

  2009-10-28  Vincent Celier  <celier@adacore.com>

        * prj-nmsc.adb (Add_To_Or_Remove_From_List): New name of procedure
        Add_If_Not_In_List to account for the fact that a directory may be
        removed from the list. Only remove directory if Removed is True.

  2009-10-28  Gary Dismukes  <dismukes@adacore.com>

        * a-textio.ads, a-textio.ads: Put back function EOF_Char in private
        part. Put back body of function EOF_Char.
        * a-tienau.adb: Remove with of Interfaces.C_Streams and change EOF back
        to EOF_Char.

  2009-10-28  Emmanuel Briot  <briot@adacore.com>

        * prj-tree.adb (Free): Fix memory leak.

  2009-10-28  Thomas Quinot  <quinot@adacore.com>

        * s-fileio.adb: Minor reformatting
........
  r153662 | charlet | 2009-10-28 08:14:05 -0600 (Wed, 28 Oct 2009) | 9 lines

  2009-10-28  Thomas Quinot  <quinot@adacore.com>

        * s-crtl.ads (System.CRTL.strerror): New function.

  2009-10-28  Ed Schonberg  <schonberg@adacore.com>

        * sem_type.adb: Add guard to recover some type errors.
........
  r153663 | charlet | 2009-10-28 08:22:09 -0600 (Wed, 28 Oct 2009) | 12 lines

  2009-10-28  Bob Duff  <duff@adacore.com>

        * s-fileio.adb: Give more information in exception messages.

  2009-10-28  Robert Dewar  <dewar@adacore.com>

        * gnat_ugn.texi: Document new -gnatyt requirement for space after right
        paren if next token starts with digit or letter.
        * styleg.adb (Check_Right_Paren): New rule for space after if next
        character is a letter or digit.
........
  r153664 | rguenth | 2009-10-28 08:33:17 -0600 (Wed, 28 Oct 2009) | 4 lines

  2009-10-28  Richard Guenther  <rguenther@suse.de>

          * gimple.c (gimple_get_alias_set): Fix comment typo.
........
  r153665 | jakub | 2009-10-28 08:36:28 -0600 (Wed, 28 Oct 2009) | 3 lines

        * var-tracking.c (emit_note_insn_var_location): Get the mode of
        a variable part from its REG, MEM or VALUE.
........
  r153666 | jakub | 2009-10-28 08:37:24 -0600 (Wed, 28 Oct 2009) | 4 lines

        * var-tracking.c (emit_note_insn_var_location): Don't call the second
        vt_expand_loc unnecessarily when location is not a register nor
        memory.
........
  r153667 | jakub | 2009-10-28 08:39:06 -0600 (Wed, 28 Oct 2009) | 6 lines

        PR target/41762
        * config/i386/i386.c (ix86_pic_register_p): Don't call
        rtx_equal_for_cselib_p for VALUEs discarded as useless.

        * gcc.dg/pr41762.c: New test.
........
  r153669 | jakub | 2009-10-28 08:43:04 -0600 (Wed, 28 Oct 2009) | 6 lines

        PR debug/41801
        * builtins.c (get_builtin_sync_mem): Expand loc in ptr_mode,
        call convert_memory_address on addr.

        * g++.dg/ext/sync-3.C: New test.
........
  r153670 | jakub | 2009-10-28 08:45:03 -0600 (Wed, 28 Oct 2009) | 6 lines

        PR middle-end/41837
        * ipa-struct-reorg.c (find_field_in_struct_1): Return NULL if
        fields don't have DECL_NAME.

        * gcc.dg/pr41837.c: New test.
........
  r153671 | rguenth | 2009-10-28 08:48:34 -0600 (Wed, 28 Oct 2009) | 15 lines

  2009-10-28  Richard Guenther  <rguenther@suse.de>

        PR lto/41808
        PR lto/41839
        * tree-ssa.c (useless_type_conversion_p): Do not treat
        conversions to pointers to incomplete types as useless.
        * gimple.c (gimple_types_compatible_p): Compare struct tags,
        not typedef names.

        * gcc.dg/lto/20091027-1_0.c: New testcase.
        * gcc.dg/lto/20091027-1_1.c: Likewise.
        * g++.dg/lto/20091026-1_0.C: Likewise.
        * g++.dg/lto/20091026-1_1.C: Likewise.
        * g++.dg/lto/20091026-1_a.h: Likewise.
........

Modified:
    branches/reload-v2a/   (props changed)

Propchange: branches/reload-v2a/
            ('svnmerge-integrated' modified)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



end of thread, other threads:[~2009-10-29 16:52 UTC | newest]

Thread overview: 28+ messages
2009-07-24 20:15 [Bug libgomp/40852] New: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 jaffe at broad dot mit dot edu
2009-07-24 20:29 ` [Bug libgomp/40852] " rguenth at gcc dot gnu dot org
2009-07-24 20:44 ` [Bug libstdc++/40852] " jaffe at broadinstitute dot org
2009-07-24 21:15 ` paolo dot carlini at oracle dot com
2009-07-24 21:20 ` jaffe at broadinstitute dot org
2009-07-24 21:24 ` [Bug libstdc++/40852] [parallel-mode] " paolo dot carlini at oracle dot com
2009-10-19 18:08 ` jason at gcc dot gnu dot org
2009-10-20  7:46 ` singler at gcc dot gnu dot org
2009-10-20 10:55 ` jaffe at broadinstitute dot org
2009-10-22  6:57 ` singler at gcc dot gnu dot org
2009-10-22  7:16 ` singler at gcc dot gnu dot org
2009-10-22  7:17 ` singler at gcc dot gnu dot org
2009-10-22  7:17 ` singler at gcc dot gnu dot org
2009-10-22  7:42 ` singler at gcc dot gnu dot org
2009-10-22  9:01 ` pluto at agmk dot net
2009-10-22 10:23 ` jaffe at broadinstitute dot org
2009-10-22 16:41 ` singler at gcc dot gnu dot org
2009-10-22 17:46 ` paolo dot carlini at oracle dot com
2009-10-23 10:00 ` singler at gcc dot gnu dot org
2009-10-23 10:01 ` singler at gcc dot gnu dot org
2009-10-23 16:00 ` paolo dot carlini at oracle dot com
2009-10-27  9:45 ` jaffe at broadinstitute dot org
2009-10-27 15:54 ` paolo dot carlini at oracle dot com
2009-10-28 10:04 ` singler at gcc dot gnu dot org
2009-10-28 10:05 ` singler at gcc dot gnu dot org
2009-10-28 10:11 ` singler at gcc dot gnu dot org
2009-10-28 10:44 ` paolo dot carlini at oracle dot com
2009-10-29 16:53 ` law at gcc dot gnu dot org