public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
@ 2013-03-06 11:08 ` steven at gcc dot gnu.org
  2013-03-06 11:38 ` rguenth at gcc dot gnu.org
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-06 11:08 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|gcc-bugs at gcc dot gnu.org |
             Blocks|                            |47344

--- Comment #21 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-06 11:07:54 UTC ---
IRA now is very different from IRA in early 2009.  Some new timings
for GCC 4.8 trunk would be nice. Perhaps this bug is already fixed.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
  2013-03-06 11:08 ` [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2 steven at gcc dot gnu.org
@ 2013-03-06 11:38 ` rguenth at gcc dot gnu.org
  2013-03-06 16:50 ` sergstesh at yahoo dot com
                   ` (33 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-03-06 11:38 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #22 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-06 11:38:07 UTC ---
4.7.2  -O0  25s  2189981kB

 integrated RA           :   8.96 (35%) usr   0.89 (28%) sys   9.89 (34%) wall 
206439 kB (16%) ggc
 reload                  :   2.98 (12%) usr   0.07 ( 2%) sys   3.05 (11%) wall 
 43197 kB ( 3%) ggc

4.8.0  -O0  29s  2111318kB

 integrated RA           :   8.88 (31%) usr   0.41 (13%) sys   9.32 (29%) wall 
206439 kB (18%) ggc
 LRA non-specific        :   5.50 (19%) usr   0.08 ( 3%) sys   5.59 (17%) wall 
  4600 kB ( 0%) ggc

4.8.0  -O2  (terminated after 9 minutes waiting, LIM being the offender, I
suspect domwalk ...)  >2.5GB

applying domwalk fix ...

4.8.0  -O1  still awfully slow

the smaller testcase is also tested on http://gcc.opensuse.org/c++bench/random/
but it runs OOM at -O1 and -O2 there (there is a virtual ulimit of 1GB due
to lack of resources on the machine), but -O3 surprisingly works so you
can see a recent time-report there (also for -O0).

At -O3 you can see there (everything > 10%):

 tree loop invariant motion:  53.83 (13%) usr
 PRE                     : 116.26 (29%) usr 
 LRA hard reg assignment :  73.68 (18%) usr
 load CSE after reload   :  38.67 (10%) usr

ISTR the testcases are not exactly exposing the same issues.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
  2013-03-06 11:08 ` [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2 steven at gcc dot gnu.org
  2013-03-06 11:38 ` rguenth at gcc dot gnu.org
@ 2013-03-06 16:50 ` sergstesh at yahoo dot com
  2013-03-06 23:39 ` steven at gcc dot gnu.org
                   ` (32 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: sergstesh at yahoo dot com @ 2013-03-06 16:50 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #23 from Sergei Steshenko <sergstesh at yahoo dot com> 2013-03-06 16:49:51 UTC ---
FYI, the original file ( http://gcc.gnu.org/bugzilla/attachment.cgi?id=17377 )
can be compiled with 'clang', albeit slowly:

"
sergei@amdam2:~/gcc_bug> time ~/AFSWD/install/LLVM-3.2/binsh/clang -v
gap_TlnLv4.c -O2 -c
clang version 3.2 (tags/RELEASE_32/final)
Target: i386-pc-linux-gnu
Thread model: posix
 "/mnt/sdb5/qemu/20121021/LLVM-3.2/bin/clang" -cc1 -triple i386-pc-linux-gnu
-emit-obj -disable-free -disable-llvm-verifier -main-file-name gap_TlnLv4.c
-mrelocation-model static -fmath-errno -masm-verbose -mconstructor-aliases
-target-cpu pentium4 -target-linker-version 2.22 -momit-leaf-frame-pointer -v
-coverage-file /home/sergei/gcc_bug/gap_TlnLv4.o -resource-dir
/mnt/sdb5/qemu/20121021/LLVM-3.2/bin/../lib/clang/3.2 -fmodule-cache-path
/tmp/clang-module-cache -internal-isystem /usr/local/include -internal-isystem
/mnt/sdb5/qemu/20121021/LLVM-3.2/bin/../lib/clang/3.2/include
-internal-externc-isystem /include -internal-externc-isystem /usr/include -O2
-fdebug-compilation-dir /home/sergei/gcc_bug -ferror-limit 19 -fmessage-length
182 -mstackrealign -fobjc-runtime=gcc -fdiagnostics-show-option
-fcolor-diagnostics -o gap_TlnLv4.o -x c gap_TlnLv4.c
clang -cc1 version 3.2 based upon LLVM 3.2svn default target i386-pc-linux-gnu
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /mnt/sdb5/qemu/20121021/LLVM-3.2/bin/../lib/clang/3.2/include
 /usr/include
End of search list.

real    58m50.364s
user    53m25.128s
sys     0m11.641s
sergei@amdam2:~/gcc_bug>    
".

Memory consumption is about 186M.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2013-03-06 16:50 ` sergstesh at yahoo dot com
@ 2013-03-06 23:39 ` steven at gcc dot gnu.org
  2013-03-07  0:08 ` steven at gcc dot gnu.org
                   ` (31 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-06 23:39 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org

--- Comment #24 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-06 23:39:27 UTC ---
(In reply to comment #22)
> 4.8.0  -O2  (terminated after 9 minutes waiting, LIM being the offender, I
> suspect domwalk ...)  >2.5GB
> 
> applying domwalk fix ...

It is LIM, for sure. I've been watching in GDB for a while at some
back traces, and it's spent minutes already in this DOM walk:

#5  0x0000000000b841e1 in walk_dominator_tree (walk_data=0x7fffffffdd60,
bb=0x7fffef033958) at ../../trunk/gcc/domwalk.c:187
#6  0x0000000000c02d73 in determine_invariantness () at
../../trunk/gcc/tree-ssa-loop-im.c:1189
#7  tree_ssa_lim () at ../../trunk/gcc/tree-ssa-loop-im.c:2632
#8  0x000000000075bcd7 in execute_one_pass (pass=0x12323e0 <pass_lim>) at
../../trunk/gcc/passes.c:2330
#9  0x000000000075c0f5 in execute_pass_list (pass=0x12323e0 <pass_lim>) at
../../trunk/gcc/passes.c:2378

This is supposed to be cheap. Is this a known bottle-neck?

Pathetic...


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2013-03-06 23:39 ` steven at gcc dot gnu.org
@ 2013-03-07  0:08 ` steven at gcc dot gnu.org
  2013-03-07  0:27 ` steven at gcc dot gnu.org
                   ` (30 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-07  0:08 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #25 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07 00:08:26 UTC ---
(In reply to comment #24)
> (In reply to comment #22)
> > 4.8.0  -O2  (terminated after 9 minutes waiting, LIM being the offender, I
> > suspect domwalk ...)  >2.5GB
> > 
> > applying domwalk fix ...
> 
> It is LIM, for sure. I've been watching in GDB for a while at some
> back traces, and it's spent minutes already in this DOM walk:

After this, it's spending even more time in refs_independent_p, doing
bitmap tests (ah! a test case for my splay tree bitmaps!).

Is the refs_independent_p relation symmetric? It certainly looks that
way.  If so, one way to halve the work done here is to build only half
the "independence" graph.  Currently it builds a full square:

  if (mem_refs_may_alias_p (ref1->mem, ref2->mem,
                            &memory_accesses.ttae_cache))
    { 
      bitmap_set_bit (ref1->dep_ref, ref2->id);
      bitmap_set_bit (ref2->dep_ref, ref1->id);
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "dependent.\n");
      return false;
    }
  else
    {
      bitmap_set_bit (ref1->indep_ref, ref2->id);
      bitmap_set_bit (ref2->indep_ref, ref1->id);
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "independent.\n");
      return true;
    }

It's hard to say how much memory is wasted here (obstack_memory_used
is still broken and overflows), but it's probably x*100MB if not more.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2013-03-07  0:08 ` steven at gcc dot gnu.org
@ 2013-03-07  0:27 ` steven at gcc dot gnu.org
  2013-03-07  8:10 ` steven at gcc dot gnu.org
                   ` (29 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-07  0:27 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #26 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07 00:26:56 UTC ---
(In reply to comment #23)
> FYI, the original file ( http://gcc.gnu.org/bugzilla/attachment.cgi?id=17377 )
> can be compiled with 'clang', albeit slowly:
...
> Memory consumption is about 186M.

How have you measured this?


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2013-03-07  0:27 ` steven at gcc dot gnu.org
@ 2013-03-07  8:10 ` steven at gcc dot gnu.org
  2013-03-07  8:45 ` rguenther at suse dot de
                   ` (28 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-07  8:10 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #27 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07 08:09:59 UTC ---
Compilation finished after ~3 hours and consuming at least 3GB (from top - I
forgot to use memmax2...).

Winners in the "geez, I'm slow for this test case" list:

 PRE                        :  6241.75 (57%) usr
 tree loop invariant motion :  4194.83 (38%) usr
 TOTAL                      : 11021.46 

Time is mostly spent in the alias oracle.

For LIM I'm going to test using triangle instead of square dep/indep
matrix, as a first step.  Richi, PRE is yours so... ;-)


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2013-03-07  8:10 ` steven at gcc dot gnu.org
@ 2013-03-07  8:45 ` rguenther at suse dot de
  2013-03-07  8:48 ` rguenther at suse dot de
                   ` (27 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenther at suse dot de @ 2013-03-07  8:45 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #28 from rguenther at suse dot de <rguenther at suse dot de> 2013-03-07 08:44:28 UTC ---
On Wed, 6 Mar 2013, steven at gcc dot gnu.org wrote:

> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326
> 
> Steven Bosscher <steven at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |steven at gcc dot gnu.org
> 
> --- Comment #24 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-06 23:39:27 UTC ---
> (In reply to comment #22)
> > 4.8.0  -O2  (terminated after 9 minutes waiting, LIM being the offender, I
> > suspect domwalk ...)  >2.5GB
> > 
> > applying domwalk fix ...
> 
> It is LIM, for sure. I've been watching in GDB for a while at some
> back traces, and it's spent minutes already in this DOM walk:
> 
> #5  0x0000000000b841e1 in walk_dominator_tree (walk_data=0x7fffffffdd60,
> bb=0x7fffef033958) at ../../trunk/gcc/domwalk.c:187
> #6  0x0000000000c02d73 in determine_invariantness () at
> ../../trunk/gcc/tree-ssa-loop-im.c:1189
> #7  tree_ssa_lim () at ../../trunk/gcc/tree-ssa-loop-im.c:2632
> #8  0x000000000075bcd7 in execute_one_pass (pass=0x12323e0 <pass_lim>) at
> ../../trunk/gcc/passes.c:2330
> #9  0x000000000075c0f5 in execute_pass_list (pass=0x12323e0 <pass_lim>) at
> ../../trunk/gcc/passes.c:2378
> 
> This is supposed to be cheap. Is this a known bottle-neck?
> 
> Pathetic...

Yes, it's known - and there are several known (to me ...) ways to
make constant factor compile-time and memory-usage improvements...

(I _think_ we have a bug for LIMs slowness, if you can't find it
quickly you can create one and assign me - I have some TLC patches
locally queued for 4.9, but they don't help the slowness very much)


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2013-03-07  8:45 ` rguenther at suse dot de
@ 2013-03-07  8:48 ` rguenther at suse dot de
  2013-03-07  8:53 ` rguenther at suse dot de
                   ` (26 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenther at suse dot de @ 2013-03-07  8:48 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #29 from rguenther at suse dot de <rguenther at suse dot de> 2013-03-07 08:47:35 UTC ---
On Thu, 7 Mar 2013, steven at gcc dot gnu.org wrote:

> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326
> 
> --- Comment #25 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07 00:08:26 UTC ---
> (In reply to comment #24)
> > (In reply to comment #22)
> > > 4.8.0  -O2  (terminated after 9 minutes waiting, LIM being the offender, I
> > > suspect domwalk ...)  >2.5GB
> > > 
> > > applying domwalk fix ...
> > 
> > It is LIM, for sure. I've been watching in GDB for a while at some
> > back traces, and it's spent minutes already in this DOM walk:
> 
> After this, it's spending even more time in refs_independent_p, doing
> bitmap tests (ah! a test case for my splay tree bitmaps!).
> 
> Is the refs_independent_p relation symmetric? It certainly looks that
> way.  If so, one way to halve the work done here is to build only half
> the "independence" graph.  Currently it builds a full square:
> 
>   if (mem_refs_may_alias_p (ref1->mem, ref2->mem,
>                             &memory_accesses.ttae_cache))
>     { 
>       bitmap_set_bit (ref1->dep_ref, ref2->id);
>       bitmap_set_bit (ref2->dep_ref, ref1->id);
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file, "dependent.\n");
>       return false;
>     }
>   else
>     {
>       bitmap_set_bit (ref1->indep_ref, ref2->id);
>       bitmap_set_bit (ref2->indep_ref, ref1->id);
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file, "independent.\n");
>       return true;
>     }
> 
> It's hard to say how much memory is wasted here (obstack_memory_used
> is still broken and overflows), but it's probably x*100MB if not more.

Yeah, one of my minor TLC patches.  Most of the excessive memory
usage for regular testcases can be fixed by doing LIM on
all siblings of the loop tree independently, instead of keeping
data on all loops in the function live at the same time
(I've not finished _that_ TLC patch - and it won't help for
insanely deep nests, of course)


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2013-03-07  8:48 ` rguenther at suse dot de
@ 2013-03-07  8:53 ` rguenther at suse dot de
  2013-03-07  9:58 ` steven at gcc dot gnu.org
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenther at suse dot de @ 2013-03-07  8:53 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #30 from rguenther at suse dot de <rguenther at suse dot de> 2013-03-07 08:52:52 UTC ---
On Thu, 7 Mar 2013, steven at gcc dot gnu.org wrote:

> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326
> 
> --- Comment #27 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07 08:09:59 UTC ---
> Compilation finished after ~3 hours and consuming at least 3GB (from top - I
> forgot to use memmax2...).
> 
> Winners in the "geez, I'm slow for this test case" list:
> 
>  PRE                        :  6241.75 (57%) usr
>  tree loop invariant motion :  4194.83 (38%) usr
>  TOTAL                      : 11021.46 
> 
> Time is mostly spent in the alias oracle.
> 
> For LIM I'm going to test using triangle instead of square dep/indep
> matrix, as a first step.  Richi, PRE is yours so... ;-)

Actually PRE is yours - it's the RTL PRE ;)

Richard.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2013-03-07  8:53 ` rguenther at suse dot de
@ 2013-03-07  9:58 ` steven at gcc dot gnu.org
  2013-03-07 10:14 ` sergstesh at yahoo dot com
                   ` (24 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-07  9:58 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |steven at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #31 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07 09:58:03 UTC ---
(In reply to comment #30)
Hmm, RTL PRE isn't really mine either, but I probably know it as well as
anyone else, so I will have a look. It's probably a similar issue as for
gimple LIM: alias checks.

Can you attach your TLC patches for gimple LIM to this PR, please?


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2013-03-07  9:58 ` steven at gcc dot gnu.org
@ 2013-03-07 10:14 ` sergstesh at yahoo dot com
  2013-03-07 10:15 ` rguenther at suse dot de
                   ` (23 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: sergstesh at yahoo dot com @ 2013-03-07 10:14 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #32 from Sergei Steshenko <sergstesh at yahoo dot com> 2013-03-07 10:13:40 UTC ---
(In reply to comment #26)
> (In reply to comment #23)
> > FYI, the original file ( http://gcc.gnu.org/bugzilla/attachment.cgi?id=17377 )
> > can be compiled with 'clang', albeit slowly:
> ...
> > Memory consumption is about 186M.
> 
> How have you measured this?

>From time to time I was looking at output of 'top'. The compiler pretty quickly
reaches the 186M +/- mark and stays at it.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2013-03-07 10:14 ` sergstesh at yahoo dot com
@ 2013-03-07 10:15 ` rguenther at suse dot de
  2013-03-07 17:14 ` sergstesh at yahoo dot com
                   ` (22 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenther at suse dot de @ 2013-03-07 10:15 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #33 from rguenther at suse dot de <rguenther at suse dot de> 2013-03-07 10:14:53 UTC ---
On Thu, 7 Mar 2013, steven at gcc dot gnu.org wrote:

> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326
> 
> Steven Bosscher <steven at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEW                         |ASSIGNED
>          AssignedTo|unassigned at gcc dot       |steven at gcc dot gnu.org
>                    |gnu.org                     |
> 
> --- Comment #31 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07 09:58:03 UTC ---
> (In reply to comment #30)
> Hmm, RTL PRE isn't really mine either, but I probably know it as well as
> anyone else, so I will have a look. It's probably a similar issue as for
> gimple LIM: alias checks.
> 
> Can you attach your TLC patches for gimple LIM to this PR, please?

Bah, I seem to have them in some local tree only but cannot find it.
Maybe I accidentially reverted :/

ISTR moving stuff to obstacks as well (LIM has the heaviest load on
malloc/free).

Oh well.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2013-03-07 10:15 ` rguenther at suse dot de
@ 2013-03-07 17:14 ` sergstesh at yahoo dot com
  2013-03-07 17:31 ` steven at gcc dot gnu.org
                   ` (21 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: sergstesh at yahoo dot com @ 2013-03-07 17:14 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #34 from Sergei Steshenko <sergstesh at yahoo dot com> 2013-03-07 17:13:42 UTC ---
Somehow, with -O3 LLVM clang works a little bit faster than with -O2 - 54
minutes instead of 58 minutes, though this might be a random variation:

"
sergei@amdam2:~/gcc_bug> time ~/AFSWD/install/LLVM-3.2/binsh/clang -v
gap_TlnLv4.c -O3 -c
clang version 3.2 (tags/RELEASE_32/final)
Target: i386-pc-linux-gnu
Thread model: posix
 "/mnt/sdb5/qemu/20121021/LLVM-3.2/bin/clang" -cc1 -triple i386-pc-linux-gnu
-emit-obj -disable-free -disable-llvm-verifier -main-file-name gap_TlnLv4.c
-mrelocation-model static -fmath-errno -masm-verbose -mconstructor-aliases
-target-cpu pentium4 -target-linker-version 2.22 -momit-leaf-frame-pointer -v
-coverage-file /home/sergei/gcc_bug/gap_TlnLv4.o -resource-dir
/mnt/sdb5/qemu/20121021/LLVM-3.2/bin/../lib/clang/3.2 -fmodule-cache-path
/tmp/clang-module-cache -internal-isystem /usr/local/include -internal-isystem
/mnt/sdb5/qemu/20121021/LLVM-3.2/bin/../lib/clang/3.2/include
-internal-externc-isystem /include -internal-externc-isystem /usr/include -O3
-fdebug-compilation-dir /home/sergei/gcc_bug -ferror-limit 19 -fmessage-length
182 -mstackrealign -fobjc-runtime=gcc -fdiagnostics-show-option
-fcolor-diagnostics -o gap_TlnLv4.o -x c gap_TlnLv4.c
clang -cc1 version 3.2 based upon LLVM 3.2svn default target i386-pc-linux-gnu
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /mnt/sdb5/qemu/20121021/LLVM-3.2/bin/../lib/clang/3.2/include
 /usr/include
End of search list.

real    54m18.212s
user    52m56.062s
sys     0m8.085s
sergei@amdam2:~/gcc_bug>
".


Memory consumption appears to be the same as with -O2.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2013-03-07 17:14 ` sergstesh at yahoo dot com
@ 2013-03-07 17:31 ` steven at gcc dot gnu.org
  2013-03-07 17:34 ` steven at gcc dot gnu.org
                   ` (20 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-07 17:31 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #35 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07 17:30:58 UTC ---
(In reply to comment #34)
> Memory consumption appears to be the same as with -O2.

Can you measure the peak memory with time?

 /usr/bin/time -f 'real=%e user=%U system=%S share=%P%% maxrss=%M ins=%I
outs=%O mfaults=%R waits=%w'


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2013-03-07 17:31 ` steven at gcc dot gnu.org
@ 2013-03-07 17:34 ` steven at gcc dot gnu.org
  2013-03-07 21:48 ` sergstesh at yahoo dot com
                   ` (19 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-07 17:34 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #36 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07 17:33:28 UTC ---
(In reply to comment #29)
> Yeah, one of my minor TLC patches.  Most of the excessive memory
> usage for regular testcases can be fixed by doing LIM on
> all siblings of the loop tree independently, instead of keeping
> data on all loops in the function live at the same time
> (I've not finished _that_ TLC patch - and it won't help for
> insanely deep nests, of course)

It would be very helpful if you can try to recover this one at least, or
explain a bit more what you had done.  I don't see right away in the code
what you mean.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2013-03-07 17:34 ` steven at gcc dot gnu.org
@ 2013-03-07 21:48 ` sergstesh at yahoo dot com
  2013-03-07 22:16 ` steven at gcc dot gnu.org
                   ` (18 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: sergstesh at yahoo dot com @ 2013-03-07 21:48 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #37 from Sergei Steshenko <sergstesh at yahoo dot com> 2013-03-07 21:47:52 UTC ---
(In reply to comment #35)
> (In reply to comment #34)
> > Memory consumption appears to be the same as with -O2.
> 
> Can you measure the peak memory with time?
> 
>  /usr/bin/time -f 'real=%e user=%U system=%S share=%P%% maxrss=%M ins=%I
> outs=%O mfaults=%R waits=%w'

"
+ /usr/bin/time -f 'real=%e user=%U system=%S share=%P%% maxrss=%M
ins=%Iouts=%O mfaults=%R waits=%w'
/mnt/sdb8/sergei/AFSWD_debug/20121021/LLVM-3.2/bin/clang -v gap_TlnLv4.c -O3 -c
clang version 3.2 (tags/RELEASE_32/final)
Target: i386-pc-linux-gnu
Thread model: posix
 "/mnt/sdb5/qemu/20121021/LLVM-3.2/bin/clang" -cc1 -triple i386-pc-linux-gnu
-emit-obj -disable-free -disable-llvm-verifier -main-file-name gap_TlnLv4.c
-mrelocation-model static -fmath-errno -masm-verbose -mconstructor-aliases
-target-cpu pentium4 -target-linker-version 2.22 -momit-leaf-frame-pointer -v
-coverage-file /home/sergei/gcc_bug/gap_TlnLv4.o -resource-dir
/mnt/sdb5/qemu/20121021/LLVM-3.2/bin/../lib/clang/3.2 -fmodule-cache-path
/tmp/clang-module-cache -internal-isystem /usr/local/include -internal-isystem
/mnt/sdb5/qemu/20121021/LLVM-3.2/bin/../lib/clang/3.2/include
-internal-externc-isystem /include -internal-externc-isystem /usr/include -O3
-fdebug-compilation-dir /home/sergei/gcc_bug -ferror-limit 19 -fmessage-length
182 -mstackrealign -fobjc-runtime=gcc -fdiagnostics-show-option
-fcolor-diagnostics -o gap_TlnLv4.o -x c gap_TlnLv4.c
clang -cc1 version 3.2 based upon LLVM 3.2svn default target i386-pc-linux-gnu
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /mnt/sdb5/qemu/20121021/LLVM-3.2/bin/../lib/clang/3.2/include
 /usr/include
End of search list.
real=3323.86 user=3224.22 system=8.70 share=97%% maxrss=0 ins=82598outs=8720
mfaults=193511 waits=669
"

- I am not sure 'maxrss=0' makes sense. Anyway, several minutes before
completion 'top' showed 224m (or 228m - I do not remember exactly) in VIRT
column.

When 'gcc' wokrs, the machine becomes very slowly responsive due to memory
usage growth; with 'clang' I do not notice slow down.

My machine is dual core with 2G of physical memory.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2013-03-07 21:48 ` sergstesh at yahoo dot com
@ 2013-03-07 22:16 ` steven at gcc dot gnu.org
  2013-03-07 23:19 ` steven at gcc dot gnu.org
                   ` (17 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-07 22:16 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #38 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07 22:15:39 UTC ---
Created attachment 29612
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29612
Punt on loops with more memory references than LIM can handle

For the LIM problem, I'm thinking of a patch like this one (not tested
yet).  Basically, this means giving up on really large loops with many
loads and/or stores.  That's not an unreasonable thing to do anyway.
Code motion from really big loops usually isn't an improvement.

Richi, could you have a look at this, and see if I've got it all right,
more-or-less?  LIM is quite complicated and I'm not sure if I should
look at, or update, the set of optimizable loops somewhere.

With the patch, and with "-O2 -fgcse", I now have:

gap_TlnLv4.c: In function 'RDFT_49152_1':
gap_TlnLv4.c:37502:18: warning: -ftree-loop-im: number of memory references in
loop exceeds the --param loops-max-datarefs-for-datadeps threshold
[-Wdisabled-optimization]
      t12308[500] = t12304[6144*i0+1000];
                  ^

 dead store elim1        :  57.70 ( 8%)
 dead store elim2        :  76.82 (10%)
 combiner                : 360.07 (48%)
 reload CSE regs         :  55.44 ( 7%)
 TOTAL                   : 743.77


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (17 preceding siblings ...)
  2013-03-07 22:16 ` steven at gcc dot gnu.org
@ 2013-03-07 23:19 ` steven at gcc dot gnu.org
  2013-03-08  9:14 ` rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-07 23:19 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #39 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07 23:18:48 UTC ---
Memory usage is still pathetic. Some stats:

mem stats from /proc/self/statm on *entry* of pass:

pass (#)                          size        resident   
*warn_unused_result ( -1)    177106944       140742656
...
*init_datastructures ( -1)   209670144       169570304
ssa ( 20)                    209670144       169582592
*rebuild_cgraph_edges ( -1)  302575616       259780608
...
expand (169)                 390647808       349757440
vregs (170)                  477892608       425406464
...
dfinit (174)                 477892608       425627648
cse1 (175)                   615616512       565694464 
...
dse1 (195)                   727654400       634572800
fwprop2 (196)                791678976       723767296
...
combine (200)                823115776       752177152 
ce2 (201)                   6589759488      6512082944 
...
csa (222)                   6938136576      1057169408

This means that "combine" explodes the memory foot print by ~5.4 GB.

On entry to combine, the memory foot print is ~750MB. I don't think
that is unreasonable. But after combine, the memory footprint jumps
to ~7GB max, and 350GB resident (after a ggc_collect before csa).
I'm guessing most of that 350GB is combine leaving memory scattered
so that pages can't be released.

dfinit is also a major contributor to the huge memory foot print.  I
have seen that in other test cases also, but so far I can't figure out
how it can consume this much memory.

Interestingly, the initial memory footprint, out of the front end, is
~180 MB. From there, the memory foot print gradually goes up, but that
is probably in part because ggc_collect is just never called (this is
on a box with 16GB RAM), with GGC params:

  --param ggc-min-expand=100 --param ggc-min-heapsize=131072

With both parameters halved, the combine problem obviously remains,
and the dfinit problem becomes more pronounced:

expand (169)                 328032256       287195136
vregs (170)                  474415104       424263680
...
dfinit (174)                 474415104       424484864
cse1 (175)                   613801984       564596736
...
combine (200)                823496704       752828416
ce2 (201)                   6590169088      6512717824

The good news: There clearly is a lot of room for improvement for this 
test case. :-)


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (19 preceding siblings ...)
  2013-03-08  9:14 ` rguenth at gcc dot gnu.org
@ 2013-03-08  9:14 ` rguenth at gcc dot gnu.org
  2013-03-08  9:23 ` rguenther at suse dot de
                   ` (14 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-03-08  9:14 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #41 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-08 09:13:53 UTC ---
Created attachment 29616
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29616
make LIM work per outermost loop


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (18 preceding siblings ...)
  2013-03-07 23:19 ` steven at gcc dot gnu.org
@ 2013-03-08  9:14 ` rguenth at gcc dot gnu.org
  2013-03-08  9:14 ` rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-03-08  9:14 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #40 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-08 09:12:38 UTC ---
(In reply to comment #31)
> (In reply to comment #30)
> Hmm, RTL PRE isn't really mine either, but I probably know it as well as
> anyone else, so I will have a look. It's probably a similar issue as for
> gimple LIM: alias checks.
> 
> Can you attach your TLC patches for gimple LIM to this PR, please?

Found it on some testing machine.  I merged it and saw to it that it
compiles and passes tree-ssa.exp (some ???s remain, I've added a TODO).
The idea should be clear, and it's memory savings only and it will eventually
be a bit slower (unless it is optimized further).


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (20 preceding siblings ...)
  2013-03-08  9:14 ` rguenth at gcc dot gnu.org
@ 2013-03-08  9:23 ` rguenther at suse dot de
  2013-03-09 14:58 ` steven at gcc dot gnu.org
                   ` (13 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenther at suse dot de @ 2013-03-08  9:23 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #42 from rguenther at suse dot de <rguenther at suse dot de> 2013-03-08 09:22:39 UTC ---
On Thu, 7 Mar 2013, steven at gcc dot gnu.org wrote:

> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326
> 
> --- Comment #38 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07 22:15:39 UTC ---
> Created attachment 29612
>   --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29612
> Punt on loops with more memory references than LIM can handle
> 
> For the LIM problem, I'm thinking of a patch like this one (not tested
> yet).  Basically, this means giving up on really large loops with many
> loads and/or stores.  That's not an unreasonable thing to do anyway.
> Code motion from really big loops usually isn't an improvement.
> 
> Richi, could you have a look at this, and see if I've got it all right,
> more-or-less?  LIM is quite complicated and I'm not sure if I should
> look at, or update, the set of optimizable loops somewhere.

Yeah, well - it should be easy to avoid most overhead (even collecting
the memory references) with my proposed scheme (see patch).  First,
for each BB count the number of memory stmts (easy, look for VUSEs),
then, when walking the set of outermost loops we want to apply LIM to
sum over its BB counts and instead walk over its siblings if the number
is too large.

But yes, limiting it should be done as it performs O(n^2) dependence
checks (well, O(#of stores * #of memory references), so for a low
number of stores it's quite cheap).

I've yet to recover the obstack-ification of struct depend,
struct mem_ref, struct mem_ref_locs and lim_aux_data allocations ...
(LIM is the biggest load on malloc/free).

> With the patch, and with "-O2 -fgcse", I now have:
> 
> gap_TlnLv4.c: In function 'RDFT_49152_1':
> gap_TlnLv4.c:37502:18: warning: -ftree-loop-im: number of memory references in
> loop exceeds the --param loops-max-datarefs-for-datadeps threshold
> [-Wdisabled-optimization]
>       t12308[500] = t12304[6144*i0+1000];
>                   ^
> 
>  dead store elim1        :  57.70 ( 8%)
>  dead store elim2        :  76.82 (10%)
>  combiner                : 360.07 (48%)
>  reload CSE regs         :  55.44 ( 7%)
>  TOTAL                   : 743.77

Nice.  Well ... ;)


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (21 preceding siblings ...)
  2013-03-08  9:23 ` rguenther at suse dot de
@ 2013-03-09 14:58 ` steven at gcc dot gnu.org
  2013-03-09 17:26 ` steven at gcc dot gnu.org
                   ` (12 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-09 14:58 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #43 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-09 14:57:52 UTC ---
The problem with combine is only "collateral damage" from what dse1 is
doing to this function.  It's loading stored values into registers and
replacing re-loads with those registers, as in:

@@ -747831,6 +780399,7 @@
  474783: r569578:SF=r569577:SF-r177936:SF
       REG_DEAD r569577:SF
       REG_DEAD r177936:SF
+ 605734: r662460:SF=r569578:SF
  474784: [const(`t18571.26403'+0x5fe8)]=r569578:SF
       REG_DEAD r569578:SF
  474786: r177942:SF=[const(`t12303.26335'+0x9008)]
...
  487110: [const(`t24716.26406'+0x2ff0)]=r578824:SF
       REG_DEAD r578824:SF
- 487113: r578827:SF=[const(`t18571.26403'+0x5fe8)]
+ 487113: r578827:SF=r662460:SF
  487114: [const(`t24716.26406'+0x2ff4)]=r578827:SF
       REG_DEAD r578827:SF

The distance between the store and load is large, and that is what is
causing combine to blow up -- walking all the insns between 605734 and
487113 (about 500 of them, which appears to be typical).  This also
increases register pressure, causing the poor IRA/LRA/postreload numbers.

Also not great: dse is loading the same stored value into multiple
registers, e.g.:

@@ -147272,6 +175190,8 @@
  110394: r288307:SF=r288306:SF-r37078:SF
       REG_DEAD r288306:SF
       REG_DEAD r37078:SF
+ 601084: r657810:SF=r288307:SF
+ 601085: r657811:SF=r288307:SF
  110395: [r288302:DI]=r288307:SF
       REG_DEAD r288307:SF

but I'm guessing (and verifying now) that this is cleaned up by fwprop2.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (22 preceding siblings ...)
  2013-03-09 14:58 ` steven at gcc dot gnu.org
@ 2013-03-09 17:26 ` steven at gcc dot gnu.org
  2013-03-11  9:40 ` steven at gcc dot gnu.org
                   ` (11 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-09 17:26 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #44 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-09 17:25:46 UTC ---
Created attachment 29628
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29628
Re-use store register if possible

This patch resolves the issue mentioned in comment #43, and makes
combine run in reasonable time.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (23 preceding siblings ...)
  2013-03-09 17:26 ` steven at gcc dot gnu.org
@ 2013-03-11  9:40 ` steven at gcc dot gnu.org
  2013-03-12 10:47 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-11  9:40 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #45 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-11 09:40:18 UTC ---
Patches posted:

* Restrict GIMPLE loop invariant code motion of loop-invariant loads and
stores to loops with fewer memory references than a certain maximum that
is controlled with --param loops-max-datarefs-for-datadeps" from the
command line.
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00380.html

* Do not create new pseudo-registers for load-after-store transformations
in RTL dead store elimination.  This reduces the memory foot print after
DSE by ~2 percent, and avoids the compile time and memory usage explosion
in combine because it gets presented fewer single-def/single-use register
moves that are really just register copies.
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00379.html

* Make gcse.c respect -fno-gcse-lm. For the RTL PRE problem, this means
compile time is reasonable with -fno-gcse-lm.
A follow-up patch will implement some mechanism to disable load motion
automatically on extreme test cases like the one from this PR.
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00386.html


The remaining compile time bottlenecks are:

- RTL dead store limination in its analysis phase.  This is mostly time
spent in dependence tests in alias analysis for instructions in a single
basic block, so it's only a problem for test cases where there is a huge
number of loads and stores in each basic block. I don't think it is worth
speeding up DSE for such extreme cases. 

- Post-reload CSE because it is in the worst-case quadratic in the number 
of instructions in a basic block.  In most practical cases, post-reload 
CSE scales linearly with the number of instructions in a basic block, but
with a large constant bound. It looks up and down through the instruction
chain to see if a reg is not  clobbered between a use and a def.  Because
it only has to do so with  hard registers  the typical bound is closer to
"number of insns in basic block" * "number of hard registers".  This is
fine, I am not going to try and improve this.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (24 preceding siblings ...)
  2013-03-11  9:40 ` steven at gcc dot gnu.org
@ 2013-03-12 10:47 ` rguenth at gcc dot gnu.org
  2013-03-12 14:05 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-03-12 10:47 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #46 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-12 10:46:45 UTC ---
Created attachment 29649
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29649
symmetry in reference testing

Exploit symmetry in reference testing.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (25 preceding siblings ...)
  2013-03-12 10:47 ` rguenth at gcc dot gnu.org
@ 2013-03-12 14:05 ` rguenth at gcc dot gnu.org
  2013-03-15 16:07 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-03-12 14:05 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #47 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-12 14:05:10 UTC ---
With caching affine-combination compute and ao_ref compute I have it down to

 tree loop invariant motion: 596.91 (79%) usr   0.73 (29%) sys 599.77 (78%)
wall   31135 kB ( 3%) ggc

without limiting anything.  I tried a more intelligent ref_indep_in_loop
(avoiding another quadraticness in loop depth ...) but it gets slower ...

*************** ref_indep_loop_p_1 (struct loop *loop, m
*** 2230,2242 ****
    bitmap refs_to_check;
    unsigned i;
    bitmap_iterator bi;
!   bool ret = true, stored = bitmap_bit_p (ref->stored, loop->num);
    mem_ref_p aref;

!   if (stored)
!     refs_to_check = memory_accesses.all_refs_in_loop[loop->num];
    else
!     refs_to_check = memory_accesses.all_refs_stored_in_loop[loop->num];

    EXECUTE_IF_SET_IN_BITMAP (refs_to_check, 0, i, bi)
      {
--- 2233,2245 ----
    bitmap refs_to_check;
    unsigned i;
    bitmap_iterator bi;
!   bool ret = true;
    mem_ref_p aref;

!   if (bitmap_bit_p (ref->stored, loop->num))
!     refs_to_check = memory_accesses.refs_in_loop[loop->num];
    else
!     refs_to_check = memory_accesses.refs_stored_in_loop[loop->num];

    EXECUTE_IF_SET_IN_BITMAP (refs_to_check, 0, i, bi)
      {
*************** ref_indep_loop_p_1 (struct loop *loop, m
*** 2259,2272 ****
  static bool
  ref_indep_loop_p (struct loop *loop, mem_ref_p ref)
  {
-   bool ret;
-
    if (bitmap_bit_p (ref->indep_loop, loop->num))
      return true;
    if (bitmap_bit_p (ref->dep_loop, loop->num))
      return false;

!   ret = ref_indep_loop_p_1 (loop, ref);

    if (dump_file && (dump_flags & TDF_DETAILS))
      fprintf (dump_file, "Querying dependencies of ref %u in loop %d: %s\n",
--- 2262,2286 ----
  static bool
  ref_indep_loop_p (struct loop *loop, mem_ref_p ref)
  {
    if (bitmap_bit_p (ref->indep_loop, loop->num))
      return true;
    if (bitmap_bit_p (ref->dep_loop, loop->num))
      return false;

!   bool ret = true;
!   struct loop *inner = loop->inner;
!   while (inner)
!     {
!       if (!ref_indep_loop_p (inner, ref))
!       {
!         ret = false;
!         break;
!       }
!       inner = inner->next;
!     }
!
!   if (ret)
!     ret = ref_indep_loop_p_1 (loop, ref);

    if (dump_file && (dump_flags & TDF_DETAILS))
      fprintf (dump_file, "Querying dependencies of ref %u in loop %d: %s\n",


with it we can also get rid of the all_refs_in_loop bitmap (only
all_refs_stored_in_loop is then used, by store-motion).

I don't see why the above should result in slower operation :/  It should
save us the time to re-query references in sub-loops (yes, they should
be cached already ... still walking the bitmap is not free).

To mimic parts of this idea the dep_loop/indep_loop checking/setting can
be improved to cover outer loops like with

@@ -2296,7 +2218,12 @@ record_indep_loop (struct loop *loop, me
   if (indep)
     bitmap_set_bit (ref->indep_loop, loop->num);
   else
-    bitmap_set_bit (ref->dep_loop, loop->num);
+    do
+      {
+       bitmap_set_bit (ref->dep_loop, loop->num);
+       loop = loop_outer (loop);
+      }
+    while (loop != current_loops->tree_root);
 }

and

@@ -2339,8 +2266,13 @@ ref_indep_loop_p (struct loop *loop, mem
 {
   bool ret;

-  if (bitmap_bit_p (ref->indep_loop, loop->num))
-    return true;
+  struct loop *oloop = loop;
+  do
+    {
+      if (bitmap_bit_p (ref->indep_loop, oloop->num))
+       return true;
+      oloop = loop_outer (oloop);
+    }
+  while (oloop != current_loops->tree_root);
   if (bitmap_bit_p (ref->dep_loop, loop->num))
     return false;

(no difference in compile-time)

anyway, another obvious improvement is to merge the two two-bitmap
sets (they are tri-state, dependent, independent and unknown).  That's
more cache-friendly (now, where's that tri-state "bitmap" implementation
that also saves the half bit we waste ... ;))  It helps a bit.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (26 preceding siblings ...)
  2013-03-12 14:05 ` rguenth at gcc dot gnu.org
@ 2013-03-15 16:07 ` rguenth at gcc dot gnu.org
  2013-03-18  8:44 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-03-15 16:07 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #48 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-15 16:06:33 UTC ---
Removing all the caching (dep/indep_loop and dep/indep_ref) makes things
faster ... (they have a hit rate of 5% resp. 6.2% only)

Fastest timing sofar:

 tree loop invariant motion:  57.42 (26%) usr

that's without any limiting and with just the dep/indep_ref cache removed
and a (working) patch according to comment #47.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (27 preceding siblings ...)
  2013-03-15 16:07 ` rguenth at gcc dot gnu.org
@ 2013-03-18  8:44 ` rguenth at gcc dot gnu.org
  2013-03-21 20:04 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-03-18  8:44 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #49 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-18 08:43:08 UTC ---
Author: rguenth
Date: Mon Mar 18 08:42:57 2013
New Revision: 196768

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196768
Log:
2013-03-18  Richard Biener  <rguenther@suse.de>

    PR tree-optimization/39326
    * tree-ssa-loop-im.c (refs_independent_p): Exploit symmetry.
    (struct mem_ref): Replace mem member with ao_ref typed member.
    (MEM_ANALYZABLE): Adjust.
    (memref_eq): Likewise.
    (mem_ref_alloc): Likewise.
    (gather_mem_refs_stmt): Likewise.
    (mem_refs_may_alias_p): Use the ao_ref to query the alias oracle.
    (execute_sm_if_changed_flag_set): Adjust.
    (execute_sm): Likewise.
    (ref_always_accessed_p): Likewise.
    (refs_independent_p): Likewise.
    (can_sm_ref_p): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-loop-im.c


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (28 preceding siblings ...)
  2013-03-18  8:44 ` rguenth at gcc dot gnu.org
@ 2013-03-21 20:04 ` jakub at gcc dot gnu.org
  2013-03-22 14:01 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-03-21 20:04 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #50 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-03-21 20:04:45 UTC ---
Author: rguenth
Date: Thu Mar 21 14:45:36 2013
New Revision: 196878

URL: http://gcc.gnu.org/viewcvs?rev=196878&root=gcc&view=rev
Log:
2013-03-21  Richard Biener  <rguenther@suse.de>

    PR tree-optimization/39326
    * tree-ssa-loop-im.c (UNANALYZABLE_MEM_ID): New define.
    (MEM_ANALYZABLE): Adjust.
    (record_mem_ref_loc): Move bitmap ops ...
    (gather_mem_refs_stmt): ... here.  Use the shared mem-ref for
    unanalyzable refs, do not record locations for it.
    (analyze_memory_references): Allocate ref zero as shared
    unanalyzable ref.
    (refs_independent_p): Do not test for unanalyzed mems here.
    (ref_indep_loop_p_1): Special-case disambiguation against
    the unanalyzed ref.
    (ref_indep_loop_p): Assert we are not queried for the
    unanalyzed mem.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-loop-im.c


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (29 preceding siblings ...)
  2013-03-21 20:04 ` jakub at gcc dot gnu.org
@ 2013-03-22 14:01 ` rguenth at gcc dot gnu.org
  2013-03-26 10:10 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-03-22 14:01 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #51 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-22 14:00:25 UTC ---

>     (struct mem_ref): Replace mem member with ao_ref typed member.

RTL gcse (-O2) suffers from the same slowness in its dependence tests.  Caching
ao_ref instead of just mem and alias-set in RTL mem-attrs would improve
this tremendously.

A quick try reveals that gengtype is not at all happy with including
something in rtl.h though :/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (30 preceding siblings ...)
  2013-03-22 14:01 ` rguenth at gcc dot gnu.org
@ 2013-03-26 10:10 ` rguenth at gcc dot gnu.org
  2014-01-15 15:14 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-03-26 10:10 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #52 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-26 10:09:52 UTC ---
(In reply to comment #51)
> >     (struct mem_ref): Replace mem member with ao_ref typed member.
> 
> RTL gcse (-O2) suffers from the same slowness in its dependence tests.  Caching
> ao_ref instead of just mem and alias-set in RTL mem-attrs would improve
> this tremendously.
>
> A quick try reveals that gengtype is not at all happy with including
> something in rtl.h though :/

Doing that (caching the ao_ref) doesn't help as much as it did on the
GIMPLE level.  The most time-consuming part of canon_true_dependence
(which consumes 56% of compile-time) is find_base_term (37% of that 56%)
followed by memrefs_conflict_p and only then (after caching ao_ref)
rtx_refs_may_alias_p (18%).  The find_base_term result is another thing
that could be easily cached when doing dependence checks that repeatedly
use one or another operand.  For this testcase the most expensive caller
is compute_transp.

There is some obvious way to do less find_base_term calls in alias.c
itself.  I'm going to cleanup things there a bit.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (31 preceding siblings ...)
  2013-03-26 10:10 ` rguenth at gcc dot gnu.org
@ 2014-01-15 15:14 ` rguenth at gcc dot gnu.org
  2014-01-17 13:53 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-15 15:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #53 from Richard Biener <rguenth at gcc dot gnu.org> ---
With the PR59802 and PR38518 fixes on trunk I see for the 2nd testcase at -O2

 PRE                     :  19.23 (35%) usr   0.01 ( 1%) sys  19.22 (34%) wall 
  1421 kB ( 0%) ggc
 combiner                :  13.24 (24%) usr   0.14 (19%) sys  13.37 (24%) wall 
402488 kB (73%) ggc
 integrated RA           :   9.28 (17%) usr   0.35 (47%) sys   9.65 (17%) wall 
 26707 kB ( 5%) ggc
 TOTAL                 :  55.56             0.74            56.34            
550215 kB

compared to the 4.8 branch:

 tree loop invariant motion:  17.29 (22%) usr
 PRE                     :  24.25 (31%) usr
 combiner                :  12.52 (16%) usr   0.14 (17%) sys  12.66 (16%) wall 
402491 kB (74%) ggc
 integrated RA           :   7.96 (10%) usr   0.32 (38%) sys   8.25 (11%) wall 
 26420 kB ( 5%) ggc
 TOTAL                 :  77.53             0.84            78.53            
542544 kB

so it's an improvement (tree LIM is off the radar completely, RTL PRE
is improved a bit - looking at it now).


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (32 preceding siblings ...)
  2014-01-15 15:14 ` rguenth at gcc dot gnu.org
@ 2014-01-17 13:53 ` rguenth at gcc dot gnu.org
  2022-03-10 15:05 ` rguenth at gcc dot gnu.org
  2024-02-20 13:14 ` rguenth at gcc dot gnu.org
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-17 13:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #54 from Richard Biener <rguenth at gcc dot gnu.org> ---
For the original testcase on trunk we get at -O1

 tree loop invariant motion:  37.20 (16%) usr   0.02 ( 1%) sys  37.20 (16%)
wall   12127 kB ( 1%) ggc
 dead store elim1        :  17.42 ( 7%) usr   0.04 ( 2%) sys  17.44 ( 7%) wall 
 28345 kB ( 3%) ggc
 dead store elim2        :  26.22 (11%) usr   0.00 ( 0%) sys  26.20 (11%) wall 
 18664 kB ( 2%) ggc
 combiner                :  15.36 ( 6%) usr   0.04 ( 2%) sys  15.39 ( 6%) wall 
 70073 kB ( 7%) ggc
 LRA hard reg assignment :  74.07 (31%) usr   0.02 ( 1%) sys  74.05 (31%) wall 
     0 kB ( 0%) ggc
 reload CSE regs         :  12.67 ( 5%) usr   0.00 ( 0%) sys  12.66 ( 5%) wall 
 13315 kB ( 1%) ggc
 TOTAL                 : 237.86             1.83           239.60            
999181 kB

so that's still slow (memory usage is peaking at about 1.5GB now, during RA).

tree LIM:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
  4.22    120.72     9.83   299080     0.00     0.00  ref_indep_loop_p_2(loop*, 
mem_ref*, bool)
  1.89    151.82     4.41 1712943886     0.00     0.00 
mem_refs_may_alias_p(mem
_ref*, mem_ref*, pointer_map_t**) [clone .constprop.60]
  0.21    197.02     0.48    12192     0.00     0.00 
force_move_till(tree_node*, tree_node**, void*)

at -O2 this will blow up completely as it then runs into the affine-based
disambiguation.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (33 preceding siblings ...)
  2014-01-17 13:53 ` rguenth at gcc dot gnu.org
@ 2022-03-10 15:05 ` rguenth at gcc dot gnu.org
  2024-02-20 13:14 ` rguenth at gcc dot gnu.org
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-03-10 15:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

--- Comment #61 from Richard Biener <rguenth at gcc dot gnu.org> ---
r12-7592, first testcase, x86_64:

-O0:   6s, 1GB

-O1: 264s, 1.4GB

 callgraph ipa passes               :  30.47 ( 12%)
 alias stmt walking                 :  67.44 ( 26%)
 tree loop invariant motion         : 120.86 ( 46%)

-O2: 1383s, 11.8GB

 alias stmt walking                 : 174.01 ( 13%)
 tree PRE                           :  87.85 (  6%)
 tree loop invariant motion         : 130.54 (  9%)
 tree slp vectorization             : 100.01 (  7%)
 PRE                                : 716.64 ( 52%)

quite some regression on the peak memory usage side here.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
       [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
                   ` (34 preceding siblings ...)
  2022-03-10 15:05 ` rguenth at gcc dot gnu.org
@ 2024-02-20 13:14 ` rguenth at gcc dot gnu.org
  35 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-20 13:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2019-03-05 00:00:00         |2024-2-20

--- Comment #62 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #61)
> r12-7592, first testcase, x86_64:
> 
> -O0:   6s, 1GB
> 
> -O1: 264s, 1.4GB
> 
>  callgraph ipa passes               :  30.47 ( 12%)
>  alias stmt walking                 :  67.44 ( 26%)
>  tree loop invariant motion         : 120.86 ( 46%)
> 
> -O2: 1383s, 11.8GB
> 
>  alias stmt walking                 : 174.01 ( 13%)
>  tree PRE                           :  87.85 (  6%)
>  tree loop invariant motion         : 130.54 (  9%)
>  tree slp vectorization             : 100.01 (  7%)
>  PRE                                : 716.64 ( 52%)
> 
> quite some regression on the peak memory usage side here.

r14-9064

-O2: 683s, 5GB

 alias stmt walking                 :  95.35 ( 14%)
 tree PRE                           :  58.27 (  9%)
 tree loop invariant motion         :  75.47 ( 11%)
 PRE                                : 335.15 ( 49%)

so improvements on the memory usage side.  -O1 didn't change.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
  2009-02-28 15:23 [Bug c/39326] New: " sergstesh at yahoo dot com
                   ` (6 preceding siblings ...)
  2009-03-17 12:59 ` rguenth at gcc dot gnu dot org
@ 2009-03-17 13:10 ` rguenth at gcc dot gnu dot org
  7 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-03-17 13:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from rguenth at gcc dot gnu dot org  2009-03-17 13:09 -------
Btw, it looks like internal IRA data-structures can be shrinked on 64bit
platforms by avoiding padding between pointer and integer members a lot.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
  2009-02-28 15:23 [Bug c/39326] New: " sergstesh at yahoo dot com
                   ` (5 preceding siblings ...)
  2009-03-17 12:42 ` rguenth at gcc dot gnu dot org
@ 2009-03-17 12:59 ` rguenth at gcc dot gnu dot org
  2009-03-17 13:10 ` rguenth at gcc dot gnu dot org
  7 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-03-17 12:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from rguenth at gcc dot gnu dot org  2009-03-17 12:58 -------
For trunk -O1 I see

CPU: AMD64 processors, speed 1000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 100000
samples  %        symbol name
368029   19.8218  allocno_spill_priority_compare
244104   13.1473  color_pass
205725   11.0802  build_allocno_conflicts
141509    7.6216  push_allocno_to_stack
106750    5.7495  assign_hard_reg
95558     5.1467  bitmap_bit_p
88785     4.7819  add_to_allocno_conflicts
51962     2.7986  splay_tree_splay
42764     2.3032  ira_build_conflicts
40718     2.1930  ira_flattening
27977     1.5068  canon_true_dependence
25608     1.3792  ira_allocno_live_ranges_intersect_p
20403     1.0989  nonoverlapping_memrefs_p
20005     1.0775  ira_add_allocno_conflict


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
  2009-02-28 15:23 [Bug c/39326] New: " sergstesh at yahoo dot com
                   ` (4 preceding siblings ...)
  2009-03-17 11:05 ` rguenth at gcc dot gnu dot org
@ 2009-03-17 12:42 ` rguenth at gcc dot gnu dot org
  2009-03-17 12:59 ` rguenth at gcc dot gnu dot org
  2009-03-17 13:10 ` rguenth at gcc dot gnu dot org
  7 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-03-17 12:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from rguenth at gcc dot gnu dot org  2009-03-17 12:42 -------
Vlad, for the second testcase I see

-O0:

 expand                :   0.78 (19%) usr   0.04 ( 5%) sys   0.83 (16%) wall  
44335 kB (49%) ggc
 integrated RA         :   1.05 (25%) usr   0.03 ( 4%) sys   1.13 (22%) wall   
1036 kB ( 1%) ggc
 reload                :   0.55 (13%) usr   0.02 ( 3%) sys   0.59 (12%) wall   
3585 kB ( 4%) ggc

-O1:

 integrated RA         :  73.95 (77%) usr   1.18 (50%) sys  76.20 (76%) wall   
2898 kB ( 3%) ggc

(on alias improvements branch I see
 combiner              :  24.89 (17%) usr   0.51 (28%) sys  25.41 (17%) wall 
448053 kB (82%) ggc
 integrated RA         :  69.81 (47%) usr   0.36 (20%) sys  70.22 (46%) wall   
3212 kB ( 1%) ggc
as tree loop invariant motion does lots of stuff there)

-O2:

 PRE                   :  25.49 (26%) usr   0.03 ( 3%) sys  25.52 (26%) wall   
3513 kB ( 3%) ggc
 integrated RA         :  36.98 (38%) usr   0.24 (21%) sys  37.31 (38%) wall   
2158 kB ( 2%) ggc


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
  2009-02-28 15:23 [Bug c/39326] New: " sergstesh at yahoo dot com
                   ` (3 preceding siblings ...)
  2009-03-03 14:15 ` sergstesh at yahoo dot com
@ 2009-03-17 11:05 ` rguenth at gcc dot gnu dot org
  2009-03-17 12:42 ` rguenth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-03-17 11:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from rguenth at gcc dot gnu dot org  2009-03-17 11:05 -------
Created an attachment (id=17476)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17476&action=view)
the other testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
  2009-02-28 15:23 [Bug c/39326] New: " sergstesh at yahoo dot com
                   ` (2 preceding siblings ...)
  2009-03-03 13:49 ` rguenther at suse dot de
@ 2009-03-03 14:15 ` sergstesh at yahoo dot com
  2009-03-17 11:05 ` rguenth at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: sergstesh at yahoo dot com @ 2009-03-03 14:15 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from sergstesh at yahoo dot com  2009-03-03 14:15 -------
(In reply to comment #15)
> Subject: Re:  Segmentation fault with -O1, out of memory
>  with -O2
> 
> On Tue, 3 Mar 2009, sergstesh at yahoo dot com wrote:
> 
> > ------- Comment #14 from sergstesh at yahoo dot com  2009-03-03 13:36 -------
> > 'spiral' has produced another testcase which segfaults with -O2 - the original
> > testcase segfaults with -O1.
> > 
> > The testcase, though has half the points if terms of FFT, is big as a file:
> > 
> > -rw-r--r-- 1 sergei users  1656419 2009-03-03 06:36 gap_CPEodL.c.bz2
> > 
> > , so I think I can't upload it directly, or can I ?
> > 
> > I can send it as Email attachment to, say, rguenth.
> 
> That would be fine.
> 
> Richard.
> 

I have just sent Richard an Email with 'gap_CPEodL.c.bz2' file attached.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
  2009-02-28 15:23 [Bug c/39326] New: " sergstesh at yahoo dot com
  2009-03-02 17:16 ` [Bug middle-end/39326] " pinskia at gcc dot gnu dot org
  2009-03-03 13:36 ` sergstesh at yahoo dot com
@ 2009-03-03 13:49 ` rguenther at suse dot de
  2009-03-03 14:15 ` sergstesh at yahoo dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: rguenther at suse dot de @ 2009-03-03 13:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from rguenther at suse dot de  2009-03-03 13:48 -------
Subject: Re:  Segmentation fault with -O1, out of memory
 with -O2

On Tue, 3 Mar 2009, sergstesh at yahoo dot com wrote:

> ------- Comment #14 from sergstesh at yahoo dot com  2009-03-03 13:36 -------
> 'spiral' has produced another testcase which segfaults with -O2 - the original
> testcase segfaults with -O1.
> 
> The testcase, though has half the points if terms of FFT, is big as a file:
> 
> -rw-r--r-- 1 sergei users  1656419 2009-03-03 06:36 gap_CPEodL.c.bz2
> 
> , so I think I can't upload it directly, or can I ?
> 
> I can send it as Email attachment to, say, rguenth.

That would be fine.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
  2009-02-28 15:23 [Bug c/39326] New: " sergstesh at yahoo dot com
  2009-03-02 17:16 ` [Bug middle-end/39326] " pinskia at gcc dot gnu dot org
@ 2009-03-03 13:36 ` sergstesh at yahoo dot com
  2009-03-03 13:49 ` rguenther at suse dot de
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: sergstesh at yahoo dot com @ 2009-03-03 13:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from sergstesh at yahoo dot com  2009-03-03 13:36 -------
'spiral' has produced another testcase which segfaults with -O2 - the original
testcase segfaults with -O1.

The testcase, though has half the points if terms of FFT, is big as a file:

-rw-r--r-- 1 sergei users  1656419 2009-03-03 06:36 gap_CPEodL.c.bz2

, so I think I can't upload it directly, or can I ?

I can send it as Email attachment to, say, rguenth.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2
  2009-02-28 15:23 [Bug c/39326] New: " sergstesh at yahoo dot com
@ 2009-03-02 17:16 ` pinskia at gcc dot gnu dot org
  2009-03-03 13:36 ` sergstesh at yahoo dot com
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-03-02 17:16 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|major                       |normal


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2024-02-20 13:14 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-39326-4@http.gcc.gnu.org/bugzilla/>
2013-03-06 11:08 ` [Bug middle-end/39326] Segmentation fault with -O1, out of memory with -O2 steven at gcc dot gnu.org
2013-03-06 11:38 ` rguenth at gcc dot gnu.org
2013-03-06 16:50 ` sergstesh at yahoo dot com
2013-03-06 23:39 ` steven at gcc dot gnu.org
2013-03-07  0:08 ` steven at gcc dot gnu.org
2013-03-07  0:27 ` steven at gcc dot gnu.org
2013-03-07  8:10 ` steven at gcc dot gnu.org
2013-03-07  8:45 ` rguenther at suse dot de
2013-03-07  8:48 ` rguenther at suse dot de
2013-03-07  8:53 ` rguenther at suse dot de
2013-03-07  9:58 ` steven at gcc dot gnu.org
2013-03-07 10:14 ` sergstesh at yahoo dot com
2013-03-07 10:15 ` rguenther at suse dot de
2013-03-07 17:14 ` sergstesh at yahoo dot com
2013-03-07 17:31 ` steven at gcc dot gnu.org
2013-03-07 17:34 ` steven at gcc dot gnu.org
2013-03-07 21:48 ` sergstesh at yahoo dot com
2013-03-07 22:16 ` steven at gcc dot gnu.org
2013-03-07 23:19 ` steven at gcc dot gnu.org
2013-03-08  9:14 ` rguenth at gcc dot gnu.org
2013-03-08  9:14 ` rguenth at gcc dot gnu.org
2013-03-08  9:23 ` rguenther at suse dot de
2013-03-09 14:58 ` steven at gcc dot gnu.org
2013-03-09 17:26 ` steven at gcc dot gnu.org
2013-03-11  9:40 ` steven at gcc dot gnu.org
2013-03-12 10:47 ` rguenth at gcc dot gnu.org
2013-03-12 14:05 ` rguenth at gcc dot gnu.org
2013-03-15 16:07 ` rguenth at gcc dot gnu.org
2013-03-18  8:44 ` rguenth at gcc dot gnu.org
2013-03-21 20:04 ` jakub at gcc dot gnu.org
2013-03-22 14:01 ` rguenth at gcc dot gnu.org
2013-03-26 10:10 ` rguenth at gcc dot gnu.org
2014-01-15 15:14 ` rguenth at gcc dot gnu.org
2014-01-17 13:53 ` rguenth at gcc dot gnu.org
2022-03-10 15:05 ` rguenth at gcc dot gnu.org
2024-02-20 13:14 ` rguenth at gcc dot gnu.org
2009-02-28 15:23 [Bug c/39326] New: " sergstesh at yahoo dot com
2009-03-02 17:16 ` [Bug middle-end/39326] " pinskia at gcc dot gnu dot org
2009-03-03 13:36 ` sergstesh at yahoo dot com
2009-03-03 13:49 ` rguenther at suse dot de
2009-03-03 14:15 ` sergstesh at yahoo dot com
2009-03-17 11:05 ` rguenth at gcc dot gnu dot org
2009-03-17 12:42 ` rguenth at gcc dot gnu dot org
2009-03-17 12:59 ` rguenth at gcc dot gnu dot org
2009-03-17 13:10 ` rguenth at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).