public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/33922]  New: [4.3 Regression] slow compilation on ia64
@ 2007-10-27 15:40 tbm at cyrius dot com
  2007-10-27 15:48 ` [Bug tree-optimization/33922] " rguenth at gcc dot gnu dot org
                   ` (26 more replies)
  0 siblings, 27 replies; 29+ messages in thread
From: tbm at cyrius dot com @ 2007-10-27 15:40 UTC (permalink / raw)
  To: gcc-bugs

The following program takes about 30 seconds to compile on IA64 with -O3 and
trunk, but took less than a second with 4.2.  On x86_64 it takes about 3
seconds.

(sid)tbm@coconut0:~$ time /usr/lib/gcc-snapshot/bin/gcc -c -O3 slow.c

real    0m32.572s
user    0m18.838s
sys     0m0.049s
(sid)tbm@coconut0:~$ time gcc-4.2 -c -O3 slow.c

real    0m0.696s
user    0m0.062s
sys     0m0.022s
(sid)tbm@coconut0:~$


-- 
           Summary: [4.3 Regression] slow compilation on ia64
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tbm at cyrius dot com
GCC target triplet: ia64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
@ 2007-10-27 15:48 ` rguenth at gcc dot gnu dot org
  2007-10-27 16:03 ` tbm at cyrius dot com
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-10-27 15:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2007-10-27 15:48 -------
-ftime-report output please?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
  2007-10-27 15:48 ` [Bug tree-optimization/33922] " rguenth at gcc dot gnu dot org
@ 2007-10-27 16:03 ` tbm at cyrius dot com
  2007-10-27 16:07 ` tbm at cyrius dot com
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: tbm at cyrius dot com @ 2007-10-27 16:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from tbm at cyrius dot com  2007-10-27 16:03 -------
compile times:

  20070303  0m25.928s
  20070422  0m8.723s
  20070515  0m7.345s
  20070613  0m8.996s
  20070811  0m8.172s
  20070916  0m24.503s
  20071020  0m34.445s


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
  2007-10-27 15:48 ` [Bug tree-optimization/33922] " rguenth at gcc dot gnu dot org
  2007-10-27 16:03 ` tbm at cyrius dot com
@ 2007-10-27 16:07 ` tbm at cyrius dot com
  2007-10-27 16:14 ` tbm at cyrius dot com
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: tbm at cyrius dot com @ 2007-10-27 16:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from tbm at cyrius dot com  2007-10-27 16:07 -------
(In reply to comment #1)
> -ftime-report output please?

(sid)tbm@coconut0:~/x$ /usr/lib/gcc-snapshot/bin/gcc -c -O3 -ftime-report slow.c

Execution times (seconds)
 garbage collection    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 1%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      13 kB ( 0%) ggc
 callgraph optimization:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       2 kB ( 0%) ggc
 CFG verifier          :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 df live regs          :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     142 kB ( 1%) ggc
 register information  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     224 kB ( 2%) ggc
 rebuild jump labels   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 1%) wall       0 kB ( 0%) ggc
 parser                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall      83 kB ( 1%) ggc
 tree gimplify         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      14 kB ( 0%) ggc
 tree CFG construction :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      23 kB ( 0%) ggc
 tree CFG cleanup      :   0.00 ( 0%) usr   0.00 ( 2%) sys   0.02 ( 0%) wall    1018 kB ( 8%) ggc
 tree VRP              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     132 kB ( 1%) ggc
 tree reassociation    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree PRE              :   0.39 ( 2%) usr   0.00 ( 4%) sys   0.41 ( 2%) wall    1052 kB ( 8%) ggc
 tree conservative DCE :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 predictive commoning  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA to normal    :   0.06 ( 0%) usr   0.00 ( 4%) sys   0.06 ( 0%) wall    1010 kB ( 8%) ggc
 tree SSA verifier     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall      10 kB ( 0%) ggc
 tree STMT verifier    :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   0.08 ( 0%) usr   0.00 ( 2%) sys   0.77 ( 3%) wall    1163 kB ( 9%) ggc
 jump                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 CSE                   :   0.03 ( 0%) usr   0.00 ( 2%) sys   0.04 ( 0%) wall       1 kB ( 0%) ggc
 dead store elim1      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     129 kB ( 1%) ggc
 dead store elim2      :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 1%) wall     267 kB ( 2%) ggc
 CPROP 2               :   0.01 ( 0%) usr   0.00 ( 2%) sys   0.01 ( 0%) wall     132 kB ( 1%) ggc
 bypass jumps          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     130 kB ( 1%) ggc
 CSE 2                 :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       1 kB ( 0%) ggc
 branch prediction     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 combiner              :   0.82 ( 4%) usr   0.00 ( 0%) sys   0.91 ( 3%) wall     452 kB ( 3%) ggc
 if-conversion         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     352 kB ( 3%) ggc
 regmove               :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 scheduling            :   1.32 ( 7%) usr   0.00 ( 2%) sys   1.55 ( 6%) wall     194 kB ( 1%) ggc
 local alloc           :   0.14 ( 1%) usr   0.00 ( 0%) sys   0.14 ( 1%) wall      50 kB ( 0%) ggc
 global alloc          :   0.54 ( 3%) usr   0.00 ( 9%) sys   0.78 ( 3%) wall    2537 kB (19%) ggc
 reload CSE regs       :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall     584 kB ( 4%) ggc
 load CSE after reload :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 thread pro- & epilogue:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      24 kB ( 0%) ggc
 rename registers      :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :  14.45 (78%) usr   0.03 (65%) sys  19.36 (74%) wall    2099 kB (16%) ggc
 machine dep reorg     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 final                 :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall       0 kB ( 0%) ggc
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :  18.63             0.04            26.28                13034 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (2 preceding siblings ...)
  2007-10-27 16:07 ` tbm at cyrius dot com
@ 2007-10-27 16:14 ` tbm at cyrius dot com
  2007-10-27 16:44 ` tbm at cyrius dot com
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: tbm at cyrius dot com @ 2007-10-27 16:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from tbm at cyrius dot com  2007-10-27 16:13 -------
Maybe something for Maxim to look at?


-- 

tbm at cyrius dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mkuvyrkov at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (3 preceding siblings ...)
  2007-10-27 16:14 ` tbm at cyrius dot com
@ 2007-10-27 16:44 ` tbm at cyrius dot com
  2007-10-27 17:51 ` tbm at cyrius dot com
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: tbm at cyrius dot com @ 2007-10-27 16:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from tbm at cyrius dot com  2007-10-27 16:44 -------
Oops, I forgot to add the testcase:

typedef enum
{
  ST_TiemanStyle,
}
BrailleDisplay;
static int pendingCommand;
static int currentModifiers;
typedef struct
{
  int (*updateKeys) (BrailleDisplay * brl, int *keyPressed);
}
ProtocolOperations;
static const ProtocolOperations *protocol;
brl_readCommand (BrailleDisplay * brl)
{
  unsigned long int keys;
  int command;
  int keyPressed;
  unsigned char routingKeys[200];
  int routingKeyCount;
  signed char rightVerticalSensor;
  if (pendingCommand != (-1))
    {
      return command;
    }
  if (!protocol->updateKeys (brl, &keyPressed))
    {
      if (rightVerticalSensor >= 0)
        keys |= 1;
      if ((routingKeyCount == 0) && keys)
        {
          if (currentModifiers)
            {
            doChord:switch (keys);
            }
          else
            {
            doCharacter:
              command = 0X2200;
              if (keys & 0X01UL)
                command |= 0001;
              if (keys & 0X02UL)
                command |= 0002;
              if (keys & 0X04UL)
                command |= 0004;
              if (keys & 0X08UL)
                command |= 0010;
              if (keys & 0X10UL)
                command |= 0020;
              if (keys & 0X20UL)
                command |= 0040;
              if (currentModifiers & (0X0010 | 0X0200))
                command |= 0100;
              if (currentModifiers & 0X0040)
                command |= 0200;
              if (currentModifiers & 0X0100)
                command |= 0X020000;
              if (currentModifiers & 0X0400)
                command |= 0X080000;
              if (currentModifiers & 0X0800)
                command |= 0X040000;
            }
          unsigned char key1 = routingKeys[0];
          if (key1 == 0)
            {
            }
          if (key1 == 1)
            if (keys)
              {
                currentModifiers |= 0X0010;
                goto doCharacter;
              }
        }
    }
  return command;
}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (4 preceding siblings ...)
  2007-10-27 16:44 ` tbm at cyrius dot com
@ 2007-10-27 17:51 ` tbm at cyrius dot com
  2007-10-27 17:52 ` tbm at cyrius dot com
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: tbm at cyrius dot com @ 2007-10-27 17:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from tbm at cyrius dot com  2007-10-27 17:50 -------
As a comparison, here is what I get with 20070811:

(sid)tbm@coconut0:~/x$ /usr/lib/gcc-snapshot/bin/gcc -c -O3 -ftime-report slow.c

Execution times (seconds)
 garbage collection    :   0.06 ( 2%) usr   0.00 ( 0%) sys   0.43 ( 5%) wall       0 kB ( 0%) ggc
 CFG verifier          :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 df live regs          :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 1%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.01 ( 0%) usr   0.00 ( 2%) sys   0.01 ( 0%) wall     198 kB ( 2%) ggc
 register information  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.03 ( 1%) usr   0.00 ( 0%) sys   0.15 ( 2%) wall     224 kB ( 2%) ggc
 parser                :   0.00 ( 0%) usr   0.00 ( 8%) sys   0.01 ( 0%) wall      81 kB ( 1%) ggc
 tree VRP              :   0.01 ( 0%) usr   0.00 ( 3%) sys   0.01 ( 0%) wall     132 kB ( 1%) ggc
 tree operand scan     :   0.01 ( 0%) usr   0.00 ( 3%) sys   0.01 ( 0%) wall     106 kB ( 1%) ggc
 tree PRE              :   0.41 (13%) usr   0.00 ( 3%) sys   1.00 (11%) wall    1052 kB ( 9%) ggc
 tree SSA to normal    :   0.08 ( 3%) usr   0.00 ( 2%) sys   0.32 ( 3%) wall    1023 kB ( 8%) ggc
 tree SSA verifier     :   0.03 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      10 kB ( 0%) ggc
 tree STMT verifier    :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.24 ( 3%) wall       0 kB ( 0%) ggc
 expand                :   0.02 ( 1%) usr   0.01 (12%) sys   0.03 ( 0%) wall     571 kB ( 5%) ggc
 CSE                   :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       1 kB ( 0%) ggc
 dead code elimination :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim2      :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.05 ( 1%) wall     122 kB ( 1%) ggc
 CPROP 1               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 1%) wall      97 kB ( 1%) ggc
 CPROP 2               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     131 kB ( 1%) ggc
 bypass jumps          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     130 kB ( 1%) ggc
 CSE 2                 :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       1 kB ( 0%) ggc
 combiner              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      23 kB ( 0%) ggc
 if-conversion         :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.16 ( 2%) wall       0 kB ( 0%) ggc
 regmove               :   0.04 ( 1%) usr   0.00 ( 3%) sys   0.13 ( 1%) wall       0 kB ( 0%) ggc
 scheduling            :   0.40 (12%) usr   0.00 ( 5%) sys   1.17 (13%) wall      61 kB ( 1%) ggc
 local alloc           :   0.03 ( 1%) usr   0.00 ( 0%) sys   0.15 ( 2%) wall     162 kB ( 1%) ggc
 global alloc          :   0.35 (11%) usr   0.01 ( 9%) sys   1.03 (11%) wall    2694 kB (22%) ggc
 reload CSE regs       :   0.22 ( 7%) usr   0.00 ( 2%) sys   0.67 ( 7%) wall     686 kB ( 6%) ggc
 load CSE after reload :   0.07 ( 2%) usr   0.00 ( 2%) sys   0.18 ( 2%) wall       0 kB ( 0%) ggc
 rename registers      :   0.07 ( 2%) usr   0.00 ( 0%) sys   0.22 ( 2%) wall       3 kB ( 0%) ggc
 scheduling 2          :   1.02 (31%) usr   0.01 (11%) sys   2.50 (27%) wall    1192 kB (10%) ggc
 machine dep reorg     :   0.03 ( 1%) usr   0.00 ( 2%) sys   0.04 ( 0%) wall       1 kB ( 0%) ggc
 final                 :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :   3.24             0.06             9.11                12164 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (5 preceding siblings ...)
  2007-10-27 17:51 ` tbm at cyrius dot com
@ 2007-10-27 17:52 ` tbm at cyrius dot com
  2007-10-27 18:01 ` [Bug middle-end/33922] " pinskia at gcc dot gnu dot org
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: tbm at cyrius dot com @ 2007-10-27 17:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from tbm at cyrius dot com  2007-10-27 17:52 -------
So the wall time of "scheduling 2" has gone from 2.50 to 19.36 seconds between
20070811 and 20071020 (both with checking enabled).
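
To put those two figures in proportion, here is a minimal sketch that computes
the slowdown factor from the two "scheduling 2" wall times quoted in the
-ftime-report outputs above.  The names slowdown and report_sched2 are
hypothetical helpers made up for this illustration; they are not part of GCC.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: ratio of a new timing to an old one. */
double slowdown(double old_seconds, double new_seconds)
{
    assert(old_seconds > 0.0);
    return new_seconds / old_seconds;
}

/* Print the "scheduling 2" wall-time ratio between the two snapshots.
   The two constants are copied from the reports in this thread. */
void report_sched2(void)
{
    const double wall_20070811 = 2.50;   /* seconds, checking enabled */
    const double wall_20071020 = 19.36;  /* seconds, checking enabled */

    printf("scheduling 2 slowdown: %.1fx\n",
           slowdown(wall_20070811, wall_20071020));
}
```

Calling report_sched2() from a main function prints the factor.  The ratio is
a bit under 8x, on a pass that already accounted for roughly a quarter of the
wall time in the older snapshot.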


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug middle-end/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (6 preceding siblings ...)
  2007-10-27 17:52 ` tbm at cyrius dot com
@ 2007-10-27 18:01 ` pinskia at gcc dot gnu dot org
  2007-10-27 18:08 ` tbm at cyrius dot com
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2007-10-27 18:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from pinskia at gcc dot gnu dot org  2007-10-27 18:00 -------
> Extra diagnostic checks enabled; compiler may run slowly.
> Configure with --enable-checking=release to disable checks.

We added this message for a reason; it seems you should try that first.
The release branches default to --enable-checking=release.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
          Component|tree-optimization           |middle-end


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug middle-end/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (7 preceding siblings ...)
  2007-10-27 18:01 ` [Bug middle-end/33922] " pinskia at gcc dot gnu dot org
@ 2007-10-27 18:08 ` tbm at cyrius dot com
  2007-10-27 18:10   ` Andrew Pinski
  2007-10-27 18:10 ` pinskia at gmail dot com
                   ` (17 subsequent siblings)
  26 siblings, 1 reply; 29+ messages in thread
From: tbm at cyrius dot com @ 2007-10-27 18:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from tbm at cyrius dot com  2007-10-27 18:08 -------
(In reply to comment #8)
> > Extra diagnostic checks enabled; compiler may run slowly.
> > Configure with --enable-checking=release to disable checks.
> 
> We added this message for a reason; it seems you should try that first.
> The release branches default to --enable-checking=release.

Well, I showed that even with checking enabled the compiler was _much_ faster
2 months ago.  But, ok, I'll try with checking disabled too.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* Re: [Bug middle-end/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 18:08 ` tbm at cyrius dot com
@ 2007-10-27 18:10   ` Andrew Pinski
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Pinski @ 2007-10-27 18:10 UTC (permalink / raw)
  To: gcc-bugzilla; +Cc: gcc-bugs

On 27 Oct 2007 18:08:21 -0000, tbm at cyrius dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
> Well, I showed that even with checking enabled the compiler was _much_ faster
> 2 months ago.  But, ok, I'll try with checking disabled too.

Well someone (maybe DF) could have added a lot of checking.

-- Pinski



* [Bug middle-end/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (8 preceding siblings ...)
  2007-10-27 18:08 ` tbm at cyrius dot com
@ 2007-10-27 18:10 ` pinskia at gmail dot com
  2007-10-27 18:13 ` tbm at cyrius dot com
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: pinskia at gmail dot com @ 2007-10-27 18:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from pinskia at gmail dot com  2007-10-27 18:10 -------
Subject: Re:  [4.3 Regression] slow compilation on ia64

On 27 Oct 2007 18:08:21 -0000, tbm at cyrius dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
> Well, I showed that even with checking enabled the compiler was _much_ faster
> 2 months ago.  But, ok, I'll try with checking disabled too.

Well someone (maybe DF) could have added a lot of checking.

-- Pinski


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug middle-end/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (9 preceding siblings ...)
  2007-10-27 18:10 ` pinskia at gmail dot com
@ 2007-10-27 18:13 ` tbm at cyrius dot com
  2007-10-27 18:53 ` tbm at cyrius dot com
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: tbm at cyrius dot com @ 2007-10-27 18:13 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from tbm at cyrius dot com  2007-10-27 18:13 -------
(In reply to comment #10)
> Well someone (maybe DF) could have added a lot of checking.

OK, good point.

I'll report my findings in a few hours.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug middle-end/33922] [4.3 Regression] slow compilation on ia64
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (10 preceding siblings ...)
  2007-10-27 18:13 ` tbm at cyrius dot com
@ 2007-10-27 18:53 ` tbm at cyrius dot com
  2007-10-27 18:59 ` [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling) pinskia at gcc dot gnu dot org
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: tbm at cyrius dot com @ 2007-10-27 18:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from tbm at cyrius dot com  2007-10-27 18:53 -------
Same results without checking (actually, even slower - is that possible?):

(sid)tbm@coconut0:~/tmp/gcc/gcc-4.3-20071027-r129674-no-checking/gcc$ ./xgcc -B. -ftime-report -O3 -c ~/slow.c

Execution times (seconds)
 df live regs          :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     142 kB ( 1%) ggc
 register information  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     224 kB ( 2%) ggc
 tree VRP              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall     132 kB ( 1%) ggc
 tree PRE              :   0.37 ( 2%) usr   0.00 ( 3%) sys   0.64 ( 1%) wall    1052 kB ( 8%) ggc
 tree SSA to normal    :   0.06 ( 0%) usr   0.00 ( 3%) sys   0.06 ( 0%) wall    1010 kB ( 8%) ggc
 expand                :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    1182 kB ( 9%) ggc
 CSE                   :   0.03 ( 0%) usr   0.00 ( 7%) sys   0.14 ( 0%) wall       1 kB ( 0%) ggc
 dead store elim2      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     267 kB ( 2%) ggc
 CPROP 2               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     132 kB ( 1%) ggc
 bypass jumps          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     130 kB ( 1%) ggc
 CSE 2                 :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 1%) wall       0 kB ( 0%) ggc
 combiner              :   0.81 ( 4%) usr   0.00 ( 3%) sys   1.77 ( 3%) wall     452 kB ( 4%) ggc
 if-conversion         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     352 kB ( 3%) ggc
 regmove               :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 scheduling            :   1.34 ( 6%) usr   0.00 ( 0%) sys   3.53 ( 7%) wall     194 kB ( 2%) ggc
 local alloc           :   0.14 ( 1%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall      50 kB ( 0%) ggc
 global alloc          :   0.53 ( 2%) usr   0.00 ( 3%) sys   0.70 ( 1%) wall    2537 kB (20%) ggc
 reload CSE regs       :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall     584 kB ( 5%) ggc
 load CSE after reload :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 rename registers      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :  18.96 (83%) usr   0.02 (66%) sys  43.24 (84%) wall    1970 kB (15%) ggc
 final                 :   0.02 ( 0%) usr   0.00 ( 3%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :  22.83             0.03            51.54                12913 kB


-- 

tbm at cyrius dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |UNCONFIRMED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (11 preceding siblings ...)
  2007-10-27 18:53 ` tbm at cyrius dot com
@ 2007-10-27 18:59 ` pinskia at gcc dot gnu dot org
  2007-10-27 19:27 ` tbm at cyrius dot com
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2007-10-27 18:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from pinskia at gcc dot gnu dot org  2007-10-27 18:59 -------
What happens if you compile with -O3 -fno-tree-vectorize ?


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org
          Component|middle-end                  |rtl-optimization
            Summary|[4.3 Regression] slow       |[4.3 Regression] slow
                   |compilation on ia64         |compilation on ia64
                   |                            |(postreload scheduling)
   Target Milestone|---                         |4.3.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (12 preceding siblings ...)
  2007-10-27 18:59 ` [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling) pinskia at gcc dot gnu dot org
@ 2007-10-27 19:27 ` tbm at cyrius dot com
  2007-10-28 19:11 ` jakub at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: tbm at cyrius dot com @ 2007-10-27 19:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from tbm at cyrius dot com  2007-10-27 19:27 -------
(In reply to comment #13)
> What happens if you compile with -O3 -fno-tree-vectorize ?

It's still slow:

(sid)tbm@coconut0:~/tmp/gcc/gcc-4.3-20071027-r129674-no-checking/gcc$ ./xgcc -B. -ftime-report -O3 -fno-tree-vectorize -c ~/slow.c

Execution times (seconds)
 callgraph construction:   0.00 ( 0%) usr   0.00 ( 2%) sys   0.07 ( 0%) wall      13 kB ( 0%) ggc
 callgraph optimization:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       2 kB ( 0%) ggc
 df reaching defs      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 df live regs          :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     142 kB ( 1%) ggc
 register information  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     224 kB ( 2%) ggc
 parser                :   0.00 ( 0%) usr   0.00 ( 1%) sys   0.04 ( 0%) wall      83 kB ( 1%) ggc
 inline heuristics     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree gimplify         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      14 kB ( 0%) ggc
 tree CFG construction :   0.00 ( 0%) usr   0.00 ( 1%) sys   0.02 ( 0%) wall      23 kB ( 0%) ggc
 tree CFG cleanup      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1018 kB ( 8%) ggc
 tree VRP              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     132 kB ( 1%) ggc
 tree copy propagation :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      24 kB ( 0%) ggc
 tree PRE              :   0.37 ( 2%) usr   0.00 ( 0%) sys   0.47 ( 1%) wall    1052 kB ( 8%) ggc
 tree SSA to normal    :   0.06 ( 0%) usr   0.00 ( 1%) sys   0.06 ( 0%) wall    1010 kB ( 8%) ggc
 expand                :   0.04 ( 0%) usr   0.00 ( 2%) sys   0.37 ( 1%) wall    1182 kB ( 9%) ggc
 forward prop          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       2 kB ( 0%) ggc
 CSE                   :   0.03 ( 0%) usr   0.00 ( 1%) sys   0.03 ( 0%) wall       1 kB ( 0%) ggc
 dead store elim2      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     267 kB ( 2%) ggc
 CPROP 2               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     132 kB ( 1%) ggc
 bypass jumps          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     130 kB ( 1%) ggc
 CSE 2                 :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
 branch prediction     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 combiner              :   0.82 ( 3%) usr   0.00 ( 0%) sys   1.66 ( 4%) wall     452 kB ( 4%) ggc
 if-conversion         :   0.02 ( 0%) usr   0.00 ( 1%) sys   0.03 ( 0%) wall     352 kB ( 3%) ggc
 regmove               :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 scheduling            :   1.34 ( 5%) usr   0.00 ( 0%) sys   2.99 ( 7%) wall     194 kB ( 2%) ggc
 local alloc           :   0.14 ( 1%) usr   0.00 ( 0%) sys   0.34 ( 1%) wall      50 kB ( 0%) ggc
 global alloc          :   0.53 ( 2%) usr   0.00 ( 1%) sys   1.15 ( 3%) wall    2537 kB (20%) ggc
 reload CSE regs       :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.36 ( 1%) wall     584 kB ( 5%) ggc
 load CSE after reload :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 if-conversion 2       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
 rename registers      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :  20.44 (84%) usr   0.08 (84%) sys  31.73 (79%) wall    1970 kB (15%) ggc
 final                 :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :  24.34             0.10            40.40                12913 kB



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (13 preceding siblings ...)
  2007-10-27 19:27 ` tbm at cyrius dot com
@ 2007-10-28 19:11 ` jakub at gcc dot gnu dot org
  2007-10-28 19:42 ` jakub at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-10-28 19:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from jakub at gcc dot gnu dot org  2007-10-28 19:10 -------
Compared to 20070803 with -O3 -fno-tree-vectorize there are now 100 times more
calls to rtx_needs_barrier and 44 times more calls to
safe_group_barrier_needed.
The latter in particular is horribly expensive: each call copies 401 * sizeof
(struct reg_write_state) == 1604 bytes around several times.



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (14 preceding siblings ...)
  2007-10-28 19:11 ` jakub at gcc dot gnu dot org
@ 2007-10-28 19:42 ` jakub at gcc dot gnu dot org
  2007-10-28 20:04 ` maxim at codesourcery dot com
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-10-28 19:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from jakub at gcc dot gnu dot org  2007-10-28 19:42 -------
I haven't analyzed exactly why there are so many more safe_group_barrier_needed
calls, but on this testcase they are certainly much more common than direct
group_barrier_needed calls (14579701 safe_group_barrier_needed calls,
14604168 group_barrier_needed calls).  If so, the only thing such a call
cares about is the return value; all the state is thrown away.  From what I can
see, the need_barrier return value is ORed together from all the recursive
calls, so couldn't we gain something by just returning 1 immediately whenever
one of the recursive calls returns non-zero?
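
A minimal sketch of the early-return idea above, not the ia64.c code: struct
node, needs_barrier_all, needs_barrier_early and the visits counter are all
hypothetical names standing in for the recursive walk over an insn.  It
assumes, as the comment does, that the caller only wants the return value and
does not rely on state updated by the rest of the walk.

```c
#include <stddef.h>

/* Hypothetical stand-in for an expression tree; this is not GCC's rtx.  */
struct node
{
  int conflicts;                /* nonzero if this node alone needs a barrier */
  const struct node *kid[2];    /* operands, NULL when absent */
};

/* Counts visited nodes so the two walks can be compared.  */
int visits;

/* Accumulating form: every operand is visited and the results are ORed.  */
int
needs_barrier_all (const struct node *n)
{
  int need, i;

  if (n == NULL)
    return 0;
  visits++;
  need = n->conflicts;
  for (i = 0; i < 2; i++)
    need |= needs_barrier_all (n->kid[i]);
  return need;
}

/* Short-circuit form: stop as soon as any sub-walk reports a barrier.
   Only valid when nothing depends on side effects of the skipped walks.  */
int
needs_barrier_early (const struct node *n)
{
  int i;

  if (n == NULL)
    return 0;
  visits++;
  if (n->conflicts)
    return 1;
  for (i = 0; i < 2; i++)
    if (needs_barrier_early (n->kid[i]))
      return 1;
  return 0;
}
```

On a five-node tree whose first operand conflicts, both functions return 1,
but the accumulating walk visits all five nodes while the short-circuit walk
stops after two.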



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (15 preceding siblings ...)
  2007-10-28 19:42 ` jakub at gcc dot gnu dot org
@ 2007-10-28 20:04 ` maxim at codesourcery dot com
  2007-10-28 20:21 ` jakub at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: maxim at codesourcery dot com @ 2007-10-28 20:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from maxim at codesourcery dot com  2007-10-28 20:04 -------
Subject: Re:  [4.3 Regression] slow compilation
 on ia64 (postreload scheduling)

jakub at gcc dot gnu dot org wrote:
> ------- Comment #15 from jakub at gcc dot gnu dot org  2007-10-28 19:10 -------
> Compared to 20070803 with -O3 -fno-tree-vectorize there are now 100 times more
> calls to rtx_needs_barrier and 44 times more calls to
> safe_group_barrier_needed.
> The latter in particular is horribly expensive: each call copies 401 * sizeof
> (struct reg_write_state) == 1604 bytes around several times.

The underlying problem is that the list of ready-to-schedule instructions
has become larger than it was before, and the scheduler slows down as the
list grows.  There is already a workaround for this problem (limiting the
ready list when it gets too large; see PARAM_MAX_SCHED_READY_INSNS), but it
doesn't seem to work well in this case.



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (16 preceding siblings ...)
  2007-10-28 20:04 ` maxim at codesourcery dot com
@ 2007-10-28 20:21 ` jakub at gcc dot gnu dot org
  2007-10-28 20:43 ` jakub at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-10-28 20:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from jakub at gcc dot gnu dot org  2007-10-28 20:20 -------
Created an attachment (id=14429)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14429&action=view)
rws_insn.patch

Just a side note.  Maintaining the rws_insn array seems horribly expensive to
me: for each regno only one bit is actually used, just to check one gcc_assert,
and only two regnos are actually checked in some other code.  So memsetting and
maintaining a 1604-byte array all the time seems like overkill; a bitmap can
do just fine or, if we just remove that gcc_assert when not ENABLE_CHECKING, we
need just 2 bits altogether instead of those 1604 bytes.
This doesn't help much on this testcase (as it does not address the algorithmic
issue), but the effect is already noticeable.
 scheduling 2          :  10.60 (88%) usr   0.00 ( 0%) sys  10.60 (88%) wall   
1970 kB (15%) ggc
went down to
 scheduling 2          :   8.99 (86%) usr   0.01 (50%) sys   9.00 (86%) wall   
1970 kB (15%) ggc
with this patch and --enable-checking=release, so about 14% speedup in wall
time for the whole compilation of this file.
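
To illustrate the bitmap alternative mentioned above, here is a rough sketch
of a one-bit-per-regno set.  It is not the attached rws_insn.patch (the
attachment is not reproduced in this thread); struct regno_bitmap and its
helpers are hypothetical names, and NUM_REGS is taken as 401 from the
401 * sizeof figure quoted earlier.

```c
#include <limits.h>
#include <string.h>

#define NUM_REGS 401            /* assumed from the 401-entry arrays above */

#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((NUM_REGS + WORD_BITS - 1) / WORD_BITS)

/* One bit per register number instead of one struct per register.  */
struct regno_bitmap
{
  unsigned long word[BITMAP_WORDS];
};

/* Clearing touches a few words rather than a 1604-byte array.  */
void
regno_bitmap_clear (struct regno_bitmap *b)
{
  memset (b, 0, sizeof *b);
}

/* Mark REGNO as seen.  */
void
regno_bitmap_set (struct regno_bitmap *b, unsigned int regno)
{
  b->word[regno / WORD_BITS] |= 1UL << (regno % WORD_BITS);
}

/* Return nonzero if REGNO has been marked.  */
int
regno_bitmap_test (const struct regno_bitmap *b, unsigned int regno)
{
  return (int) ((b->word[regno / WORD_BITS] >> (regno % WORD_BITS)) & 1);
}
```

With 64-bit unsigned long this needs 7 words (56 bytes) for 401 registers, so
the per-insn memset becomes correspondingly cheaper.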



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (17 preceding siblings ...)
  2007-10-28 20:21 ` jakub at gcc dot gnu dot org
@ 2007-10-28 20:43 ` jakub at gcc dot gnu dot org
  2007-10-28 21:11 ` jakub at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-10-28 20:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from jakub at gcc dot gnu dot org  2007-10-28 20:42 -------
Another trivial patch that improves speed is:
--- ia64.c      (revision 129700)
+++ ia64.c      (working copy)
@@ -5310,11 +5310,11 @@ ia64_safe_type (rtx insn)

 struct reg_write_state
 {
-  unsigned int write_count : 2;
-  unsigned int first_pred : 16;
-  unsigned int written_by_fp : 1;
-  unsigned int written_by_and : 1;
-  unsigned int written_by_or : 1;
+  unsigned short write_count : 2;
+  unsigned short first_pred : 10;
+  unsigned short written_by_fp : 1;
+  unsigned short written_by_and : 1;
+  unsigned short written_by_or : 1;
 };

 /* Cumulative info for the current instruction group.  */

which cuts the size of the rws_sum and rws_saved arrays in half (1604 to 802
bytes); with both patches in I get:
 scheduling 2          :   6.86 (82%) usr   0.01 (50%) sys   6.87 (82%) wall   
1970 kB (15%) ggc

or a 31% speedup in wall time with both patches together.  first_pred is either
0 or PR_REG(0) through PR_REG(63), so it certainly fits into a 10-bit bitfield.
If needed it would even fit into 6 bits (as when pred == 0, write_count will
already be 2 and we could subtract PR_REG(0) from it), but that's still too big
to squeeze everything into 1 byte per register.

Even when this bug is fixed for real, both changes IMHO make sense anyway (the
first patch could perhaps use some cleanup, nice macros to hide it or
something).
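
The size halving described in this comment can be checked with a small
sketch.  The two struct names below are hypothetical (the real type is struct
reg_write_state in both versions), NUM_REGS is assumed to be 401 from the
1604-byte figure, and bitfield layout is implementation-defined, so the sizes
hold for GCC on common targets rather than for every compiler.

```c
#include <stddef.h>

#define NUM_REGS 401            /* assumed: 401 * 4 == 1604 */

/* Layout before the patch: 21 bits of fields in unsigned int units.  */
struct rws_wide
{
  unsigned int write_count : 2;
  unsigned int first_pred : 16;
  unsigned int written_by_fp : 1;
  unsigned int written_by_and : 1;
  unsigned int written_by_or : 1;
};

/* Layout after the patch: 15 bits of fields in unsigned short units.
   Bitfields of type unsigned short are a common extension, not ISO C.  */
struct rws_narrow
{
  unsigned short write_count : 2;
  unsigned short first_pred : 10;
  unsigned short written_by_fp : 1;
  unsigned short written_by_and : 1;
  unsigned short written_by_or : 1;
};

/* Bytes occupied by a NUM_REGS-element array of the given element size.  */
size_t
rws_array_bytes (size_t elem_size)
{
  return NUM_REGS * elem_size;
}
```

Passing sizeof (struct rws_wide) and sizeof (struct rws_narrow) to
rws_array_bytes reproduces the 1604 and 802 byte figures when the two structs
occupy 4 and 2 bytes respectively.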



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (18 preceding siblings ...)
  2007-10-28 20:43 ` jakub at gcc dot gnu dot org
@ 2007-10-28 21:11 ` jakub at gcc dot gnu dot org
  2007-10-29  8:43 ` jakub at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-10-28 21:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from jakub at gcc dot gnu dot org  2007-10-28 21:11 -------
Actually, we probably don't need to write to the rws_sum array at all when in
safe_group_barrier_needed, and then we wouldn't need to copy it around (save
and restore it) at all.

--- config/ia64/ia64.c~ 2007-10-28 22:00:24.000000000 +0100
+++ config/ia64/ia64.c  2007-10-28 22:04:26.000000000 +0100
@@ -5353,6 +5353,7 @@ static int rtx_needs_barrier (rtx, struc
 static void init_insn_group_barriers (void);
 static int group_barrier_needed (rtx);
 static int safe_group_barrier_needed (rtx);
+static int in_safe_group_barrier;

 /* Update *RWS for REGNO, which is being written by the current instruction,
    with predicate PRED, and associated register flags in FLAGS.  */
@@ -5407,7 +5408,8 @@ rws_access_regno (int regno, struct reg_
        {
        case 0:
          /* The register has not been written yet.  */
-         rws_update (regno, flags, pred);
+         if (!in_safe_group_barrier)
+           rws_update (regno, flags, pred);
          break;

        case 1:
@@ -5421,7 +5423,8 @@ rws_access_regno (int regno, struct reg_
            ;
          else if ((rws_sum[regno].first_pred ^ 1) != pred)
            need_barrier = 1;
-         rws_update (regno, flags, pred);
+         if (!in_safe_group_barrier)
+           rws_update (regno, flags, pred);
          break;

        case 2:
@@ -5433,8 +5436,11 @@ rws_access_regno (int regno, struct reg_
            ;
          else
            need_barrier = 1;
-         rws_sum[regno].written_by_and = flags.is_and;
-         rws_sum[regno].written_by_or = flags.is_or;
+          if (!in_safe_group_barrier)
+           {
+             rws_sum[regno].written_by_and = flags.is_and;
+             rws_sum[regno].written_by_or = flags.is_or;
+           }
          break;

        default:
@@ -6099,17 +6105,16 @@ int safe_group_barrier_needed_cnt[5];
 static int
 safe_group_barrier_needed (rtx insn)
 {
-  struct reg_write_state rws_saved[NUM_REGS];
   int saved_first_instruction;
   int t;

-  memcpy (rws_saved, rws_sum, NUM_REGS * sizeof *rws_saved);
   saved_first_instruction = first_instruction;
+  in_safe_group_barrier = 1;

   t = group_barrier_needed (insn);

-  memcpy (rws_sum, rws_saved, NUM_REGS * sizeof *rws_saved);
   first_instruction = saved_first_instruction;
+  in_safe_group_barrier = 0;

   return t;
 }

Together with the other patches this gives (everything is an x86_64-linux ->
ia64-linux cross; it would need to be measured on native ia64-linux)
 scheduling 2          :   5.20 (78%) usr   0.01 (50%) sys   5.20 (77%) wall   
1970 kB (15%) ggc

or ~ 45% speedup on this testcase.
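
The pattern in the patch above, stripped down to a self-contained sketch: a
query that used to save and restore the whole state array is replaced by a
flag that suppresses the writes in the first place.  All demo_* names are
hypothetical, and the per-register state is reduced to a single "written"
byte; the real rws_access_regno logic is considerably more involved.

```c
#include <string.h>

#define NUM_REGS 401            /* assumed from the array sizes in the thread */

/* Simplified per-register state; the real struct has several bitfields.  */
struct demo_reg_state
{
  unsigned char written;
};

struct demo_reg_state demo_sum[NUM_REGS];
int demo_query_only;            /* plays the role of in_safe_group_barrier */

/* Report whether REGNO was already written, recording the new write
   unless we are only querying.  */
int
demo_access_regno (int regno)
{
  int need_barrier = demo_sum[regno].written;

  if (!demo_query_only)
    demo_sum[regno].written = 1;
  return need_barrier;
}

/* Old approach: snapshot the whole array, query, then restore it.  */
int
demo_query_with_copy (int regno)
{
  struct demo_reg_state saved[NUM_REGS];
  int t;

  memcpy (saved, demo_sum, sizeof saved);
  t = demo_access_regno (regno);
  memcpy (demo_sum, saved, sizeof saved);
  return t;
}

/* New approach: set a flag so the query never modifies the array.  */
int
demo_query_with_flag (int regno)
{
  int t;

  demo_query_only = 1;
  t = demo_access_regno (regno);
  demo_query_only = 0;
  return t;
}
```

Both query functions return the same answer and leave demo_sum untouched; the
flag version simply avoids two array copies per call, which matters when the
function is called millions of times as on this testcase.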


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilson at gcc dot gnu dot
                   |                            |org



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (19 preceding siblings ...)
  2007-10-28 21:11 ` jakub at gcc dot gnu dot org
@ 2007-10-29  8:43 ` jakub at gcc dot gnu dot org
  2007-11-01 20:59 ` jakub at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-10-29  8:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from jakub at gcc dot gnu dot org  2007-10-29 08:43 -------
Created an attachment (id=14433)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14433&action=view)
gcc43-ia64-rws-speedups.patch

All 3 patches together, with macros.



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (20 preceding siblings ...)
  2007-10-29  8:43 ` jakub at gcc dot gnu dot org
@ 2007-11-01 20:59 ` jakub at gcc dot gnu dot org
  2007-11-01 21:02 ` rguenther at suse dot de
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-11-01 20:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from jakub at gcc dot gnu dot org  2007-11-01 20:59 -------
The most important cause of the slowdown compared to e.g. 4.2.x, though, is the
totally insane amount of code that -ftree-pre creates.
For -O3 -fno-tree-vectorize -fdump-tree-all pr33922.c
wc -l shows
2361 pr33922.c.090t.sink
while for -O3 -fno-tree-vectorize -fno-tree-pre -fdump-tree-all pr33922.c
324 pr33922.c.090t.sink
and of course the size of assembly corresponds to this:
11400 pr33922.s # -O3 -fno-tree-vectorize
195 pr33922.s # -O3 -fno-tree-vectorize -fno-tree-pre

-O3 -fno-tree-vectorize -fdump-tree-pre-all dump contains
2081 ^Created.*value lines and all those constants are actually created and
many PHI nodes as well.  I believe this might be what nickc was trying to fix
today by adding a limit, but wasn't that limit huge (131072 bits)?



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (21 preceding siblings ...)
  2007-11-01 20:59 ` jakub at gcc dot gnu dot org
@ 2007-11-01 21:02 ` rguenther at suse dot de
  2007-11-03 10:48 ` ebotcazou at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: rguenther at suse dot de @ 2007-11-01 21:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #23 from rguenther at suse dot de  2007-11-01 21:01 -------
Subject: Re:  [4.3 Regression] slow compilation
 on ia64 (postreload scheduling)

On Thu, 1 Nov 2007, jakub at gcc dot gnu dot org wrote:

> ------- Comment #22 from jakub at gcc dot gnu dot org  2007-11-01 20:59 -------
> The most important cause of the slowdown compared to e.g. 4.2.x, though, is the
> totally insane amount of code that -ftree-pre creates.
> For -O3 -fno-tree-vectorize -fdump-tree-all pr33922.c
> wc -l shows
> 2361 pr33922.c.090t.sink
> while for -O3 -fno-tree-vectorize -fno-tree-pre -fdump-tree-all pr33922.c
> 324 pr33922.c.090t.sink
> and of course the size of assembly corresponds to this:
> 11400 pr33922.s # -O3 -fno-tree-vectorize
> 195 pr33922.s # -O3 -fno-tree-vectorize -fno-tree-pre
> 
> -O3 -fno-tree-vectorize -fdump-tree-pre-all dump contains
> 2081 ^Created.*value lines and all those constants are actually created and
> many PHI nodes as well.  I believe this might be what nickc was trying to fix
> today by adding a limit, but wasn't that limit huge (131072 bits)?

The limit was meant to cut off exponential behavior.  But yes, PRE (and
PPRE even more so) is known to increase code size.  Looks like some better
heuristics are needed.

Richard.



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (22 preceding siblings ...)
  2007-11-01 21:02 ` rguenther at suse dot de
@ 2007-11-03 10:48 ` ebotcazou at gcc dot gnu dot org
  2007-11-05  3:10 ` mmitchel at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 29+ messages in thread
From: ebotcazou at gcc dot gnu dot org @ 2007-11-03 10:48 UTC (permalink / raw)
  To: gcc-bugs



-- 

ebotcazou at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-11-03 10:48:04
               date|                            |



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (23 preceding siblings ...)
  2007-11-03 10:48 ` ebotcazou at gcc dot gnu dot org
@ 2007-11-05  3:10 ` mmitchel at gcc dot gnu dot org
  2007-11-05 15:42 ` spop at gcc dot gnu dot org
  2007-11-05 15:44 ` spop at gcc dot gnu dot org
  26 siblings, 0 replies; 29+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2007-11-05  3:10 UTC (permalink / raw)
  To: gcc-bugs



-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (24 preceding siblings ...)
  2007-11-05  3:10 ` mmitchel at gcc dot gnu dot org
@ 2007-11-05 15:42 ` spop at gcc dot gnu dot org
  2007-11-05 15:44 ` spop at gcc dot gnu dot org
  26 siblings, 0 replies; 29+ messages in thread
From: spop at gcc dot gnu dot org @ 2007-11-05 15:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #24 from spop at gcc dot gnu dot org  2007-11-05 15:42 -------
Subject: Bug 33922

Author: spop
Date: Mon Nov  5 15:42:30 2007
New Revision: 129901

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129901
Log:
2007-11-05  Nick Clifton  <nickc@redhat.com>
            Sebastian Pop  <sebastian.pop@amd.com>

        PR tree-optimization/32540
        PR tree-optimization/33922
        * doc/invoke.texi: Document PARAM_MAX_PARTIAL_ANTIC_LENGTH.
        * tree-ssa-pre.c: Include params.h.
        (compute_partial_antic_aux): Use PARAM_MAX_PARTIAL_ANTIC_LENGTH
        to limit the maximum length of the PA set for a given block.
        * Makefile.in: Add a dependency upon params.h for tree-ssa-pre.c
        * params.def (PARAM_MAX_PARTIAL_ANTIC_LENGTH): New parameter.

        * gcc.dg/tree-ssa/pr32540-1.c: New.
        * gcc.dg/tree-ssa/pr32540-2.c: New.
        * gcc.dg/tree-ssa/pr33922.c: New.


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr32540-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr32540-2.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr33922.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/doc/invoke.texi
    trunk/gcc/params.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-pre.c



* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
  2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
                   ` (25 preceding siblings ...)
  2007-11-05 15:42 ` spop at gcc dot gnu dot org
@ 2007-11-05 15:44 ` spop at gcc dot gnu dot org
  26 siblings, 0 replies; 29+ messages in thread
From: spop at gcc dot gnu dot org @ 2007-11-05 15:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #25 from spop at gcc dot gnu dot org  2007-11-05 15:44 -------
Fixed.


-- 

spop at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



Thread overview: 29+ messages
2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
2007-10-27 15:48 ` [Bug tree-optimization/33922] " rguenth at gcc dot gnu dot org
2007-10-27 16:03 ` tbm at cyrius dot com
2007-10-27 16:07 ` tbm at cyrius dot com
2007-10-27 16:14 ` tbm at cyrius dot com
2007-10-27 16:44 ` tbm at cyrius dot com
2007-10-27 17:51 ` tbm at cyrius dot com
2007-10-27 17:52 ` tbm at cyrius dot com
2007-10-27 18:01 ` [Bug middle-end/33922] " pinskia at gcc dot gnu dot org
2007-10-27 18:08 ` tbm at cyrius dot com
2007-10-27 18:10   ` Andrew Pinski
2007-10-27 18:10 ` pinskia at gmail dot com
2007-10-27 18:13 ` tbm at cyrius dot com
2007-10-27 18:53 ` tbm at cyrius dot com
2007-10-27 18:59 ` [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling) pinskia at gcc dot gnu dot org
2007-10-27 19:27 ` tbm at cyrius dot com
2007-10-28 19:11 ` jakub at gcc dot gnu dot org
2007-10-28 19:42 ` jakub at gcc dot gnu dot org
2007-10-28 20:04 ` maxim at codesourcery dot com
2007-10-28 20:21 ` jakub at gcc dot gnu dot org
2007-10-28 20:43 ` jakub at gcc dot gnu dot org
2007-10-28 21:11 ` jakub at gcc dot gnu dot org
2007-10-29  8:43 ` jakub at gcc dot gnu dot org
2007-11-01 20:59 ` jakub at gcc dot gnu dot org
2007-11-01 21:02 ` rguenther at suse dot de
2007-11-03 10:48 ` ebotcazou at gcc dot gnu dot org
2007-11-05  3:10 ` mmitchel at gcc dot gnu dot org
2007-11-05 15:42 ` spop at gcc dot gnu dot org
2007-11-05 15:44 ` spop at gcc dot gnu dot org