[Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3

public inbox for gcc-bugs@sourceware.org

* [Bug target/33761]  New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
@ 2007-10-13 11:27 ubizjak at gmail dot com
  2007-10-13 12:31 ` [Bug target/33761] " rguenth at gcc dot gnu dot org
                   ` (27 more replies)
  0 siblings, 28 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2007-10-13 11:27 UTC (permalink / raw)
  To: gcc-bugs

The measurements were actually done on gzip-1.2.4 sources on core2-d with:

a) gcc -mtune=generic -m32 -O2
b) gcc -mtune=generic -m32 -O3

The testfile was created as the tar archive of current SVN trunk repository,
which currently accounts for 865M uncompressed.

profile of a)

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 54.63     14.76    14.76 102254750     0.00     0.00  longest_match
 18.47     19.75     4.99        1     4.99    27.02  deflate
 10.25     22.52     2.77    27389     0.00     0.00  fill_window
  6.81     24.36     1.84    27390     0.00     0.00  updcrc
  3.15     25.21     0.85     5901     0.00     0.00  compress_block
  2.85     25.98     0.77 203123663     0.00     0.00  send_bits
  2.66     26.70     0.72 89123566     0.00     0.00  ct_tally
  0.67     26.88     0.18  3378994     0.00     0.00  pqdownheap
  0.22     26.94     0.06    17709     0.00     0.00  build_tree
  0.15     26.98     0.04    11802     0.00     0.00  send_tree
  0.07     27.00     0.02  1367732     0.00     0.00  bi_reverse
  0.07     27.02     0.02    17710     0.00     0.00  gen_codes
  0.00     27.02     0.00    27390     0.00     0.00  file_read

profile of b)

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 86.86     29.35    29.35        1    29.35    33.79  deflate
  5.27     31.13     1.78    27390     0.00     0.00  updcrc
  2.69     32.04     0.91     5901     0.00     0.00  compress_block
  2.55     32.90     0.86 89123566     0.00     0.00  ct_tally
  2.04     33.59     0.69 203123663     0.00     0.00  send_bits
  0.44     33.74     0.15    17709     0.00     0.00  build_tree
  0.06     33.76     0.02  1367732     0.00     0.00  bi_reverse
  0.06     33.78     0.02     5903     0.00     0.00  flush_block
  0.03     33.79     0.01    11802     0.00     0.00  send_tree
  0.00     33.79     0.00    27390     0.00     0.00  file_read
  0.00     33.79     0.00     9237     0.00     0.00  flush_outbuf
  0.00     33.79     0.00        2     0.00     0.00  basename
  0.00     33.79     0.00        2     0.00     0.00  copy_block
  0.00     33.79     0.00        1     0.00     0.00  add_envopt

As can be seen from profiles, longest_match was inlined into deflate. Adding
__attribute__((noinline)) to longest_match prototype, we obtain:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 55.80     13.86    13.86 102254750     0.00     0.00  longest_match
 27.62     20.72     6.86        1     6.86    24.84  deflate
  7.09     22.48     1.76    27390     0.00     0.00  updcrc
  3.74     23.41     0.93     5901     0.00     0.00  compress_block
  2.62     24.06     0.65 89123566     0.00     0.00  ct_tally
  2.42     24.66     0.60 203123663     0.00     0.00  send_bits
  0.56     24.80     0.14    17709     0.00     0.00  build_tree
  0.08     24.82     0.02  1367732     0.00     0.00  bi_reverse
  0.08     24.84     0.02    11802     0.00     0.00  send_tree
  0.00     24.84     0.00    27390     0.00     0.00  file_read
  0.00     24.84     0.00     9237     0.00     0.00  flush_outbuf
  0.00     24.84     0.00     5903     0.00     0.00  flush_block
  0.00     24.84     0.00        2     0.00     0.00  basename
  0.00     24.84     0.00        2     0.00     0.00  copy_block

or a ~26.5% improvement in total run time (33.79s -> 24.84s). I speculate that
inlining increases register pressure on a SMALL_REGISTER_CLASS target, as this
problem is not that noticeable on x86_64.
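
For readers who have not used the attribute, here is a minimal sketch of the kind
of marking described above. The function names (match_length, best_of_two) are
hypothetical stand-ins and not the gzip sources; only the placement of
__attribute__((noinline)) on the prototype is the point.

```c
#include <assert.h>

/* Hypothetical stand-in for gzip's longest_match(); the attribute on the
   prototype asks GCC to keep the function out of line at every call site. */
__attribute__((noinline))
int match_length(const unsigned char *a, const unsigned char *b, int max_len);

int match_length(const unsigned char *a, const unsigned char *b, int max_len)
{
    int len = 0;
    assert(max_len >= 0);
    /* Count how many leading bytes of a and b agree, up to max_len. */
    while (len < max_len && a[len] == b[len])
        len++;
    return len;
}

/* Hypothetical caller: because match_length() is not inlined, its locals do
   not compete with this function's locals for registers. */
int best_of_two(const unsigned char *scan, const unsigned char *m1,
                const unsigned char *m2, int max_len)
{
    int l1 = match_length(scan, m1, max_len);
    int l2 = match_length(scan, m2, max_len);
    return l1 > l2 ? l1 : l2;
}
```

Whether keeping the callee out of line helps or hurts depends on the target and
the surrounding code, as the 32bit and 64bit numbers in this report suggest.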

The results of 32bit run are at [1] (valid from 13. oct) and results of 64bit
run at [2].

[1]
http://vmakarov.fedorapeople.org/spec/spec2000.toolbox_32/gcc/individual-run-ratio.html
[2]
http://vmakarov.fedorapeople.org/spec/spec2000.toolbox/gcc/individual-run-ratio.html


-- 
           Summary: non-optimal inlining heuristics pessimizes gzip SPEC
                    score at -O3
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ubizjak at gmail dot com
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
@ 2007-10-13 12:31 ` rguenth at gcc dot gnu dot org
  2007-12-10 10:14 ` [Bug target/33761] [4.3 regression] " ubizjak at gmail dot com
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-10-13 12:31 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2007-10-13 12:31 -------
I suppose that alias partitioning makes the difference instead.  This is not
really a fault of the inliner but our dumb memory optimizers.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/33761] [4.3 regression] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
  2007-10-13 12:31 ` [Bug target/33761] " rguenth at gcc dot gnu dot org
@ 2007-12-10 10:14 ` ubizjak at gmail dot com
  2007-12-10 10:52 ` [Bug tree-optimization/33761] " rguenth at gcc dot gnu dot org
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2007-12-10 10:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from ubizjak at gmail dot com  2007-12-10 10:14 -------
According to Issue 2 from http://gcc.gnu.org/ml/gcc/2007-11/msg00753.html, I
think that this bug qualifies as a 4.3 regression.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-12-10 10:14:39
               date|                            |
            Summary|non-optimal inlining        |[4.3 regression] non-optimal
                   |heuristics pessimizes gzip  |inlining heuristics
                   |SPEC score at -O3           |pessimizes gzip SPEC score
                   |                            |at -O3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
  2007-10-13 12:31 ` [Bug target/33761] " rguenth at gcc dot gnu dot org
  2007-12-10 10:14 ` [Bug target/33761] [4.3 regression] " ubizjak at gmail dot com
@ 2007-12-10 10:52 ` rguenth at gcc dot gnu dot org
  2007-12-10 12:31 ` ubizjak at gmail dot com
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-12-10 10:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2007-12-10 10:52 -------
I don't think this qualifies as a 4.3 regression -
http://www.suse.de/~gcctest/SPEC/CINT/sb-haydn-head-64-32o-32bit/index.html
shows that while there were jumps, the numbers close to the 4.2 release are
actually quite similar to what we have now.  So, unless somebody produces
numbers with 4.2 or earlier, this is not a 'regression', but a
missed-optimization only.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org
          Component|target                      |tree-optimization
           Keywords|                            |missed-optimization
            Summary|[4.3 regression] non-optimal|non-optimal inlining
                   |inlining heuristics         |heuristics pessimizes gzip
                   |pessimizes gzip SPEC score  |SPEC score at -O3
                   |at -O3                      |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (2 preceding siblings ...)
  2007-12-10 10:52 ` [Bug tree-optimization/33761] " rguenth at gcc dot gnu dot org
@ 2007-12-10 12:31 ` ubizjak at gmail dot com
  2007-12-10 17:12 ` ubizjak at gmail dot com
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2007-12-10 12:31 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from ubizjak at gmail dot com  2007-12-10 12:31 -------
(In reply to comment #3)
> I don't think this qualifies as a 4.3 regression -

Fair enough. It looks like this problem is specific to Core2.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (3 preceding siblings ...)
  2007-12-10 12:31 ` ubizjak at gmail dot com
@ 2007-12-10 17:12 ` ubizjak at gmail dot com
  2007-12-10 17:14 ` rguenther at suse dot de
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2007-12-10 17:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from ubizjak at gmail dot com  2007-12-10 17:12 -------
(In reply to comment #4)

> Fair enough. It looks like this problem is specific to Core2.

Here are timings with 'gcc version 4.3.0 20071201 (experimental) [trunk
revision 130554] (GCC)' on

vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU         X6800  @ 2.93GHz
stepping        : 5
cpu MHz         : 2933.389
cache size      : 4096 KB

-mtune=generic -m32 -O3: 40.763s   [*]
-mtune=generic -m32 -O2: 32.170s
-mtune=core2 -m32 -O3  : 36.850s
-mtune=core2 -m32 -O2  : 32.170s

-mtune=generic -m64 -O3: 28.550s
-mtune=generic -m64 -O2: 28.682s
-mtune=core2 -m64 -O3  : 28.670s
-mtune=core2 -m64 -O2  : 28.714s

With __attribute__((noinline)) added to longest_match():

-mtune=generic -m32 -O3: 30.658s
-mtune=generic -m32 -O2: 32.154s
-mtune=core2 -m32 -O3  : 30.690s
-mtune=core2 -m32 -O2  : 32.247s

And with FC6 system compiler 'gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)':

-mtune=generic -m32 -O3: 30.154s   [**]
-mtune=generic -m32 -O2: 30.275s

Comparing [*] to [**], it _is_ a regression, at least on Core2.
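
The size of that regression can be worked out from the two marked timings. The
helper below is only an illustrative sketch; the function names are assumptions
made for this example and the constants are the [*] and [**] figures above.

```c
#include <assert.h>

/* Percentage by which `slow` exceeds `fast`; both are run times in seconds. */
double slowdown_percent(double fast, double slow)
{
    assert(fast > 0.0);
    return (slow - fast) / fast * 100.0;
}

/* -mtune=generic -m32 -O3 timings quoted in this comment. */
double regression_vs_gcc41(void)
{
    const double gcc_4_1_1 = 30.154;  /* [**] FC6 system compiler */
    const double gcc_4_3_0 = 40.763;  /* [*]  trunk revision 130554 */
    return slowdown_percent(gcc_4_1_1, gcc_4_3_0);
}
```

With these inputs the 4.3.0 build comes out roughly 35% slower than the 4.1.1
build at -O3.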


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (4 preceding siblings ...)
  2007-12-10 17:12 ` ubizjak at gmail dot com
@ 2007-12-10 17:14 ` rguenther at suse dot de
  2007-12-10 17:26 ` ubizjak at gmail dot com
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenther at suse dot de @ 2007-12-10 17:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from rguenther at suse dot de  2007-12-10 17:13 -------
Subject: Re:  non-optimal inlining heuristics
 pessimizes gzip SPEC score at -O3

On Mon, 10 Dec 2007, ubizjak at gmail dot com wrote:

> (In reply to comment #4)
> 
> > Fair enough. It looks like this problem is specific to Core2.
> 
> Here are timings with 'gcc version 4.3.0 20071201 (experimental) [trunk
> revision 130554] (GCC)' on
> 
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 15
> model name      : Intel(R) Core(TM)2 CPU         X6800  @ 2.93GHz
> stepping        : 5
> cpu MHz         : 2933.389
> cache size      : 4096 KB
> 
> -mtune=generic -m32 -O3: 40.763s   [*]
> -mtune=generic -m32 -O2: 32.170s
> -mtune=core2 -m32 -O3  : 36.850s
> -mtune=core2 -m32 -O2  : 32.170s
> 
> -mtune=generic -m64 -O3: 28.550s
> -mtune=generic -m64 -O2: 28.682s
> -mtune=core2 -m64 -O3  : 28.670s
> -mtune=core2 -m64 -O2  : 28.714s
> 
> With __attribute__((noinline)) added to longest_match():
> 
> -mtune=generic -m32 -O3: 30.658s
> -mtune=generic -m32 -O2: 32.154s
> -mtune=core2 -m32 -O3  : 30.690s
> -mtune=core2 -m32 -O2  : 32.247s
> 
> And with FC6 system compiler 'gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)':
> 
> -mtune=generic -m32 -O3: 30.154s   [**]
> -mtune=generic -m32 -O2: 30.275s
> 
> Comparing [*] to [**], it _is_ a regression, at least on Core2.

FSF GCC 4.1 does not have -mtune=generic.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (5 preceding siblings ...)
  2007-12-10 17:14 ` rguenther at suse dot de
@ 2007-12-10 17:26 ` ubizjak at gmail dot com
  2007-12-11  6:00 ` [Bug tree-optimization/33761] [4.3 regression] " ubizjak at gmail dot com
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2007-12-10 17:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from ubizjak at gmail dot com  2007-12-10 17:26 -------
(In reply to comment #6)

> FSF GCC 4.1 does not have -mtune=generic.

OK, OK. Now with 'gcc version 4.1.3 20070716 (prerelease)':

-m32 -O2: 29.306s
-m32 -O3: 29.582s

I don't have 4.2 here.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] [4.3 regression] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (6 preceding siblings ...)
  2007-12-10 17:26 ` ubizjak at gmail dot com
@ 2007-12-11  6:00 ` ubizjak at gmail dot com
  2007-12-11  6:09 ` steven at gcc dot gnu dot org
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2007-12-11  6:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from ubizjak at gmail dot com  2007-12-11 06:00 -------
Regression at least for 4.3.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|missed-optimization         |
            Summary|non-optimal inlining        |[4.3 regression] non-optimal
                   |heuristics pessimizes gzip  |inlining heuristics
                   |SPEC score at -O3           |pessimizes gzip SPEC score
                   |                            |at -O3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] [4.3 regression] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (7 preceding siblings ...)
  2007-12-11  6:00 ` [Bug tree-optimization/33761] [4.3 regression] " ubizjak at gmail dot com
@ 2007-12-11  6:09 ` steven at gcc dot gnu dot org
  2007-12-11  6:17 ` ubizjak at gmail dot com
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: steven at gcc dot gnu dot org @ 2007-12-11  6:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from steven at gcc dot gnu dot org  2007-12-11 06:09 -------
One of those "regressions" where actually GCC made progress overall.  This
should be low priority for GCC 4.3.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] [4.3 regression] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (8 preceding siblings ...)
  2007-12-11  6:09 ` steven at gcc dot gnu dot org
@ 2007-12-11  6:17 ` ubizjak at gmail dot com
  2008-01-16 17:40 ` [Bug tree-optimization/33761] " hubicka at gcc dot gnu dot org
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2007-12-11  6:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from ubizjak at gmail dot com  2007-12-11 06:16 -------
(In reply to comment #9)
> One of those "regressions" where actually GCC made progress overall.  This
> should be low priority for GCC 4.3.

Probably it is too early in the morning here, but I can't see any traces of
overall progress, when execution time of the testcase in _this_ PR goes from
30s to 40s.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (9 preceding siblings ...)
  2007-12-11  6:17 ` ubizjak at gmail dot com
@ 2008-01-16 17:40 ` hubicka at gcc dot gnu dot org
  2008-02-02 16:23 ` hubicka at gcc dot gnu dot org
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-01-16 17:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from hubicka at gcc dot gnu dot org  2008-01-16 16:46 -------
Last time I looked into it, it was code alignment affected by inlining in the
string matching loop (longest_match).  This code is very atypical, since the
internal loop comparing strings is hand unrolled but it almost never rolls,
since the compressed strings tend to be all different.  GCC mispredicts this,
moving some stuff out of the loop, and bb-reorder aligns the code in a way that
the default path not doing the loop is jumping pretty far, hurting decode
bandwidth of K8, especially because the jumps are hard to predict.

I don't see anything directly in the code that heuristics can use to realize
that the loop is not rolling, except for special casing the particular
benchmark.

FDO scores of gzip are not doing that bad, but there is still a gap relative to
ICC (even an archaic version of it running 32bit compared to 64bit GCC).
http://www.suse.de/~gcctest/SPEC-britten/CINT/sandbox-britten-FDO/index.html
It would be nice to convince gzip/zlibc/bzip2 people to use profiling by
default in the build process - those packages are ideal targets.

But since Core is not as sensitive to code alignment and number of jumps as
K8, perhaps there are extra problems demonstrated by this.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (10 preceding siblings ...)
  2008-01-16 17:40 ` [Bug tree-optimization/33761] " hubicka at gcc dot gnu dot org
@ 2008-02-02 16:23 ` hubicka at gcc dot gnu dot org
  2008-02-03 13:40 ` hubicka at gcc dot gnu dot org
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-02 16:23 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #12 from hubicka at gcc dot gnu dot org  2008-02-02 16:22 -------
Created an attachment (id=15079)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15079&action=view)
address accumulation patch

While working on PR17863 I wrote the attached patch to make fwprop combine
code like:

a=base;
*a=something;
a++;
*a=something;
a++;
*a=something;
...

into

*base=something
a=base+1
*a=something
a=base+2
*a=something
....

I dropped it to vangelis and the nightly tester shows a gzip improvement of
815->880. The gzip internal loop is hand unrolled into a form similar to the
one shown above (the tester peaks in Jul 2005 with scores somewhat above 900).
Since gzip results tend to be unstable, it would be nice to know how this
reproduces on other targets/setups.
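
The two forms can be written out at source level as a rough illustration. This
is only a sketch of the before/after shapes; the patch itself works inside the
compiler, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Form as written in the source: one pointer, bumped after every store, so
   each address depends on the previous increment. */
void fill_incrementing(unsigned char *base, unsigned char value)
{
    unsigned char *a = base;
    assert(base != NULL);
    *a = value; a++;
    *a = value; a++;
    *a = value;
}

/* Form after combining: every address is base plus a constant, so the stores
   no longer depend on a chain of increments. */
void fill_base_offset(unsigned char *base, unsigned char value)
{
    assert(base != NULL);
    *base = value;
    *(base + 1) = value;
    *(base + 2) = value;
}
```

Both functions store the same three bytes; only the way the addresses are
computed differs.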

Honza

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (11 preceding siblings ...)
  2008-02-02 16:23 ` hubicka at gcc dot gnu dot org
@ 2008-02-03 13:40 ` hubicka at gcc dot gnu dot org
  2008-02-03 17:35 ` ubizjak at gmail dot com
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-03 13:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from hubicka at gcc dot gnu dot org  2008-02-03 13:39 -------
Tonight's runs on haydn with the patch in show a regression on gzip: 950->901
in 32bit. FDO 64bit runs are not affected.

This is the same score as we had in December; we improved a bit since then, but
not enough to match the score we used to have.
It looks like codegen of the string compare loop is very unstable here.
Uros, would it be possible to give it a try on Core?  That would help to figure
out if it is a code layout problem of K8.

Honza


-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2007-12-10 10:14:39         |2008-02-03 13:39:42
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (12 preceding siblings ...)
  2008-02-03 13:40 ` hubicka at gcc dot gnu dot org
@ 2008-02-03 17:35 ` ubizjak at gmail dot com
  2008-02-05 13:37 ` hubicka at gcc dot gnu dot org
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2008-02-03 17:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from ubizjak at gmail dot com  2008-02-03 17:35 -------
(In reply to comment #13)

> Uros, would it be possible to give it a try on Core?  That would help to figure
> out if it is a code layout problem of K8.

Hm, the patch doesn't seem to help:

-m32 -O2: 32.434
-m32 -O2 (patched): 32.586

-m32 -O3: 40.723
-m32 -O3 (patched): 41.059


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (13 preceding siblings ...)
  2008-02-03 17:35 ` ubizjak at gmail dot com
@ 2008-02-05 13:37 ` hubicka at gcc dot gnu dot org
  2008-02-05 13:56 ` hubicka at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-05 13:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from hubicka at gcc dot gnu dot org  2008-02-05 13:36 -------
Thanks, this looks comparable to the K8 scores, except that -O3 is not actually
that much worse there.  So it looks like there is more than just a random effect
of code layout involved; I will try to look into the generated assembly more.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (15 preceding siblings ...)
  2008-02-05 13:56 ` hubicka at gcc dot gnu dot org
@ 2008-02-06 13:29 ` hubicka at gcc dot gnu dot org
  2008-02-06 16:45 ` hubicka at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-06 13:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from hubicka at gcc dot gnu dot org  2008-02-06 13:28 -------
One problem is the following:
  do {
        ;
        match = window + cur_match;
        if (match[best_len] != scan_end ||
            match[best_len-1] != scan_end1 ||
            *match != *scan ||
            *++match != scan[1]) continue;
        scan += 2, match++;
        do {
        } while (*++scan == *++match && *++scan == *++match &&
                 *++scan == *++match && *++scan == *++match &&
                 *++scan == *++match && *++scan == *++match &&
                 *++scan == *++match && *++scan == *++match &&
                 scan < strend);

....

The internal loop is the string comparison loop, but the branch prediction
logic completely misses it: the continue statement looks like it is forming 4
nested loops, so the predictor concludes that those are the internal loop.
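
A much simplified sketch of this control-flow shape may help. It is not the gzip
code: the name longest_match_sketch, the candidates[] array standing in for the
hash chain, and the iteration counter are all assumptions made for illustration,
and the inner compare is not hand unrolled.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the best match length for window[scan_pos..] among the candidate
   positions.  The caller must guarantee max_len readable bytes after
   scan_pos and after every candidate. */
size_t longest_match_sketch(const unsigned char *window,
                            const size_t *candidates, size_t n_candidates,
                            size_t scan_pos, size_t max_len,
                            unsigned long *inner_iterations)
{
    const unsigned char *scan = window + scan_pos;
    size_t best_len = 0;

    assert(inner_iterations != NULL);
    for (size_t i = 0; i < n_candidates; i++) {
        const unsigned char *match = window + candidates[i];

        if (best_len >= max_len)
            break;                      /* cannot improve any further */

        /* Quick rejection: "continue" here only skips to the next
           candidate of the one real outer loop; it is not a nested loop. */
        if (match[best_len] != scan[best_len] || match[0] != scan[0])
            continue;

        /* The real inner loop: the byte-by-byte string comparison. */
        size_t len = 0;
        while (len < max_len && scan[len] == match[len]) {
            len++;
            (*inner_iterations)++;
        }
        if (len > best_len)
            best_len = len;
    }
    return best_len;
}
```

In this shape there are exactly two loops, and a predictor that treats each
continue edge as a loop back edge will see more nesting than really exists.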

We used to have a prediction heuristic guessing that a continue statement is
not used to form a loop.  This was killed when gimplification was introduced.
Perhaps we should bring it back, since this is a reasonably common scenario.

Looking at longest_match in the non-unrolled version, the "loops" formed by the
continue statement have frequencies: 298, 961, 2139, 3100, 6900, 1000
so every loop is predicted to iterate about twice.

The outer real loop now gets frequency 92, i.e. small enough to be predicted as
cold.  The string comparison loop now gets frequency 344, predicted to iterate 3
times (quite realistically).  But because the frequency is so small we end up
allocating one of the two pointers in memory:
.L9:
        leal    1(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  1(%ecx), %eax
        cmpb    1(%edx), %al
        jne     .L8
        leal    2(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  2(%ecx), %eax
        cmpb    2(%edx), %al
        jne     .L8
        leal    3(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  3(%ecx), %eax
        cmpb    3(%edx), %al
        jne     .L8
        leal    4(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  4(%ecx), %eax 
        cmpb    4(%edx), %al
        jne     .L8
        leal    5(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  5(%ecx), %eax
        cmpb    5(%edx), %al
        jne     .L8

This happens in the offline copy of longest_match. The inlined copy gets this
detail right, but the frequencies in the deflate function are all crazy,
naturally.

I guess I should revive the patch for language scope branch predictors.
Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (16 preceding siblings ...)
  2008-02-06 13:29 ` hubicka at gcc dot gnu dot org
@ 2008-02-06 16:45 ` hubicka at gcc dot gnu dot org
  2008-02-06 16:57 ` hubicka at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-06 16:45 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #18 from hubicka at gcc dot gnu dot org  2008-02-06 16:44 -------
Created an attachment (id=15107)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15107&action=view)
Patch to predict_paths_leading_to

Hi,
I've revived the continue heuristic patch.  By itself it does not help because
of a bug in predict_paths_leading_to.

The code looks as follows:

if (test1)
   goto continue_block;
if (test2)
   goto continue_block;
if (test3)
   goto continue_block;
if (test4)
   goto continue_block;
goto real_loop_body;
continue_block:
   goto loop_header;

We call predict_paths_leading_to on the continue_block and expect that the
continue_block will not be very likely.

What the function does is find the dominator of continue_block, which is the
if (test1) block, and predict only the edge out of that first block.  This is
not quite enough, however, as all the other paths remain likely.

It seems to me that we need to walk the whole set of BBs postdominated by the
BB and mark all edges forming the edge cut defined by this set.

I am testing the attached patch.  It makes the function linear (so we are
overall quadratic) for a very deep postdominator tree.  If this turns out to be
a problem, I think we can just cut the computation off after some specified
number of BBs has been walked.
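
As a rough illustration of the edge-cut idea above (a toy sketch only, not the
attached patch and not GCC's CFG or dominance API): struct cfg, the helper
names and the block numbering are all made up here, and postdomination is
computed by brute-force reachability instead of from a postdominator tree.
Blocks 0-3 stand for the four tests, 4 for real_loop_body, 5 for
continue_block and 6 for loop_header, which is treated as the exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BB 16

/* Hypothetical toy CFG; none of this is GCC's real data structure. */
struct cfg {
    int n_blocks;
    int exit_bb;
    bool edge[MAX_BB][MAX_BB];     /* edge[s][d]: there is an edge s -> d */
    bool unlikely[MAX_BB][MAX_BB]; /* edges predicted as not taken */
};

/* Can FROM reach the exit block without passing through AVOID? */
static bool reaches_exit_avoiding(const struct cfg *g, int from, int avoid,
                                  bool *seen)
{
    if (from == avoid || seen[from])
        return false;
    if (from == g->exit_bb)
        return true;
    seen[from] = true;
    for (int d = 0; d < g->n_blocks; d++)
        if (g->edge[from][d] && reaches_exit_avoiding(g, d, avoid, seen))
            return true;
    return false;
}

/* BB is postdominated by TARGET when every path from BB to the exit
   goes through TARGET (brute force, fine for a toy graph). */
static bool postdominated_by(const struct cfg *g, int bb, int target)
{
    bool seen[MAX_BB] = { false };
    return bb == target || !reaches_exit_avoiding(g, bb, target, seen);
}

/* Mark every edge that enters the region postdominated by TARGET,
   i.e. the edge cut defined by that region.  Returns the edge count. */
int predict_paths_leading_to_sketch(struct cfg *g, int target)
{
    int marked = 0;
    assert(g->n_blocks <= MAX_BB);
    for (int s = 0; s < g->n_blocks; s++) {
        if (postdominated_by(g, s, target))
            continue;                      /* edge would start inside the region */
        for (int d = 0; d < g->n_blocks; d++)
            if (g->edge[s][d] && postdominated_by(g, d, target)) {
                g->unlikely[s][d] = true;
                marked++;
            }
    }
    return marked;
}

/* Build the CFG of the test1..test4 example from this message. */
void build_continue_cfg(struct cfg *g)
{
    memset(g, 0, sizeof *g);
    g->n_blocks = 7;
    g->exit_bb = 6;
    for (int t = 0; t < 4; t++) {
        g->edge[t][5] = true;      /* if (testN) goto continue_block; */
        g->edge[t][t + 1] = true;  /* fall through to next test or loop body */
    }
    g->edge[4][6] = true;          /* real_loop_body -> loop_header */
    g->edge[5][6] = true;          /* continue_block -> loop_header */
}
```

On this graph all four edges into continue_block get marked, whereas
predicting only the edge out of the dominator would mark just the one from
the test1 block.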

Zdenek, does this seem sane?

With this change and the continue prediction patch I get a sort of sane
prediction for the longest_match function.  The profile is still quite
unrealistic, but I am testing whether it makes a noticeable difference.

Honza


* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (17 preceding siblings ...)
  2008-02-06 16:45 ` hubicka at gcc dot gnu dot org
@ 2008-02-06 16:57 ` hubicka at gcc dot gnu dot org
  2008-02-06 18:43 ` ubizjak at gmail dot com
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-06 16:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from hubicka at gcc dot gnu dot org  2008-02-06 16:56 -------
Created an attachment (id=15108)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15108&action=view)
Complete continue heuristic patch

Hi,
this is the complete patch.  With this patch we produce a profile sane enough
that the internal loops are not marked cold.  I will benchmark it probably
tomorrow (I want to wait for the FP changes to show separately).

It fixes the offline copy of longest_match, so we no longer have one of the IV
variables on the stack:
.L15:
        movzbl  2(%edx), %eax
        leal    2(%edx), %esi
        cmpb    2(%ecx), %al
        jne     .L8
        movzbl  3(%edx), %eax
        leal    3(%edx), %esi
        cmpb    3(%ecx), %al
        jne     .L8 
        movzbl  4(%edx), %eax
        leal    4(%edx), %esi
        cmpb    4(%ecx), %al
        jne     .L8
        movzbl  5(%edx), %eax
        leal    5(%edx), %esi
        cmpb    5(%ecx), %al
        jne     .L8
        movzbl  6(%edx), %eax
        leal    6(%edx), %esi
        cmpb    6(%ecx), %al
        jne     .L8
        movzbl  7(%edx), %eax
        leal    7(%edx), %esi
        cmpb    7(%ecx), %al
        jne     .L8
        leal    8(%ecx), %eax
        movl    %eax, %ecx
        movzbl  8(%edx), %eax
        cmpb    (%ecx), %al
        leal    8(%edx), %ebx
        movl    %ebx, %esi
        jne     .L8
        cmpl    %ebx, -20(%ebp)
        jbe     .L8
        movl    %ebx, %edx
        movzbl  1(%edx), %eax
        leal    1(%edx), %esi
        cmpb    1(%ecx), %al
        je      .L15

Ironically, this can further widen the gap between -O2 and -O3, since the
inline copy in deflate was always allocated reasonably.
Deflate codegen changes quite a lot, and because the function body is big I
will wait for benchmarks before trying to analyze further.



* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (18 preceding siblings ...)
  2008-02-06 16:57 ` hubicka at gcc dot gnu dot org
@ 2008-02-06 18:43 ` ubizjak at gmail dot com
  2008-02-06 19:11 ` ubizjak at gmail dot com
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2008-02-06 18:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from ubizjak at gmail dot com  2008-02-06 18:42 -------
Whoa, adding -fomit-frame-pointer brings us from

(gcc -O3 -m32)
user    0m41.031s

to

(gcc -O3 -m32 -fomit-frame-pointer)
user    0m30.006s

Since -fomit-frame-pointer frees up another register, it looks like inlining
increases register pressure and some unlucky heavily used variable gets
allocated to a stack slot.



* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (19 preceding siblings ...)
  2008-02-06 18:43 ` ubizjak at gmail dot com
@ 2008-02-06 19:11 ` ubizjak at gmail dot com
  2008-02-06 19:22 ` hubicka at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2008-02-06 19:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from ubizjak at gmail dot com  2008-02-06 19:10 -------
(In reply to comment #20)

> Since -fomit-frame-pointer frees up another register, it looks like inlining
> increases register pressure and some unlucky heavily used variable gets
> allocated to a stack slot.


It is "best_len" (and probably some others, too):

[uros@localhost gzip-1.2.4]$ grep best_len fp.s
        movl    %edx, -68(%ebp) #, best_len
        movl    -68(%ebp), %edx # best_len, best_len.494
        movl    %edx, -68(%ebp) # best_len.494, best_len
        movl    -68(%ebp), %edx # best_len,
        movl    -68(%ebp), %edx # best_len,
        movl    -68(%ebp), %edx # best_len, best_len.494
        cmpl    %esi, %edx      # lookahead, best_len.494
        movl    %edx, -108(%ebp)        # best_len.494, match_length
        movl    -68(%ebp), %edx # best_len, best_len.494
        movl    %edx, -88(%ebp) # prev_length.28, best_len
        movl    -88(%ebp), %edx # best_len, best_len.457
        movl    %edx, -88(%ebp) # best_len.457, best_len
        movl    -88(%ebp), %eax # best_len,
        movl    -88(%ebp), %edx # best_len,
        movl    -88(%ebp), %edx # best_len, best_len.457
        cmpl    %esi, %edx      # lookahead, best_len.457
        movl    %edx, -40(%ebp) # best_len.457, match_length.404
        movl    -88(%ebp), %edx # best_len, best_len.457
        leal    (%ecx,%eax), %edx       #, best_len.457
        cmpl    %edx, -88(%ebp) # best_len.457, best_len
        cmpl    -96(%ebp), %edx # nice_match.34, best_len.457
        leal    (%ecx,%eax), %edx       #, best_len.494
        cmpl    %edx, -68(%ebp) # best_len.494, best_len
        cmpl    -76(%ebp), %edx # nice_match.34, best_len.494

[uros@localhost gzip-1.2.4]$ grep best_len no-fp.s
        movl    %edx, 76(%esp)  #, best_len
        movl    76(%esp), %edx  # best_len,
        movl    76(%esp), %edx  # best_len, best_len.494
        movl    %edx, 76(%esp)  # best_len.494, best_len
        movl    76(%esp), %eax  # best_len,
        movl    76(%esp), %edx  # best_len, best_len.494
        movl    %edx, %ebp      # best_len.494, match_length
        movl    76(%esp), %edx  # best_len, best_len.494
        movl    %edx, %ebp      # prev_length.28, best_len
        movl    %ebp, %edx      # best_len, best_len.457
        movl    %edx, %ebp      # best_len.457, best_len
        movl    %ebp, %edx      # best_len, best_len.457
        cmpl    %esi, %edx      # lookahead, best_len.457
        movl    %ebp, %edx      # best_len, best_len.457
        leal    (%ecx,%eax), %edx       #, best_len.494
        cmpl    %edx, 76(%esp)  # best_len.494, best_len
        cmpl    68(%esp), %edx  # nice_match.34, best_len.494
        leal    (%ecx,%eax), %edx       #, best_len.457
        cmpl    %edx, %ebp      # best_len.457, best_len
        cmpl    52(%esp), %edx  # nice_match.34, best_len.457



* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (20 preceding siblings ...)
  2008-02-06 19:11 ` ubizjak at gmail dot com
@ 2008-02-06 19:22 ` hubicka at gcc dot gnu dot org
  2008-02-07 12:31 ` hubicka at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-06 19:22 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #22 from hubicka at gcc dot gnu dot org  2008-02-06 19:22 -------
Yes, there are a number of unlucky variables.  However, the real source here
seems to be the always-wrong profile guiding regalloc to optimize for cold
portions of the function, rather than a real increase in register pressure due
to inlining.

In general, the inlining operation itself only decreases register pressure: you
don't tie function parameters/return values to fixed registers, and you know
precisely which registers survive the body, so you don't need to save
caller-saved registers when not needed.

The losses from inlining with our regalloc are partly due to callee-saved
registers sometimes being more effective, sort of imitating live range
splitting.  Increased register pressure is an effect that propagates from the
function body to the rest of the program, but it is not that bad either: at
least all the inlining heuristic/RA bugs turned out to be something else.

The high speedup from the forwprop patch in 64-bit mode (and the slowdown in
32-bit) is actually also register allocation related: the internal loop,
consisting of a sequence of ++ operations, ends up with extra copy instructions
without the forwprop patch, while with the patch we produce a normal induction
variable.  On 32-bit, however, this results in regalloc putting the variable on
the stack, because its live range heuristic then gives it lower priority.
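
To make the "sequence of ++ operations" versus "normal induction variable"
distinction concrete, here is a small source-level illustration.  It is only
an analogy for what the optimizers see, not what forwprop literally does, and
both function names are invented for this sketch.  The first form bumps both
pointers before every compare, as gzip's unrolled comparison loop does; the
second uses one index with fixed offsets, which corresponds to the
`movzbl N(%edx)` / `cmpb N(%ecx)` addressing in the listings.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer-bump form: each comparison first increments both pointers, so
   every step produces a new pointer value.  Returns the offset (1..4) of
   the first mismatching byte, or 5 if bytes 1..4 all match. */
size_t common_prefix_bump(const unsigned char *scan,
                          const unsigned char *match)
{
    const unsigned char *start = scan;
    assert(scan != NULL && match != NULL);
    if (*++scan == *++match && *++scan == *++match &&
        *++scan == *++match && *++scan == *++match)
        ++scan;                    /* all four bytes matched */
    return (size_t)(scan - start);
}

/* Induction-variable form: the base pointers stay fixed and a single
   index walks the bytes.  Same return value as the function above. */
size_t common_prefix_indexed(const unsigned char *scan,
                             const unsigned char *match)
{
    size_t i = 1;
    assert(scan != NULL && match != NULL);
    while (i <= 4 && scan[i] == match[i])
        i++;
    return i;
}
```

Both functions return the same value for the same inputs; only the shape of
the address computation differs, which is what the register allocator ends up
reacting to.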

For 32-bit data, the britten 32-bit SPEC tester peaked at 760, while we now get
620 at peak with -fomit-frame-pointer.  A 20% regression on a rather simple,
commonly used codebase definitely makes us look stupid...  Even more so given
that ICC 7.x did 820 on the same machine.  The 64-bit tester is 830 versus 740,
approximately.

Honza

-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu dot
                   |                            |org


* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (21 preceding siblings ...)
  2008-02-06 19:22 ` hubicka at gcc dot gnu dot org
@ 2008-02-07 12:31 ` hubicka at gcc dot gnu dot org
  2008-02-08 15:12 ` hubicka at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-07 12:31 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #23 from hubicka at gcc dot gnu dot org  2008-02-07 12:30 -------
Created an attachment (id=15115)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15115&action=view)
Annotated profile

I am attaching a dump with the profile read in.  It shows the hot spots in
longest_match at least:

(this is first conditional of the continue guard)
  # BLOCK 27 freq:10000 count:1346119696
  # PRED: 6 [100.0%]  count:112241556 (fallthru) 25 [99.5%]  count:1233878140
(true,exec)
  # scan_end_13 = PHI <scan_end_106(6), scan_end_14(25)>
  # scan_end1_11 = PHI <scan_end1_93(6), scan_end1_12(25)>
  # best_len_8 = PHI <best_len_25(6), best_len_9(25)>
  # scan_3 = PHI <scan_24(6), scan_6(25)>
  # chain_length_2 = PHI <chain_length_108(6), chain_length_105(25)>
  # cur_match_1 = PHI <cur_match_109(6), cur_match_104(25)>
  match_40 = &window + cur_match_1;
  best_len.31_41 = (unsigned int) best_len_8;
  D.2379_42 = match_40 + best_len.31_41;
  D.2380_43 = *D.2379_42;
  if (D.2380_43 != scan_end_13)
    goto <bb 10>;
  else
    goto <bb 7>;

  # SUCC: 10 [0.1%]  count:33977 (true,exec) 11 [99.9%]  count:48979565
(false,exec)

  # BLOCK 10 freq:9636 count:1297140131
  # PRED: 27 [87.5%]  count:1177665163 (true,exec) 7 [55.2%]  count:93018627
(true,exec) 8 [35.0%]  count:26422364 (true,exec) 9 [0.1%]  count:33977
(true,exec)
  goto <bb 24>;

(this is the continue statement)

  D.2391_102 = cur_match_1 & 32767;
  D.2392_103 = prev[D.2391_102];
  cur_match_104 = (IPos) D.2392_103;
  if (limit_15 >= cur_match_104)
    goto <bb 26>;
  else
    goto <bb 25>;
  # SUCC: 26 [7.7%]  count:104056913 (true,exec) 25 [92.3%]  count:1240391903
(false,exec)

  # BLOCK 25 freq:9215 count:1240391903
  # PRED: 24 [92.3%]  count:1240391903 (false,exec)
  chain_length_105 = chain_length_2 + 0x0ffffffff;
  if (chain_length_105 != 0)
    goto <bb 27>;
  else
    goto <bb 26>;


(this is end of outer loop)
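
For reading the dump: the freq: values and edge percentages can be reproduced
from the count: values with simple scaling.  The sketch below is an
assumption-laden reconstruction, not GCC code: FREQ_BASE is taken from the
"freq:10000" on the hottest block shown, rounding to nearest is a guess that
happens to match the printed numbers, and block 24's count (not printed above)
is assumed to be the sum of its two outgoing edge counts,
104056913 + 1240391903 = 1344448816.

```c
#include <assert.h>
#include <stdint.h>

#define FREQ_BASE 10000   /* assumed scale: the hottest block shows freq:10000 */

/* Block frequency relative to the hottest block, rounded to nearest. */
int scaled_freq(uint64_t count, uint64_t max_count)
{
    assert(max_count > 0 && count <= max_count);
    return (int)((count * FREQ_BASE + max_count / 2) / max_count);
}

/* Edge probability in tenths of a percent (923 means 92.3%). */
int edge_permille(uint64_t edge_count, uint64_t block_count)
{
    assert(block_count > 0 && edge_count <= block_count);
    return (int)((edge_count * 1000 + block_count / 2) / block_count);
}
```

With block 27's count of 1346119696 as the maximum, scaled_freq() gives 9215
for block 25 and 9636 for block 10, and edge_permille() gives 923 and 77 for
the two edges out of block 24, matching the 92.3% and 7.7% in the dump.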



* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (22 preceding siblings ...)
  2008-02-07 12:31 ` hubicka at gcc dot gnu dot org
@ 2008-02-08 15:12 ` hubicka at gcc dot gnu dot org
  2008-02-08 15:40 ` hubicka at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-08 15:12 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #24 from hubicka at gcc dot gnu dot org  2008-02-08 15:11 -------
Hi,
tonight's runs with the continue heuristics again show improvements on 64-bit
scores, but degradation on 32-bit scores.  Looking into the loop, the real
trouble seems to be that the main loop has 6 loop-carried variables:

scan_end, scan_end1, best_len, scan, chain_length, cur_match

plus a few temporaries are needed too.  Obviously we can't fit in registers on
i386.  Making the profile more realistic sometimes helps and sometimes hurts,
pretty much at random.

One case where I think register pressure is increased is that the different
SSA names of both the scan_end and scan_end1 variables are actually not fully
coalesced in out-of-SSA.  This is the result of optimizing:

        if (match[best_len] != scan_end ||
            match[best_len-1] != scan_end1 ||
            *match != *scan ||
            *++match != scan[1]) continue;
       ...later code sometimes modifying scan_end....

into computing match[best_len] into a name of scan_end that is sometimes
assigned in the later code on the path not modifying scan_end.  As a result we
have two scan_ends live at once.  I wonder if we can avoid this behaviour:
though it looks all right in SSA form, avoiding it would save 2 "global"
registers, since there is no need at all to cache
match[best_len]/match[best_len-1] in a register unless I missed something.
Those two vars are manipulated on the hot paths through the loop.
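
For readers without the gzip sources at hand, the loop under discussion has
roughly the following shape.  This is a cut-down sketch written for this
note, not gzip's actual code: the window and match sizes are toy values, the
inner compare is not unrolled, `limit` is fixed at 0, and load_window() and
the _sketch suffix are invented.  It does keep the six loop-carried variables
listed above and the `continue` guard whose cached scan_end/scan_end1 values
cause the coalescing problem.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WSIZE     64            /* toy window size, far smaller than gzip's */
#define WMASK     (WSIZE - 1)
#define MAX_MATCH 8             /* toy maximum match length */

static unsigned char  window[2 * WSIZE];
static unsigned short prev[WSIZE];   /* hash chain: previous match position */

/* Hypothetical helper: reset the tables and copy TEXT into the window. */
void load_window(const char *text)
{
    size_t n = strlen(text);
    assert(n < sizeof window);
    memset(window, 0, sizeof window);
    memset(prev, 0, sizeof prev);
    memcpy(window, text, n);
}

/* Walk the hash chain starting at CUR_MATCH and return the longest match
   length found for the string at STRSTART; *MATCH_START gets its position. */
int longest_match_sketch(unsigned strstart, unsigned cur_match,
                         unsigned chain_length, int prev_length,
                         unsigned *match_start)
{
    const unsigned char *scan = window + strstart;
    const unsigned char *strend = scan + MAX_MATCH;
    const unsigned limit = 0;         /* toy: position 0 ends every chain */
    int best_len = prev_length;
    unsigned char scan_end1, scan_end;

    assert(prev_length >= 1 && prev_length < MAX_MATCH);
    assert(strstart + MAX_MATCH <= sizeof window);
    scan_end1 = scan[best_len - 1];   /* cached bytes used by the guard */
    scan_end  = scan[best_len];

    do {
        const unsigned char *match = window + cur_match;
        int len;

        /* The continue guard: cheap rejects before the byte-wise compare. */
        if (match[best_len] != scan_end ||
            match[best_len - 1] != scan_end1 ||
            match[0] != scan[0] || match[1] != scan[1])
            continue;

        scan += 2;
        match += 2;
        while (scan < strend && *scan == *match) {
            scan++;
            match++;
        }
        len = MAX_MATCH - (int)(strend - scan);
        scan = strend - MAX_MATCH;    /* rewind scan for the next candidate */

        if (len > best_len) {
            *match_start = cur_match;
            best_len = len;
            if (len >= MAX_MATCH)
                break;
            scan_end1 = scan[best_len - 1];   /* the "later code" that */
            scan_end  = scan[best_len];       /* sometimes updates scan_end */
        }
    } while ((cur_match = prev[cur_match & WMASK]) > limit
             && --chain_length != 0);

    return best_len;
}
```

For example, with the window holding "xabcd_abcde_abcdeZ", prev[6] = 1 and
prev[1] = 0, a search from position 12 starting at candidate 6 finds the
5-byte match "abcde"; the second candidate at position 1 is then rejected by
the first guard test alone.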

Now the RA is driven by frequencies (a bit confused by the fact that two of the
loop-carried vars are split) and by their "live ranges", which is actually the
number of instructions between the first and last occurrence.  Since we are a
bit careless about BB ordering, moving some code to the very end of the
function, this heuristic is not realistic at all.  It would probably make more
sense to replace it with the number of insns it is live across, but this is
probably ninsn*npseudos to compute.  Another idea would be the degree in the
conflict graph, but I am not sure we want to start such experiments in parallel
with YARA.
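
A tiny illustration of why the two measures can diverge, with everything in
it hypothetical: the function names are invented, and the per-insn liveness
is simply given as input instead of being computed by dataflow.  One
function measures the distance in layout order between the first and last
occurrence of a pseudo, the other counts the insns after which it is actually
live.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Distance in layout order between the first and last occurrence. */
int occurrence_span(const bool *occurs, size_t n_insns)
{
    int first = -1, last = -1;
    assert(occurs != NULL);
    for (size_t i = 0; i < n_insns; i++)
        if (occurs[i]) {
            if (first < 0)
                first = (int)i;
            last = (int)i;
        }
    return first < 0 ? 0 : last - first;
}

/* Number of insns the pseudo is live across, given precomputed liveness. */
int insns_live_across(const bool *live_after, size_t n_insns)
{
    int n = 0;
    assert(live_after != NULL);
    for (size_t i = 0; i < n_insns; i++)
        if (live_after[i])
            n++;
    return n;
}
```

Take a pseudo defined at insn 1, used at insn 2, and used once more at insn 9
in a cold block that was laid out at the end of the function and is reached
by a branch from insn 2.  Its span is 8 insns, but it is live after only 2 of
them, since insns 3-8 sit on a different path.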

I tested YARA and it does not handle this situation much better. Perhaps
Vladimir can help?

Honza


* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (23 preceding siblings ...)
  2008-02-08 15:12 ` hubicka at gcc dot gnu dot org
@ 2008-02-08 15:40 ` hubicka at gcc dot gnu dot org
  2008-09-06 12:01 ` hubicka at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-08 15:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #25 from hubicka at gcc dot gnu dot org  2008-02-08 15:39 -------
-fno-tree-dominator-opts -fno-tree-copyrename solves the coalescing problem
(the name is introduced by the second pass, the actual problematic pattern by
the first), saving roughly 1s at -O2 and 2s at -O3; -O3 is still worse,
however.  The internal loop no longer spills; it just reads the value of
scan_end stored in memory.

I will play with it more later and make a simple testcase for this.
Honza



* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (24 preceding siblings ...)
  2008-02-08 15:40 ` hubicka at gcc dot gnu dot org
@ 2008-09-06 12:01 ` hubicka at gcc dot gnu dot org
  2008-09-06 12:04 ` hubicka at gcc dot gnu dot org
  2008-09-06 15:27 ` [Bug tree-optimization/33761] tree-copyrename and tree-dominators pessimizes gzip SPEC score ubizjak at gmail dot com
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-09-06 12:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #26 from hubicka at gcc dot gnu dot org  2008-09-06 12:00 -------
IRA seems to fix the remaining problem with the spill in the internal loop on
32-bit nicely, so we produce good scores for gzip compared to older GCC versions.
http://gcc.opensuse.org/SPEC-britten/CINT/sandbox-britten-32bit/164_gzip_big.png
and with profile feedback
http://gcc.opensuse.org/SPEC-britten/CINT/sandbox-britten-FDO/164_gzip_big.png
we get close to ICC scores.

We now output the comparison loop as:
.L98:   
        movzbl  1(%eax), %edx   #,
        leal    1(%eax), %edi   #, scan
        cmpb    1(%ecx), %dl    #,
        jne     .L161   #,
        movzbl  2(%eax), %edx   #,
        leal    2(%eax), %edi   #, scan
        cmpb    2(%ecx), %dl    #,
        jne     .L161   #,
        movzbl  3(%eax), %edx   #,
        leal    3(%eax), %edi   #, scan
        cmpb    3(%ecx), %dl    #,
        jne     .L161   #,
        movzbl  4(%eax), %edx   #,
        leal    4(%eax), %edi   #, scan
        cmpb    4(%ecx), %dl    #,
        jne     .L161   #,
        movzbl  5(%eax), %edx   #,
        leal    5(%eax), %edi   #, scan
        cmpb    5(%ecx), %dl    #,
        jne     .L161   #,
        movzbl  6(%eax), %edx   #,
        leal    6(%eax), %edi   #, scan
        cmpb    6(%ecx), %dl    #,
        jne     .L161   #,
        movzbl  7(%eax), %edx   #,
        leal    7(%eax), %edi   #, scan
        cmpb    7(%ecx), %dl    #,
        jne     .L161   #,

There is still room for improvement, however.

The remaining problem is that we still miss coalescing of scan_end and
scan_end1 (so -fno-tree-dominator-opts -fno-tree-copyrename still helps).

Vladimir, perhaps this can be solved in IRA too?

Honza


-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com



* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (25 preceding siblings ...)
  2008-09-06 12:01 ` hubicka at gcc dot gnu dot org
@ 2008-09-06 12:04 ` hubicka at gcc dot gnu dot org
  2008-09-06 15:27 ` [Bug tree-optimization/33761] tree-copyrename and tree-dominators pessimizes gzip SPEC score ubizjak at gmail dot com
  27 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-09-06 12:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #27 from hubicka at gcc dot gnu dot org  2008-09-06 12:02 -------
Also just noticed that the offline copy of longest_match gets an extra move:
.L15:   
        movzbl  2(%eax), %edi   #, tmp87
        leal    2(%eax), %ecx   #, scan.158
        movl    %edi, %edx      # tmp87,
        cmpb    2(%ebx), %dl    #,
        jne     .L6     #,
        movzbl  3(%eax), %edi   #, tmp88
        leal    3(%eax), %ecx   #, scan.158
        movl    %edi, %edx      # tmp88,
        cmpb    3(%ebx), %dl    #,
        jne     .L6     #,      
        movzbl  4(%eax), %edi   #, tmp89
        leal    4(%eax), %ecx   #, scan.158
        movl    %edi, %edx      # tmp89,
        cmpb    4(%ebx), %dl    #,
        jne     .L6     #,
        movzbl  5(%eax), %edi   #, tmp90
        leal    5(%eax), %ecx   #, scan.158
        movl    %edi, %edx      # tmp90,
        cmpb    5(%ebx), %dl    #,
        jne     .L6     #,

while the inlined copy is fine:
.L98:   
        movzbl  1(%eax), %edx   #,
        leal    1(%eax), %edi   #, scan
        cmpb    1(%ecx), %dl    #,
        jne     .L161   #,
        movzbl  2(%eax), %edx   #,
        leal    2(%eax), %edi   #, scan
        cmpb    2(%ecx), %dl    #,
        jne     .L161   #,
        movzbl  3(%eax), %edx   #,
        leal    3(%eax), %edi   #, scan
        cmpb    3(%ecx), %dl    #,
        jne     .L161   #,
        movzbl  4(%eax), %edx   #,
        leal    4(%eax), %edi   #, scan
        cmpb    4(%ecx), %dl    #,
        jne     .L161   #,
interesting :)



* [Bug tree-optimization/33761] tree-copyrename and tree-dominators pessimizes gzip SPEC score
  2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
                   ` (26 preceding siblings ...)
  2008-09-06 12:04 ` hubicka at gcc dot gnu dot org
@ 2008-09-06 15:27 ` ubizjak at gmail dot com
  27 siblings, 0 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2008-09-06 15:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #28 from ubizjak at gmail dot com  2008-09-06 15:26 -------
(In reply to comment #27)
> Also just noticed that the offline copy of longest_match gets an extra move:
> .L15:   
>         movzbl  2(%eax), %edi   #, tmp87
>         leal    2(%eax), %ecx   #, scan.158
>         movl    %edi, %edx      # tmp87,
>         cmpb    2(%ebx), %dl    #,
>         jne     .L6     #,
>
> interesting :)

Perhaps due to ineffective regmove (similar to PR 37364)?



end of thread, other threads:[~2008-09-06 15:27 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-13 11:27 [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3 ubizjak at gmail dot com
2007-10-13 12:31 ` [Bug target/33761] " rguenth at gcc dot gnu dot org
2007-12-10 10:14 ` [Bug target/33761] [4.3 regression] " ubizjak at gmail dot com
2007-12-10 10:52 ` [Bug tree-optimization/33761] " rguenth at gcc dot gnu dot org
2007-12-10 12:31 ` ubizjak at gmail dot com
2007-12-10 17:12 ` ubizjak at gmail dot com
2007-12-10 17:14 ` rguenther at suse dot de
2007-12-10 17:26 ` ubizjak at gmail dot com
2007-12-11  6:00 ` [Bug tree-optimization/33761] [4.3 regression] " ubizjak at gmail dot com
2007-12-11  6:09 ` steven at gcc dot gnu dot org
2007-12-11  6:17 ` ubizjak at gmail dot com
2008-01-16 17:40 ` [Bug tree-optimization/33761] " hubicka at gcc dot gnu dot org
2008-02-02 16:23 ` hubicka at gcc dot gnu dot org
2008-02-03 13:40 ` hubicka at gcc dot gnu dot org
2008-02-03 17:35 ` ubizjak at gmail dot com
2008-02-05 13:37 ` hubicka at gcc dot gnu dot org
2008-02-05 13:56 ` hubicka at gcc dot gnu dot org
2008-02-06 13:29 ` hubicka at gcc dot gnu dot org
2008-02-06 16:45 ` hubicka at gcc dot gnu dot org
2008-02-06 16:57 ` hubicka at gcc dot gnu dot org
2008-02-06 18:43 ` ubizjak at gmail dot com
2008-02-06 19:11 ` ubizjak at gmail dot com
2008-02-06 19:22 ` hubicka at gcc dot gnu dot org
2008-02-07 12:31 ` hubicka at gcc dot gnu dot org
2008-02-08 15:12 ` hubicka at gcc dot gnu dot org
2008-02-08 15:40 ` hubicka at gcc dot gnu dot org
2008-09-06 12:01 ` hubicka at gcc dot gnu dot org
2008-09-06 12:04 ` hubicka at gcc dot gnu dot org
2008-09-06 15:27 ` [Bug tree-optimization/33761] tree-copyrename and tree-dominators pessimizes gzip SPEC score ubizjak at gmail dot com
