public inbox for gcc-bugs@sourceware.org
* [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
@ 2007-10-13 11:27 ubizjak at gmail dot com
2007-10-13 12:31 ` [Bug target/33761] " rguenth at gcc dot gnu dot org
` (27 more replies)
0 siblings, 28 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2007-10-13 11:27 UTC (permalink / raw)
To: gcc-bugs
The measurements were actually done on gzip-1.2.4 sources on core2-d with:
a) gcc -mtune=generic -m32 -O2
b) gcc -mtune=generic -m32 -O3
The test file is a tar archive of the current SVN trunk repository, which
currently comes to 865M uncompressed.
profile of a)
     % cumulative     self                self    total
  time    seconds  seconds      calls  s/call   s/call  name
 54.63      14.76    14.76  102254750    0.00     0.00  longest_match
 18.47      19.75     4.99          1    4.99    27.02  deflate
 10.25      22.52     2.77      27389    0.00     0.00  fill_window
  6.81      24.36     1.84      27390    0.00     0.00  updcrc
  3.15      25.21     0.85       5901    0.00     0.00  compress_block
  2.85      25.98     0.77  203123663    0.00     0.00  send_bits
  2.66      26.70     0.72   89123566    0.00     0.00  ct_tally
  0.67      26.88     0.18    3378994    0.00     0.00  pqdownheap
  0.22      26.94     0.06      17709    0.00     0.00  build_tree
  0.15      26.98     0.04      11802    0.00     0.00  send_tree
  0.07      27.00     0.02    1367732    0.00     0.00  bi_reverse
  0.07      27.02     0.02      17710    0.00     0.00  gen_codes
  0.00      27.02     0.00      27390    0.00     0.00  file_read
profile of b)
     % cumulative     self                self    total
  time    seconds  seconds      calls  s/call   s/call  name
 86.86      29.35    29.35          1   29.35    33.79  deflate
  5.27      31.13     1.78      27390    0.00     0.00  updcrc
  2.69      32.04     0.91       5901    0.00     0.00  compress_block
  2.55      32.90     0.86   89123566    0.00     0.00  ct_tally
  2.04      33.59     0.69  203123663    0.00     0.00  send_bits
  0.44      33.74     0.15      17709    0.00     0.00  build_tree
  0.06      33.76     0.02    1367732    0.00     0.00  bi_reverse
  0.06      33.78     0.02       5903    0.00     0.00  flush_block
  0.03      33.79     0.01      11802    0.00     0.00  send_tree
  0.00      33.79     0.00      27390    0.00     0.00  file_read
  0.00      33.79     0.00       9237    0.00     0.00  flush_outbuf
  0.00      33.79     0.00          2    0.00     0.00  basename
  0.00      33.79     0.00          2    0.00     0.00  copy_block
  0.00      33.79     0.00          1    0.00     0.00  add_envopt
As can be seen from the profiles, longest_match was inlined into deflate at
-O3. Adding __attribute__((noinline)) to the longest_match prototype, we obtain:
     % cumulative     self                self    total
  time    seconds  seconds      calls  s/call   s/call  name
 55.80      13.86    13.86  102254750    0.00     0.00  longest_match
 27.62      20.72     6.86          1    6.86    24.84  deflate
  7.09      22.48     1.76      27390    0.00     0.00  updcrc
  3.74      23.41     0.93       5901    0.00     0.00  compress_block
  2.62      24.06     0.65   89123566    0.00     0.00  ct_tally
  2.42      24.66     0.60  203123663    0.00     0.00  send_bits
  0.56      24.80     0.14      17709    0.00     0.00  build_tree
  0.08      24.82     0.02    1367732    0.00     0.00  bi_reverse
  0.08      24.84     0.02      11802    0.00     0.00  send_tree
  0.00      24.84     0.00      27390    0.00     0.00  file_read
  0.00      24.84     0.00       9237    0.00     0.00  flush_outbuf
  0.00      24.84     0.00       5903    0.00     0.00  flush_block
  0.00      24.84     0.00          2    0.00     0.00  basename
  0.00      24.84     0.00          2    0.00     0.00  copy_block
or a ~26.5% improvement in profiled time (33.79s down to 24.84s). I speculate
that inlining increases register pressure on SMALL_REGISTER_CLASSES targets, as
this problem is not that noticeable on x86_64.
The results of the 32bit run are at [1] (valid from 13 Oct) and the results of
the 64bit run are at [2].
[1]
http://vmakarov.fedorapeople.org/spec/spec2000.toolbox_32/gcc/individual-run-ratio.html
[2]
http://vmakarov.fedorapeople.org/spec/spec2000.toolbox/gcc/individual-run-ratio.html
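For anyone who wants to repeat the noinline experiment on other code, here is a minimal sketch of how the attribute is attached to a prototype. The functions common_prefix() and best_of_two() are hypothetical stand-ins, not gzip's longest_match() and deflate(); only the __attribute__((noinline)) syntax is what the report refers to.

```c
#include <stddef.h>

/* Hypothetical stand-in for gzip's longest_match(); the real function
   has a different signature and body.  The attribute on the prototype
   asks GCC not to inline any call to this function. */
static size_t common_prefix(const unsigned char *a, const unsigned char *b,
                            size_t max) __attribute__((noinline));

static size_t common_prefix(const unsigned char *a, const unsigned char *b,
                            size_t max)
{
    size_t n = 0;
    while (n < max && a[n] == b[n])
        n++;
    return n;
}

/* Hypothetical caller standing in for deflate(): with the attribute in
   place, the two calls below stay out-of-line even at -O3. */
size_t best_of_two(const unsigned char *s, const unsigned char *c1,
                   const unsigned char *c2, size_t max)
{
    size_t l1 = common_prefix(s, c1, max);
    size_t l2 = common_prefix(s, c2, max);
    return l1 > l2 ? l1 : l2;
}
```

Whether the out-of-line call is faster depends on the target and the surrounding code; the numbers above only show that it was faster for this benchmark on this machine.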
--
Summary: non-optimal inlining heuristics pessimizes gzip SPEC
score at -O3
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ubizjak at gmail dot com
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug target/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: rguenth at gcc dot gnu dot org @ 2007-10-13 12:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from rguenth at gcc dot gnu dot org 2007-10-13 12:31 -------
I suppose that alias partitioning makes the difference instead. This is not
really a fault of the inliner but of our dumb memory optimizers.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug target/33761] [4.3 regression] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: ubizjak at gmail dot com @ 2007-12-10 10:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from ubizjak at gmail dot com 2007-12-10 10:14 -------
According to Issue 2 from http://gcc.gnu.org/ml/gcc/2007-11/msg00753.html, I
think that this bug qualifies as a 4.3 regression.
--
ubizjak at gmail dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-12-10 10:14:39
               date|                            |
            Summary|non-optimal inlining        |[4.3 regression] non-optimal
                   |heuristics pessimizes gzip  |inlining heuristics
                   |SPEC score at -O3           |pessimizes gzip SPEC score
                   |                            |at -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: rguenth at gcc dot gnu dot org @ 2007-12-10 10:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from rguenth at gcc dot gnu dot org 2007-12-10 10:52 -------
I don't think this qualifies as a 4.3 regression -
http://www.suse.de/~gcctest/SPEC/CINT/sb-haydn-head-64-32o-32bit/index.html
shows that while there were jumps, the numbers close to the 4.2 release are
actually quite similar to what we have now. So, unless somebody produces
numbers with 4.2 or earlier, this is not a 'regression', but a
missed-optimization only.
--
rguenth at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org
          Component|target                      |tree-optimization
           Keywords|                            |missed-optimization
            Summary|[4.3 regression] non-optimal|non-optimal inlining
                   |inlining heuristics         |heuristics pessimizes gzip
                   |pessimizes gzip SPEC score  |SPEC score at -O3
                   |at -O3                      |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: ubizjak at gmail dot com @ 2007-12-10 12:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from ubizjak at gmail dot com 2007-12-10 12:31 -------
(In reply to comment #3)
> I don't think this qualifies as a 4.3 regression -
Fair enough. It looks that this problem is specific to Core2.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: ubizjak at gmail dot com @ 2007-12-10 17:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from ubizjak at gmail dot com 2007-12-10 17:12 -------
(In reply to comment #4)
> Fair enough. It looks that this problem is specific to Core2.
Here are timings with 'gcc version 4.3.0 20071201 (experimental) [trunk
revision 130554] (GCC)' on
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU X6800 @ 2.93GHz
stepping : 5
cpu MHz : 2933.389
cache size : 4096 KB
-mtune=generic -m32 -O3: 40.763s [*]
-mtune=generic -m32 -O2: 32.170s
-mtune=core2 -m32 -O3 : 36.850s
-mtune=core2 -m32 -O2 : 32.170s
-mtune=generic -m64 -O3: 28.550s
-mtune=generic -m64 -O2: 28.682s
-mtune=core2 -m64 -O3 : 28.670s
-mtune=core2 -m64 -O2 : 28.714s
With __attribute__((noinline)) to longest_match():
-mtune=generic -m32 -O3: 30.658s
-mtune=generic -m32 -O2: 32.154s
-mtune=core2 -m32 -O3 : 30.690s
-mtune=core2 -m32 -O2 : 32.247s
And with FC6 system compiler 'gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)':
-mtune=generic -m32 -O3: 30.154s [**]
-mtune=generic -m32 -O2: 30.275s
Comparing [*] to [**], it _is_ a regression, at least on Core2.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: rguenther at suse dot de @ 2007-12-10 17:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from rguenther at suse dot de 2007-12-10 17:13 -------
Subject: Re: non-optimal inlining heuristics
pessimizes gzip SPEC score at -O3
On Mon, 10 Dec 2007, ubizjak at gmail dot com wrote:
> (In reply to comment #4)
>
> > Fair enough. It looks that this problem is specific to Core2.
>
> Here are timings with 'gcc version 4.3.0 20071201 (experimental) [trunk
> revision 130554] (GCC)' on
>
> vendor_id : GenuineIntel
> cpu family : 6
> model : 15
> model name : Intel(R) Core(TM)2 CPU X6800 @ 2.93GHz
> stepping : 5
> cpu MHz : 2933.389
> cache size : 4096 KB
>
> -mtune=generic -m32 -O3: 40.763s [*]
> -mtune=generic -m32 -O2: 32.170s
> -mtune=core2 -m32 -O3 : 36.850s
> -mtune=core2 -m32 -O2 : 32.170s
>
> -mtune=generic -m64 -O3: 28.550s
> -mtune=generic -m64 -O2: 28.682s
> -mtune=core2 -m64 -O3 : 28.670s
> -mtune=core2 -m64 -O2 : 28.714s
>
> With __attribute__((noinline)) to longest_match():
>
> -mtune=generic -m32 -O3: 30.658s
> -mtune=generic -m32 -O2: 32.154s
> -mtune=core2 -m32 -O3 : 30.690s
> -mtune=core2 -m32 -O2 : 32.247s
>
> And with FC6 system compiler 'gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)':
>
> -mtune=generic -m32 -O3: 30.154s [**]
> -mtune=generic -m32 -O2: 30.275s
>
> Comparing [*] to [**], it _is_ a regression, at least on Core2.
FSF GCC 4.1 does not have -mtune=generic.
Richard.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: ubizjak at gmail dot com @ 2007-12-10 17:26 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from ubizjak at gmail dot com 2007-12-10 17:26 -------
(In reply to comment #6)
> FSF GCC 4.1 does not have -mtune=generic.
OK, OK. Now with 'gcc version 4.1.3 20070716 (prerelease)':
-m32 -O2: 29.306s
-m32 -O3: 29.582s
I don't have 4.2 here.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] [4.3 regression] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: ubizjak at gmail dot com @ 2007-12-11 6:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from ubizjak at gmail dot com 2007-12-11 06:00 -------
Regression at least for 4.3.
--
ubizjak at gmail dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|missed-optimization         |
            Summary|non-optimal inlining        |[4.3 regression] non-optimal
                   |heuristics pessimizes gzip  |inlining heuristics
                   |SPEC score at -O3           |pessimizes gzip SPEC score
                   |                            |at -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] [4.3 regression] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: steven at gcc dot gnu dot org @ 2007-12-11 6:09 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from steven at gcc dot gnu dot org 2007-12-11 06:09 -------
One of those "regressions" where actually GCC made progress overall. This
should be low priority for GCC 4.3.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] [4.3 regression] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: ubizjak at gmail dot com @ 2007-12-11 6:17 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from ubizjak at gmail dot com 2007-12-11 06:16 -------
(In reply to comment #9)
> One of those "regressions" where actually GCC made progress overall. This
> should be low priority for GCC 4.3.
Probably it is too early in the morning here, but I can't see any traces of
overall progress, when execution time of the testcase in _this_ PR goes from
30s to 40s.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: hubicka at gcc dot gnu dot org @ 2008-01-16 17:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from hubicka at gcc dot gnu dot org 2008-01-16 16:46 -------
Last time I looked into it, the cause was code alignment affected by inlining
in the string matching loop (longest_match). This code is very atypical: the
internal loop comparing strings is hand unrolled, but it almost never rolls,
since the compressed strings tend to be all different. GCC mispredicts this,
moving some stuff out of the loop, and bb-reorder aligns the code in such a way
that the default path, which does not execute the loop, jumps pretty far. This
hurts the decode bandwidth of K8, especially because the jumps are hard to
predict.
I don't see anything directly in the code that heuristics could use to realize
that the loop is not rolling, except for special casing this particular
benchmark.
FDO scores of gzip are not that bad, but there is still a gap relative to ICC
(even an archaic version of it running 32bit compared to 64bit GCC).
http://www.suse.de/~gcctest/SPEC-britten/CINT/sandbox-britten-FDO/index.html
It would be nice to convince the gzip/zlibc/bzip2 people to use profiling by
default in the build process - those packages are ideal targets.
But since Core is not as sensitive to code alignment and number of jumps as
K8, perhaps there are extra problems demonstrated by this.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: hubicka at gcc dot gnu dot org @ 2008-02-02 16:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from hubicka at gcc dot gnu dot org 2008-02-02 16:22 -------
Created an attachment (id=15079)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15079&action=view)
address accumulation patch
While working on PR17863 I wrote the attached patch to make fwprop combine
code like:
  a=base;
  *a=something;
  a++;
  *a=something;
  a++;
  *a=something;
  ...
into
  *base=something
  a=base+1
  *a=something
  a=base+2
  *a=something
  ....
I dropped it on vangelis and the nightly tester shows a gzip improvement of
815->880. The gzip internal loop is hand unrolled into a form similar to the
one shown above (the tester peaked in Jul 2005 with scores somewhat above 900).
Since gzip results tend to be unstable, it would be nice to know how this
reproduces on other targets/setups.
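The two forms above can be written out as compilable C to check that they store to the same locations. This is only a source-level sketch with hypothetical function names (store3_incremental, store3_from_base); the attached patch itself works on RTL inside fwprop and is not reproduced here.

```c
/* Form 1: one pointer is bumped after every store, so each address
   depends on the previous increment. */
void store3_incremental(int *base, int value)
{
    int *a = base;
    *a = value;
    a++;
    *a = value;
    a++;
    *a = value;
}

/* Form 2: every address is the original base plus a constant, so the
   stores no longer depend on a chain of increments. */
void store3_from_base(int *base, int value)
{
    int *a;
    *base = value;
    a = base + 1;
    *a = value;
    a = base + 2;
    *a = value;
}
```

In the second form each access can be expressed as a constant offset from one register, which is the shape the hand-unrolled gzip loop benefits from.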
Honza
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: hubicka at gcc dot gnu dot org @ 2008-02-03 13:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from hubicka at gcc dot gnu dot org 2008-02-03 13:39 -------
Tonight's runs on haydn with the patch in show a regression on gzip: 950->901
in 32bit. FDO 64bit runs are not affected.
This is the same score as we had in December; we have improved a bit since
then, but not enough to match the score we used to have.
It looks like codegen of the string compare loop is very unstable here.
Uros, would be possible to give it a try on Core? That would help to figure
out if it is code layout problem of K8.
Honza
--
hubicka at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2007-12-10 10:14:39         |2008-02-03 13:39:42
               date|                            |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: ubizjak at gmail dot com @ 2008-02-03 17:35 UTC (permalink / raw)
To: gcc-bugs
------- Comment #14 from ubizjak at gmail dot com 2008-02-03 17:35 -------
(In reply to comment #13)
> Uros, would be possible to give it a try on Core? That would help to figure
> out if it is code layout problem of K8.
Hm, the patch doesn't seem to help:
-m32 -O2: 32.434
-m32 -O2 (patched): 32.586
-m32 -O3: 40.723
-m32 -O3 (patched): 41.059
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: hubicka at gcc dot gnu dot org @ 2008-02-05 13:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #15 from hubicka at gcc dot gnu dot org 2008-02-05 13:36 -------
Thanks, this looks comparable to the K8 scores, except that -O3 is not actually
that much worse there. So it looks like there is more than just a random effect
of code layout involved; I will try to look into the produced assembly more.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: hubicka at gcc dot gnu dot org @ 2008-02-06 13:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from hubicka at gcc dot gnu dot org 2008-02-06 13:28 -------
One problem is the following:
  do {
      ;
      match = window + cur_match;
      if (match[best_len] != scan_end ||
          match[best_len-1] != scan_end1 ||
          *match != *scan ||
          *++match != scan[1]) continue;
      scan += 2, match++;
      do {
      } while (*++scan == *++match && *++scan == *++match &&
               *++scan == *++match && *++scan == *++match &&
               *++scan == *++match && *++scan == *++match &&
               *++scan == *++match && *++scan == *++match &&
               scan < strend);
      ....
The internal loop is the string comparison thingy, but the branch prediction
logic completely misses it: the continue statement looks like it is forming 4
nested loops, so it concludes that this is the internal loop.
We used to have a prediction heuristic guessing that a continue statement is
not used to form a loop. This was killed when gimplification was introduced.
Perhaps we should bring it back, since this is a reasonably common scenario.
Looking at longest_match in the not unrolled version, the "loops" formed by
the continue statement have frequencies: 298, 961, 2139, 3100, 6900, 1000
so every loop is predicted to iterate about twice.
The outer real loop now gets frequency 92, i.e. small enough to be predicted
as cold. The string comparison loop now gets frequency 344, predicted to
iterate 3 times (quite realistically). But because the frequency is so small
we end up allocating one of the two pointers in memory:
.L9:
        leal    1(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  1(%ecx), %eax
        cmpb    1(%edx), %al
        jne     .L8
        leal    2(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  2(%ecx), %eax
        cmpb    2(%edx), %al
        jne     .L8
        leal    3(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  3(%ecx), %eax
        cmpb    3(%edx), %al
        jne     .L8
        leal    4(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  4(%ecx), %eax
        cmpb    4(%edx), %al
        jne     .L8
        leal    5(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  5(%ecx), %eax
        cmpb    5(%edx), %al
        jne     .L8
This happens in the offline copy of longest_match. The inline copy gets this
detail right, but the frequencies in the deflate function are all crazy,
naturally.
I guess I should revive the patch for language scope branch predictors.
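To make the shape of the problem concrete, here is a simplified sketch of a match loop where `continue` is the common path. The function longest_match_sketch() and its parameters are hypothetical and much simpler than gzip's longest_match(). The __builtin_expect() hint is a GCC/Clang builtin used here only to show the intended branch direction at source level; it is not what the heuristic patch discussed in this comment does.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for gzip's longest_match(): tries each
   candidate position and returns the longest common prefix with the text
   at scan_pos, capped at max_len.  The caller must guarantee that
   max_len bytes are readable at scan_pos and at every candidate. */
size_t longest_match_sketch(const unsigned char *window,
                            const size_t *candidates, size_t n_candidates,
                            size_t scan_pos, size_t max_len)
{
    const unsigned char *scan = window + scan_pos;
    size_t best_len = 0;
    size_t i = 0;

    if (n_candidates == 0 || max_len == 0)
        return 0;

    do {
        const unsigned char *match = window + candidates[i];
        size_t len = 0;

        /* Cheap rejection test: a candidate that differs at offset
           best_len cannot beat the current best.  Most candidates are
           expected to fail here, so the `continue` is the hot path of the
           outer loop, not the back edge of a nested loop. */
        if (__builtin_expect(match[best_len] != scan[best_len], 1))
            continue;           /* jumps to the loop condition below */

        /* The real inner loop: the string comparison. */
        while (len < max_len && scan[len] == match[len])
            len++;

        if (len > best_len) {
            best_len = len;
            if (best_len >= max_len)
                break;          /* cannot improve any further */
        }
    } while (++i < n_candidates);

    return best_len;
}
```

A predictor that treats each `continue` as its own loop would give the string comparison loop a low frequency, which is the effect described above.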
Honza
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: hubicka at gcc dot gnu dot org @ 2008-02-06 16:45 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from hubicka at gcc dot gnu dot org 2008-02-06 16:44 -------
Created an attachment (id=15107)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15107&action=view)
Path to predict_paths_leading_to
Hi,
I've revived the continue heuristic patch. By itself it does not help because
of a bug in predict_paths_leading_to.
The code looks as follows:
  if (test1)
    goto continue_block;
  if (test2)
    goto continue_block;
  if (test3)
    goto continue_block;
  if (test4)
    goto continue_block;
  goto real_loop_body;
continue_block:
  goto loop_header;
We call predict_paths_leading_to on the continue_block and expect that the
continue_block will not be very likely.
What the function does is find the dominator of continue_block, that is the
if (test1) block, and predict the edge from that first block. This is however
not quite enough, as all the other paths remain likely.
It seems to me that we need to walk the whole set of BBs postdominated by the
BB and mark all edges forming the edge cut defined by this set.
I am testing the attached patch. It makes the function linear (so we are
overall quadratic) for a very deep postdominator tree. If this turns out to be
a problem, I think we can just cut the computation off after some specified
number of BBs has been walked.
Zdenek, does this seem sane?
With this change and the continue prediction patch I get a sort of sane
prediction for the longest_match function. The profile is still quite
unrealistic, but I am testing whether it makes a noticeable difference.
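The edge-cut idea can be illustrated on a toy CFG. The sketch below is not GCC's predict_paths_leading_to and does not use GCC's CFG or dominance data structures: struct edge, reaches(), postdominated_by() and mark_paths_leading_to() are all hypothetical, and postdominance is computed by brute-force reachability, which is only reasonable for a handful of blocks. It also assumes every block can reach the exit block.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_BLOCKS 7   /* block ids must be in the range 0 .. N_BLOCKS-1 */

struct edge { int src, dst; bool unlikely; };

/* Depth-first search: can `from` reach `to` without entering `banned`? */
static bool reaches(const struct edge *edges, size_t n_edges,
                    int from, int to, int banned, bool *seen)
{
    if (from == banned)
        return false;
    if (from == to)
        return true;
    if (seen[from])
        return false;
    seen[from] = true;
    for (size_t i = 0; i < n_edges; i++)
        if (edges[i].src == from
            && reaches(edges, n_edges, edges[i].dst, to, banned, seen))
            return true;
    return false;
}

/* True if every path from `bb` to `exit_bb` passes through `target`. */
static bool postdominated_by(const struct edge *edges, size_t n_edges,
                             int bb, int target, int exit_bb)
{
    bool seen[N_BLOCKS] = { false };
    if (bb == target)
        return true;
    return !reaches(edges, n_edges, bb, exit_bb, target, seen);
}

/* Marks every edge that enters the set of blocks postdominated by
   `target` from outside that set (the edge cut), and returns how many
   edges were marked. */
size_t mark_paths_leading_to(struct edge *edges, size_t n_edges,
                             int target, int exit_bb)
{
    size_t marked = 0;
    for (size_t i = 0; i < n_edges; i++) {
        bool src_in = postdominated_by(edges, n_edges, edges[i].src,
                                       target, exit_bb);
        bool dst_in = postdominated_by(edges, n_edges, edges[i].dst,
                                       target, exit_bb);
        if (!src_in && dst_in) {
            edges[i].unlikely = true;
            marked++;
        }
    }
    return marked;
}
```

For the four-test example above (blocks 0 to 3 for the tests, 4 for real_loop_body, 5 for continue_block, 6 for loop_header treated as the exit), only continue_block is postdominated by itself, so the cut consists of the four edges from the test blocks into continue_block, rather than just the edge from the if (test1) block.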
Honza
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: hubicka at gcc dot gnu dot org @ 2008-02-06 16:57 UTC (permalink / raw)
To: gcc-bugs
------- Comment #19 from hubicka at gcc dot gnu dot org 2008-02-06 16:56 -------
Created an attachment (id=15108)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15108&action=view)
Complete continue heuristic patch
Hi,
this is the complete patch. With this patch we produce a profile sane enough
that the internal loops are not marked cold. I will benchmark it probably
tomorrow (I want to wait for the FP changes to show separately).
It fixes the offline copy of longest_match, so we no longer have one of the IV
variables on the stack:
.L15:
        movzbl  2(%edx), %eax
        leal    2(%edx), %esi
        cmpb    2(%ecx), %al
        jne     .L8
        movzbl  3(%edx), %eax
        leal    3(%edx), %esi
        cmpb    3(%ecx), %al
        jne     .L8
        movzbl  4(%edx), %eax
        leal    4(%edx), %esi
        cmpb    4(%ecx), %al
        jne     .L8
        movzbl  5(%edx), %eax
        leal    5(%edx), %esi
        cmpb    5(%ecx), %al
        jne     .L8
        movzbl  6(%edx), %eax
        leal    6(%edx), %esi
        cmpb    6(%ecx), %al
        jne     .L8
        movzbl  7(%edx), %eax
        leal    7(%edx), %esi
        cmpb    7(%ecx), %al
        jne     .L8
        leal    8(%ecx), %eax
        movl    %eax, %ecx
        movzbl  8(%edx), %eax
        cmpb    (%ecx), %al
        leal    8(%edx), %ebx
        movl    %ebx, %esi
        jne     .L8
        cmpl    %ebx, -20(%ebp)
        jbe     .L8
        movl    %ebx, %edx
        movzbl  1(%edx), %eax
        leal    1(%edx), %esi
        cmpb    1(%ecx), %al
        je      .L15
Ironically this can further widen the gap between -O2 and -O3, since the
inline copy in deflate was always allocated reasonably.
Deflate codegen changes quite a lot, and because the function body is big I
will wait for benchmarks before trying to analyze further.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
From: ubizjak at gmail dot com @ 2008-02-06 18:43 UTC (permalink / raw)
To: gcc-bugs
------- Comment #20 from ubizjak at gmail dot com 2008-02-06 18:42 -------
Whoa, adding -fomit-frame-pointer brings us from
(gcc -O3 -m32)
user 0m41.031s
to
(gcc -O3 -m32 -fomit-frame-pointer)
user 0m30.006s
Since -fo-f-p adds another free reg, it looks that since inlining increases
register pressure some unlucky heavy-used variable gets allocated to the stack
slot.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
From: ubizjak at gmail dot com @ 2008-02-06 19:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #21 from ubizjak at gmail dot com 2008-02-06 19:10 -------
(In reply to comment #20)
> Since -fomit-frame-pointer frees up another register, it looks like inlining
> increases register pressure enough that some unlucky, heavily used variable
> gets allocated to a stack slot.
It is "best_len" (and probably some others, too):
[uros@localhost gzip-1.2.4]$ grep best_len fp.s
movl %edx, -68(%ebp) #, best_len
movl -68(%ebp), %edx # best_len, best_len.494
movl %edx, -68(%ebp) # best_len.494, best_len
movl -68(%ebp), %edx # best_len,
movl -68(%ebp), %edx # best_len,
movl -68(%ebp), %edx # best_len, best_len.494
cmpl %esi, %edx # lookahead, best_len.494
movl %edx, -108(%ebp) # best_len.494, match_length
movl -68(%ebp), %edx # best_len, best_len.494
movl %edx, -88(%ebp) # prev_length.28, best_len
movl -88(%ebp), %edx # best_len, best_len.457
movl %edx, -88(%ebp) # best_len.457, best_len
movl -88(%ebp), %eax # best_len,
movl -88(%ebp), %edx # best_len,
movl -88(%ebp), %edx # best_len, best_len.457
cmpl %esi, %edx # lookahead, best_len.457
movl %edx, -40(%ebp) # best_len.457, match_length.404
movl -88(%ebp), %edx # best_len, best_len.457
leal (%ecx,%eax), %edx #, best_len.457
cmpl %edx, -88(%ebp) # best_len.457, best_len
cmpl -96(%ebp), %edx # nice_match.34, best_len.457
leal (%ecx,%eax), %edx #, best_len.494
cmpl %edx, -68(%ebp) # best_len.494, best_len
cmpl -76(%ebp), %edx # nice_match.34, best_len.494
[uros@localhost gzip-1.2.4]$ grep best_len no-fp.s
movl %edx, 76(%esp) #, best_len
movl 76(%esp), %edx # best_len,
movl 76(%esp), %edx # best_len, best_len.494
movl %edx, 76(%esp) # best_len.494, best_len
movl 76(%esp), %eax # best_len,
movl 76(%esp), %edx # best_len, best_len.494
movl %edx, %ebp # best_len.494, match_length
movl 76(%esp), %edx # best_len, best_len.494
movl %edx, %ebp # prev_length.28, best_len
movl %ebp, %edx # best_len, best_len.457
movl %edx, %ebp # best_len.457, best_len
movl %ebp, %edx # best_len, best_len.457
cmpl %esi, %edx # lookahead, best_len.457
movl %ebp, %edx # best_len, best_len.457
leal (%ecx,%eax), %edx #, best_len.494
cmpl %edx, 76(%esp) # best_len.494, best_len
cmpl 68(%esp), %edx # nice_match.34, best_len.494
leal (%ecx,%eax), %edx #, best_len.457
cmpl %edx, %ebp # best_len.457, best_len
cmpl 52(%esp), %edx # nice_match.34, best_len.457
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
From: hubicka at gcc dot gnu dot org @ 2008-02-06 19:22 UTC (permalink / raw)
To: gcc-bugs
------- Comment #22 from hubicka at gcc dot gnu dot org 2008-02-06 19:22 -------
Yes, there are a number of unlucky variables. However, the real cause here
seems to be the consistently wrong profile, which guides regalloc to optimize
for cold portions of the function, rather than a real increase in register
pressure due to inlining.
In general, the inlining operation itself only decreases register pressure:
you don't pin function parameters and the return value to fixed registers, and
you know precisely which registers survive the body, so you don't need to save
caller-saved registers when it is not needed.
The losses from inlining with our regalloc are partly due to callee-saved
registers sometimes being more effective, in a sort of imitation of live range
splitting. Increased register pressure is an effect of propagating from the
function body to the rest of the program, but it is not that bad either: at
least, all the inlining heuristic/RA bugs turned out to be something else.
The high speedup from the forwprop patch in 64-bit mode (and the slowdown in
32-bit mode) is actually also register allocation related: without the forwprop
patch, the internal loop consisting of a sequence of ++ operations ends up with
extra copy instructions, while with the patch we produce a normal induction
variable. On 32-bit, however, this results in regalloc putting the variable on
the stack, because its live range heuristics then give it lower priority.
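For illustration only, the two loop shapes mentioned above can be sketched in
C. This is not gzip's source: the function names and the 2x unrolling are
assumptions, and nothing is claimed here about which shape a given GCC version
produces internally. The first form advances both pointers with a chain of ++
operations; the second uses a single index as the induction variable. Both
return the index of the first mismatching byte (or n if there is none),
assuming byte 0 has already been compared.

```c
#include <assert.h>
#include <stddef.h>

/* Form 1: every test bumps both pointers before comparing (chain of ++). */
static size_t first_mismatch_incr(const unsigned char *scan,
                                  const unsigned char *match, size_t n)
{
    const unsigned char *start = scan;
    const unsigned char *end;

    assert(scan != NULL && match != NULL);
    end = scan + n;                      /* one past the last valid byte */

    /* Two tests per iteration, while at least two more bytes remain. */
    while (end - scan > 2) {
        if (*++scan != *++match) return (size_t)(scan - start);
        if (*++scan != *++match) return (size_t)(scan - start);
    }
    /* Tail: at most one byte is left to compare. */
    while (end - scan > 1) {
        if (*++scan != *++match) return (size_t)(scan - start);
    }
    return n;
}

/* Form 2: one index serves as the induction variable for both buffers. */
static size_t first_mismatch_index(const unsigned char *scan,
                                   const unsigned char *match, size_t n)
{
    size_t i;

    assert(scan != NULL && match != NULL);
    for (i = 1; i < n; i++)
        if (scan[i] != match[i])
            return i;
    return n;
}
```

Both functions give the same result for the same input; the difference is only
in how many values have to be kept up to date across the comparisons.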
As for 32-bit data, the britten 32-bit SPEC tester peaked at 760, while we now
get 620 at peak with -fomit-frame-pointer. A 20% regression on a rather simple,
commonly used codebase definitely makes us look stupid... All the more so given
that ICC 7.x scored 820 on the same machine. The 64-bit tester shows
approximately 830 versus 740.
Honza
--
hubicka at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |hubicka at gcc dot gnu dot
| |org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
From: hubicka at gcc dot gnu dot org @ 2008-02-07 12:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from hubicka at gcc dot gnu dot org 2008-02-07 12:30 -------
Created an attachment (id=15115)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15115&action=view)
Annotated profile
I am attaching a dump with the profile read in. It shows the hot spots in
longest_match at least:
(this is the first conditional of the continue guard)
# BLOCK 27 freq:10000 count:1346119696
# PRED: 6 [100.0%] count:112241556 (fallthru) 25 [99.5%] count:1233878140
(true,exec)
# scan_end_13 = PHI <scan_end_106(6), scan_end_14(25)>
# scan_end1_11 = PHI <scan_end1_93(6), scan_end1_12(25)>
# best_len_8 = PHI <best_len_25(6), best_len_9(25)>
# scan_3 = PHI <scan_24(6), scan_6(25)>
# chain_length_2 = PHI <chain_length_108(6), chain_length_105(25)>
# cur_match_1 = PHI <cur_match_109(6), cur_match_104(25)>
match_40 = &window + cur_match_1;
best_len.31_41 = (unsigned int) best_len_8;
D.2379_42 = match_40 + best_len.31_41;
D.2380_43 = *D.2379_42;
if (D.2380_43 != scan_end_13)
goto <bb 10>;
else
goto <bb 7>;
# SUCC: 10 [0.1%] count:33977 (true,exec) 11 [99.9%] count:48979565
(false,exec)
# BLOCK 10 freq:9636 count:1297140131
# PRED: 27 [87.5%] count:1177665163 (true,exec) 7 [55.2%] count:93018627
(true,exec) 8 [35.0%] count:26422364 (true,exec) 9 [0.1%] count:33977
(true,exec)
goto <bb 24>;
(this is the continue statement)
D.2391_102 = cur_match_1 & 32767;
D.2392_103 = prev[D.2391_102];
cur_match_104 = (IPos) D.2392_103;
if (limit_15 >= cur_match_104)
goto <bb 26>;
else
goto <bb 25>;
# SUCC: 26 [7.7%] count:104056913 (true,exec) 25 [92.3%] count:1240391903
(false,exec)
# BLOCK 25 freq:9215 count:1240391903
# PRED: 24 [92.3%] count:1240391903 (false,exec)
chain_length_105 = chain_length_2 + 0x0ffffffff;
if (chain_length_105 != 0)
goto <bb 27>;
else
goto <bb 26>;
(this is end of outer loop)
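The percentages printed on the edges agree with the raw execution counts next
to them. As a quick check (a sketch only; edge_permille is a hypothetical
helper, not a GCC function, and GCC may derive its printed probabilities
differently), the share of one outgoing edge can be computed from the two
counts:

```c
#include <assert.h>
#include <stdint.h>

/* Share of one outgoing edge in tenths of a percent, rounded to nearest.
   edge_count and other_count are the execution counts of the two
   successor edges of a block. */
static unsigned edge_permille(uint64_t edge_count, uint64_t other_count)
{
    uint64_t total = edge_count + other_count;

    assert(total > 0);
    return (unsigned)((edge_count * 1000 + total / 2) / total);
}
```

With the counts of the two successor edges of the continue block, 1240391903
and 104056913, this gives 923 and 77, matching the 92.3% and 7.7% in the dump.
The two predecessor counts of block 27, 112241556 and 1233878140, likewise add
up to its block count of 1346119696.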
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
From: hubicka at gcc dot gnu dot org @ 2008-02-08 15:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #24 from hubicka at gcc dot gnu dot org 2008-02-08 15:11 -------
Hi,
tonight's runs with the continue heuristics again show improvements in the
64-bit scores, but degradation in the 32-bit scores. Looking into the loop, the
real trouble seems to be that the main loop has 6 loop-carried variables:
scan_end, scan_end1, best_len, scan, chain_length, cur_match
plus a few temporaries are needed too. Obviously we can't fit them all in
registers on i386. Making the profile more realistic sometimes helps and
sometimes hurts, pretty much at random.
One case where I think register pressure is increased is that the different
SSA names of both the scan_end and scan_end1 variables are not fully coalesced
in out-of-SSA. This is a result of optimizing:
if (match[best_len] != scan_end ||
match[best_len-1] != scan_end1 ||
*match != *scan ||
*++match != scan[1]) continue;
...later code sometimes modifying scan_end....
into computing match[best_len] into a name of scan_end that is sometimes
assigned in the later code on the path not modifying scan_end. As a result we
have two scan_ends live at once. I wonder if we can avoid this behaviour.
Though it looks all right in SSA form, avoiding it would save 2 "global"
registers: there is no need at all to cache match[best_len]/match[best_len-1]
in a register, unless I missed something. Those two variables are manipulated
on the hot paths through the loop.
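To make the loop shape concrete, here is a simplified C sketch modelled on the
snippet above. It is not gzip's actual longest_match: the function names, the
parameter list, the use of 0 as the end-of-chain marker and the prefix_len
helper are all assumptions made for illustration. Variant A keeps scan_end and
scan_end1 as values carried from one iteration to the next; variant B re-reads
the same two bytes from scan each time. No claim is made about which one a
particular compiler or target handles better.

```c
#include <assert.h>

/* Length of the common prefix of a and b, capped at max. */
static int prefix_len(const unsigned char *a, const unsigned char *b, int max)
{
    int n = 0;
    while (n < max && a[n] == b[n])
        n++;
    return n;
}

/* Variant A: scan_end and scan_end1 are cached, hence loop-carried.
   Assumes strstart + max_len stays inside window and that the prev
   chain reaches 0 (used here as the end-of-chain marker). */
static int longest_cached(const unsigned char *window, const unsigned *prev,
                          unsigned strstart, unsigned cur_match,
                          unsigned chain_length, int best_len, int max_len)
{
    const unsigned char *scan = window + strstart;
    unsigned char scan_end1, scan_end;

    assert(best_len >= 1 && best_len < max_len && chain_length > 0);
    scan_end1 = scan[best_len - 1];
    scan_end = scan[best_len];
    do {
        const unsigned char *match = window + cur_match;
        int len;

        if (match[best_len] != scan_end || match[best_len - 1] != scan_end1)
            continue;                    /* jumps to the loop condition */
        len = prefix_len(scan, match, max_len);
        if (len > best_len) {
            best_len = len;
            if (len >= max_len)
                break;
            scan_end1 = scan[best_len - 1];
            scan_end = scan[best_len];
        }
    } while ((cur_match = prev[cur_match]) != 0 && --chain_length != 0);
    return best_len;
}

/* Variant B: same search, but the two end bytes are re-read from scan. */
static int longest_uncached(const unsigned char *window, const unsigned *prev,
                            unsigned strstart, unsigned cur_match,
                            unsigned chain_length, int best_len, int max_len)
{
    const unsigned char *scan = window + strstart;

    assert(best_len >= 1 && best_len < max_len && chain_length > 0);
    do {
        const unsigned char *match = window + cur_match;
        int len;

        if (match[best_len] != scan[best_len] ||
            match[best_len - 1] != scan[best_len - 1])
            continue;                    /* jumps to the loop condition */
        len = prefix_len(scan, match, max_len);
        if (len > best_len) {
            best_len = len;
            if (len >= max_len)
                break;
        }
    } while ((cur_match = prev[cur_match]) != 0 && --chain_length != 0);
    return best_len;
}
```

Both variants return the same length for the same input. Variant B trades two
loop-carried values for two extra loads per candidate.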
Now the RA is driven by frequencies (a bit confused by the fact that two of the
loop-carried variables are split) and by their "live ranges", which are
actually the number of instructions between the first and last occurrence.
Since we are a bit careless about BB ordering, moving some code to the very end
of the function, this heuristic is not realistic at all. It would probably make
more sense to replace it with the number of insns the variable is live across,
but this is probably ninsn*npseudos to compute. Another idea would be the
degree in the conflict graph, but I am not sure we want to start such
experiments in parallel with YARA.
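The difference between the two measures can be shown on a toy model. The
sketch below is not GCC code: struct insn, its bitmask fields and both function
names are assumptions, and liveness is taken as a precomputed input rather than
derived from a CFG. span_length is the distance between the first and last
occurrence in layout order; live_across counts only the insns where the pseudo
is actually live.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction in layout order; pseudos are numbered 0..31. */
struct insn {
    unsigned refs;   /* bit p set: pseudo p is read or written here */
    unsigned live;   /* bit p set: pseudo p is live across this insn */
};

/* Distance in insns between the first and last occurrence of pseudo p. */
static size_t span_length(const struct insn *code, size_t n, unsigned p)
{
    size_t first = n, last = 0;

    assert(p < 32);
    for (size_t i = 0; i < n; i++) {
        if (code[i].refs & (1u << p)) {
            if (first == n)
                first = i;
            last = i;
        }
    }
    return first == n ? 0 : last - first;
}

/* Number of insns pseudo p is live across, given precomputed liveness. */
static size_t live_across(const struct insn *code, size_t n, unsigned p)
{
    size_t count = 0;

    assert(p < 32);
    for (size_t i = 0; i < n; i++)
        if (code[i].live & (1u << p))
            count++;
    return count;
}
```

A pseudo used in a hot loop and once more in a cold block placed at the very
end of the function gets a large span but a small live-across count. Each
function is one pass over the insns per pseudo, which is where the
ninsn*npseudos cost comes from.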
I tested YARA and it does not handle this situation much better. Perhaps
Vladimir can help?
Honza
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
From: hubicka at gcc dot gnu dot org @ 2008-02-08 15:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #25 from hubicka at gcc dot gnu dot org 2008-02-08 15:39 -------
-fno-tree-dominator-opts -fno-tree-copyrename solves the coalescing problem
(the name is introduced by the second pass, the actual problematic pattern by
the first), saving roughly 1s at -O2 and 2s at -O3; -O3 is still worse,
however. The internal loop no longer spills, it just reads the value of
scan_end stored in memory.
I will play with it more later and make a simple testcase for this.
Honza
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
From: hubicka at gcc dot gnu dot org @ 2008-09-06 12:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #26 from hubicka at gcc dot gnu dot org 2008-09-06 12:00 -------
IRA seems to fix the remaining problem with the spill in the internal loop on
32-bit nicely, so we produce good scores for gzip compared to older GCC
versions.
http://gcc.opensuse.org/SPEC-britten/CINT/sandbox-britten-32bit/164_gzip_big.png
and with profile feedback
http://gcc.opensuse.org/SPEC-britten/CINT/sandbox-britten-FDO/164_gzip_big.png
we get close to ICC scores.
We now output the comparison loop as:
.L98:
movzbl 1(%eax), %edx #,
leal 1(%eax), %edi #, scan
cmpb 1(%ecx), %dl #,
jne .L161 #,
movzbl 2(%eax), %edx #,
leal 2(%eax), %edi #, scan
cmpb 2(%ecx), %dl #,
jne .L161 #,
movzbl 3(%eax), %edx #,
leal 3(%eax), %edi #, scan
cmpb 3(%ecx), %dl #,
jne .L161 #,
movzbl 4(%eax), %edx #,
leal 4(%eax), %edi #, scan
cmpb 4(%ecx), %dl #,
jne .L161 #,
movzbl 5(%eax), %edx #,
leal 5(%eax), %edi #, scan
cmpb 5(%ecx), %dl #,
jne .L161 #,
movzbl 6(%eax), %edx #,
leal 6(%eax), %edi #, scan
cmpb 6(%ecx), %dl #,
jne .L161 #,
movzbl 7(%eax), %edx #,
leal 7(%eax), %edi #, scan
cmpb 7(%ecx), %dl #,
jne .L161 #,
There is still room for improvement, however.
The remaining problem is that we still miss coalescing of scan_end and
scan_end1 (so -fno-tree-dominator-opts -fno-tree-copyrename still helps).
Vladimir, perhaps this can be solved in IRA too?
Honza
--
hubicka at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
From: hubicka at gcc dot gnu dot org @ 2008-09-06 12:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #27 from hubicka at gcc dot gnu dot org 2008-09-06 12:02 -------
Also just noticed that the offline copy of longest_match gets an extra move:
.L15:
movzbl 2(%eax), %edi #, tmp87
leal 2(%eax), %ecx #, scan.158
movl %edi, %edx # tmp87,
cmpb 2(%ebx), %dl #,
jne .L6 #,
movzbl 3(%eax), %edi #, tmp88
leal 3(%eax), %ecx #, scan.158
movl %edi, %edx # tmp88,
cmpb 3(%ebx), %dl #,
jne .L6 #,
movzbl 4(%eax), %edi #, tmp89
leal 4(%eax), %ecx #, scan.158
movl %edi, %edx # tmp89,
cmpb 4(%ebx), %dl #,
jne .L6 #,
movzbl 5(%eax), %edi #, tmp90
leal 5(%eax), %ecx #, scan.158
movl %edi, %edx # tmp90,
cmpb 5(%ebx), %dl #,
jne .L6 #,
while the inlined copy is fine:
.L98:
movzbl 1(%eax), %edx #,
leal 1(%eax), %edi #, scan
cmpb 1(%ecx), %dl #,
jne .L161 #,
movzbl 2(%eax), %edx #,
leal 2(%eax), %edi #, scan
cmpb 2(%ecx), %dl #,
jne .L161 #,
movzbl 3(%eax), %edx #,
leal 3(%eax), %edi #, scan
cmpb 3(%ecx), %dl #,
jne .L161 #,
movzbl 4(%eax), %edx #,
leal 4(%eax), %edi #, scan
cmpb 4(%ecx), %dl #,
jne .L161 #,
interesting :)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
* [Bug tree-optimization/33761] tree-copyrename and tree-dominators pessimizes gzip SPEC score
From: ubizjak at gmail dot com @ 2008-09-06 15:27 UTC (permalink / raw)
To: gcc-bugs
------- Comment #28 from ubizjak at gmail dot com 2008-09-06 15:26 -------
(In reply to comment #27)
> Also just noticed that the offline copy of longest_match gets an extra move:
> .L15:
> movzbl 2(%eax), %edi #, tmp87
> leal 2(%eax), %ecx #, scan.158
> movl %edi, %edx # tmp87,
> cmpb 2(%ebx), %dl #,
> jne .L6 #,
>
> interesting :)
Perhaps due to ineffective regmove (similar to PR 37364)?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761