* [Bug rtl-optimization/21485] [4.0/4.1 Regression] BYTEmark numsort: codegen regression with -O3
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
@ 2005-10-31 3:21 ` mmitchel at gcc dot gnu dot org
2005-10-31 3:37 ` pinskia at gcc dot gnu dot org
` (32 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-10-31 3:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from mmitchel at gcc dot gnu dot org 2005-10-31 03:21 -------
We need more analysis on these kinds of issues.
So, we're doing a worse job on register allocation. Is that because the
register allocator got worse, or because we're giving it a harder problem to
solve? If the latter, what's responsible, and is there anything we can do
about it? We need that kind of information to make an intelligent decision
about whether or not we should try to fix this, or just chalk it up to the fact
that there's always variability between releases.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug rtl-optimization/21485] [4.0/4.1 Regression] BYTEmark numsort: codegen regression with -O3
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
2005-10-31 3:21 ` [Bug rtl-optimization/21485] [4.0/4.1 Regression] BYTEmark numsort: codegen regression with -O3 mmitchel at gcc dot gnu dot org
@ 2005-10-31 3:37 ` pinskia at gcc dot gnu dot org
2006-01-09 22:23 ` [Bug rtl-optimization/21485] [4.0/4.1/4.2 " steven at gcc dot gnu dot org
` (31 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-10-31 3:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from pinskia at gcc dot gnu dot org 2005-10-31 03:37 -------
(In reply to comment #11)
> So, we're doing a worse job on register allocation. Is that because the
> register allocator got worse, or because we're giving it a harder problem to
> solve? If the latter, what's responsible, and is there anything we can do
> about it?
It is the latter and the pass which is responsible is tree PRE but if someone
writes the code like what tree PRE gives, we will still have the same issue so
the only correct place to fix this would be in RA.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug rtl-optimization/21485] [4.0/4.1/4.2 Regression] BYTEmark numsort: codegen regression with -O3
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
2005-10-31 3:21 ` [Bug rtl-optimization/21485] [4.0/4.1 Regression] BYTEmark numsort: codegen regression with -O3 mmitchel at gcc dot gnu dot org
2005-10-31 3:37 ` pinskia at gcc dot gnu dot org
@ 2006-01-09 22:23 ` steven at gcc dot gnu dot org
2006-02-16 19:46 ` [Bug rtl-optimization/21485] [4.0/4.1/4.2 Regression] codegen regression due to PRE increasing register pressure dberlin at gcc dot gnu dot org
` (30 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: steven at gcc dot gnu dot org @ 2006-01-09 22:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from steven at gcc dot gnu dot org 2006-01-09 22:23 -------
AMD64 timings for today's GCC 4.1 (20060109) branch:
flags used score
for compilation (avg. of 3, higher is better)
-O3 685.2
-O3 -fprofile-generate 636
-O3 -fprofile-use 785.6
-O3 -fno-tree-pre 958.4
-O3 -fno-tree-pre -fprofile-generate 673.8
-O3 -fno-tree-pre -fprofile-use 964.2
-O3 -fno-tree-pre -fmove-loop-invariants 967.2
The problem really is that we present the register allocator with a harder
problem. Tough, there is really not much that can be done about this other
than crippling PRE for SMALL_REGISTER_CLASSES machines, but I don't know if we
should want to do that...
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug rtl-optimization/21485] [4.0/4.1/4.2 Regression] codegen regression due to PRE increasing register pressure
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2006-01-09 22:23 ` [Bug rtl-optimization/21485] [4.0/4.1/4.2 " steven at gcc dot gnu dot org
@ 2006-02-16 19:46 ` dberlin at gcc dot gnu dot org
2006-02-24 0:31 ` mmitchel at gcc dot gnu dot org
` (29 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: dberlin at gcc dot gnu dot org @ 2006-02-16 19:46 UTC (permalink / raw)
To: gcc-bugs
------- Comment #14 from dberlin at gcc dot gnu dot org 2006-02-16 19:46 -------
So uh, has anyone figured out why we don't get 1500 with -fno-tree-pre?
Does -fno-gcse help too?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug rtl-optimization/21485] [4.0/4.1/4.2 Regression] codegen regression due to PRE increasing register pressure
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (3 preceding siblings ...)
2006-02-16 19:46 ` [Bug rtl-optimization/21485] [4.0/4.1/4.2 Regression] codegen regression due to PRE increasing register pressure dberlin at gcc dot gnu dot org
@ 2006-02-24 0:31 ` mmitchel at gcc dot gnu dot org
2006-03-10 4:41 ` pinskia at gcc dot gnu dot org
` (28 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2006-02-24 0:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #15 from mmitchel at gcc dot gnu dot org 2006-02-24 00:25 -------
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.0.3 |4.1.1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug rtl-optimization/21485] [4.0/4.1/4.2 Regression] codegen regression due to PRE increasing register pressure
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (4 preceding siblings ...)
2006-02-24 0:31 ` mmitchel at gcc dot gnu dot org
@ 2006-03-10 4:41 ` pinskia at gcc dot gnu dot org
2006-03-10 13:56 ` pinskia at gcc dot gnu dot org
` (27 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-03-10 4:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #16 from pinskia at gcc dot gnu dot org 2006-03-10 04:41 -------
Hmm, I see a missed optimization here (I want to say a PRE one too).
D.1521_32 = k_4 * 4;
D.1525_37 = (long int *) D.1521_32;
D.1526_38 = D.1525_37 + array_9;
D.1527_39 = D.1526_38 + 4B;
# VUSE <SMT.4_42>;
D.1528_40 = *D.1527_39;
if (prephitmp.27_35 < D.1528_40) goto <L2>; else goto <L3>;
<L2>:;
k_41 = k_4 + 1;
pretmp.22_23 = k_41 * 4;
pretmp.24_6 = (long int *) pretmp.22_23;
prephitmp.25_48 = pretmp.24_6 + array_9;
# VUSE <SMT.4_42>;
prephitmp.27_51 = *prephitmp.25_48;
prephitmp.27_51 here is also the same as D.1528_40. Though I don't know how to
figure this out in PRE or any other optimization. I almost want to say this is
the real cause. Looking at 3.4's asm, you don't have the extra load but for
4.0 and above you do.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug rtl-optimization/21485] [4.0/4.1/4.2 Regression] codegen regression due to PRE increasing register pressure
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (5 preceding siblings ...)
2006-03-10 4:41 ` pinskia at gcc dot gnu dot org
@ 2006-03-10 13:56 ` pinskia at gcc dot gnu dot org
2006-03-10 14:10 ` pinskia at gcc dot gnu dot org
` (26 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-03-10 13:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from pinskia at gcc dot gnu dot org 2006-03-10 13:56 -------
What happens to the time if you replace that function with:
void
NumSift (long *array, unsigned long i, unsigned long j)
{
unsigned long k;
while ((i + i) <= j)
{
k = i + i;
long t, t1;
t = array[k];
if (k < j)
{
t1 = array[k+1];
if (t < t1)
++k, t = t1;
}
t1 = array[i];
if (t1 < t)
{
array[k] = t1;
array[i] = t;
i = k;
}
else
i = j + 1;
}
return;
}
-----------
The semantics should be the same, I just pulled out the PREs as much as I
could.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug rtl-optimization/21485] [4.0/4.1/4.2 Regression] codegen regression due to PRE increasing register pressure
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (6 preceding siblings ...)
2006-03-10 13:56 ` pinskia at gcc dot gnu dot org
@ 2006-03-10 14:10 ` pinskia at gcc dot gnu dot org
2006-03-10 14:13 ` pinskia at gcc dot gnu dot org
` (25 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-03-10 14:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from pinskia at gcc dot gnu dot org 2006-03-10 14:10 -------
(In reply to comment #17)
> What happens to the time if you replace that function with:
This helps about 5% but it does not get the score back up.
3.4.0's score for this machine is about 900.
4.1.0's score is 720.
4.1.0+scource modification is about 780.
so it helps but there is something else which needs to be improved still.
I don't think this is a RA issue now after looking into the source.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords|ra |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug rtl-optimization/21485] [4.0/4.1/4.2 Regression] codegen regression due to PRE increasing register pressure
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (7 preceding siblings ...)
2006-03-10 14:10 ` pinskia at gcc dot gnu dot org
@ 2006-03-10 14:13 ` pinskia at gcc dot gnu dot org
2006-05-25 2:39 ` [Bug tree-optimization/21485] " mmitchel at gcc dot gnu dot org
` (24 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-03-10 14:13 UTC (permalink / raw)
To: gcc-bugs
------- Comment #19 from pinskia at gcc dot gnu dot org 2006-03-10 14:13 -------
(In reply to comment #18)
> (In reply to comment #17)
> > What happens to the time if you replace that function with:
> This helps about 5% but it does not get the score back up.
Also the asm is about the same for this function, at least on i686 between
3.4.0 and the source modified version with 4.1.0.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.0/4.1/4.2 Regression] codegen regression due to PRE increasing register pressure
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (8 preceding siblings ...)
2006-03-10 14:13 ` pinskia at gcc dot gnu dot org
@ 2006-05-25 2:39 ` mmitchel at gcc dot gnu dot org
2007-02-14 9:11 ` [Bug tree-optimization/21485] [4.0/4.1/4.2/4.3 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really) mmitchel at gcc dot gnu dot org
` (23 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2006-05-25 2:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #20 from mmitchel at gcc dot gnu dot org 2006-05-25 02:32 -------
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.1.1 |4.1.2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.0/4.1/4.2/4.3 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really)
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (9 preceding siblings ...)
2006-05-25 2:39 ` [Bug tree-optimization/21485] " mmitchel at gcc dot gnu dot org
@ 2007-02-14 9:11 ` mmitchel at gcc dot gnu dot org
2007-05-28 7:07 ` pinskia at gcc dot gnu dot org
` (22 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2007-02-14 9:11 UTC (permalink / raw)
To: gcc-bugs
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.1.2 |4.1.3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.0/4.1/4.2/4.3 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really)
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (10 preceding siblings ...)
2007-02-14 9:11 ` [Bug tree-optimization/21485] [4.0/4.1/4.2/4.3 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really) mmitchel at gcc dot gnu dot org
@ 2007-05-28 7:07 ` pinskia at gcc dot gnu dot org
2007-06-18 6:13 ` pinskia at gcc dot gnu dot org
` (21 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2007-05-28 7:07 UTC (permalink / raw)
To: gcc-bugs
------- Comment #21 from pinskia at gcc dot gnu dot org 2007-05-28 07:06 -------
The missed load PRE for the testcase below is fixed on the pointer plus banch
as the addition is done in "unsigned int" for both the "++k" and the
array[k+1]. If we change "++k" into "k+=2" and array[k+1] into array[k+2], we
run into another missed PRE issue which can be shown by the following testcase
which we don't optimize currently:
int f(int a, int b)
{
int c = a+2;
int d = c*2;
int e = a*2;
int f = e+2;
return d == f;
}
Which I will file seperately.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.0/4.1/4.2/4.3 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really)
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (11 preceding siblings ...)
2007-05-28 7:07 ` pinskia at gcc dot gnu dot org
@ 2007-06-18 6:13 ` pinskia at gcc dot gnu dot org
2007-07-04 21:01 ` [Bug tree-optimization/21485] [4.0/4.1/4.2 " pinskia at gcc dot gnu dot org
` (20 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2007-06-18 6:13 UTC (permalink / raw)
To: gcc-bugs
------- Comment #22 from pinskia at gcc dot gnu dot org 2007-06-18 06:12 -------
This is basically fixed by the pointer_plus except we still have some
combinable code (though this is not PRE's fault); see
http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01996.html for how to fix that
issue.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.0/4.1/4.2 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really)
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (12 preceding siblings ...)
2007-06-18 6:13 ` pinskia at gcc dot gnu dot org
@ 2007-07-04 21:01 ` pinskia at gcc dot gnu dot org
2008-06-15 13:03 ` [Bug tree-optimization/21485] [4.1/4.2 " wbrana at gmail dot com
` (19 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2007-07-04 21:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from pinskia at gcc dot gnu dot org 2007-07-04 21:01 -------
I am going to declare this as fixed for 4.3, pointer plus gets the original
case correctly and the secondary issues are all file seperately.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.0/4.1/4.2/4.3 Regression]|[4.0/4.1/4.2 Regression]
|codegen regression due to |codegen regression due to
|PRE increasing register |PRE increasing register
|pressure (missing load PRE |pressure (missing load PRE
|really) |really)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.1/4.2 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really)
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (13 preceding siblings ...)
2007-07-04 21:01 ` [Bug tree-optimization/21485] [4.0/4.1/4.2 " pinskia at gcc dot gnu dot org
@ 2008-06-15 13:03 ` wbrana at gmail dot com
2008-06-29 10:56 ` wbrana at gmail dot com
` (18 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: wbrana at gmail dot com @ 2008-06-15 13:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #24 from wbrana at gmail dot com 2008-06-15 13:02 -------
It seems to not be fixed in 4.3.1:
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 1545.8 : 39.64 : 13.02
STRING SORT : 380.16 : 169.87 : 26.29
BITFIELD : 6.014e+08 : 103.16 : 21.55
FP EMULATION : 332.88 : 159.73 : 36.86
FOURIER : 32962 : 37.49 : 21.06
ASSIGNMENT : 57.851 : 220.13 : 57.10
IDEA : 9372 : 143.34 : 42.56
HUFFMAN : 3234.8 : 89.70 : 28.64
NEURAL NET : 72.571 : 116.58 : 49.04
LU DECOMPOSITION : 2656.5 : 137.62 : 99.37
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 117.762
FLOATING-POINT INDEX: 84.407
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Core(TM)2 Duo CPU E6750 @
2.66GHz 3200MHz
L2 Cache : 4096 KB
OS : Linux 2.6.25.6
C compiler : 4.3.1
libc :
MEMORY INDEX : 31.863
INTEGER INDEX : 27.656
FLOATING-POINT INDEX: 46.816
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 2463.4 : 63.18 : 20.75
STRING SORT : 374.8 : 167.47 : 25.92
BITFIELD : 5.9818e+08 : 102.61 : 21.43
FP EMULATION : 268.64 : 128.91 : 29.75
FOURIER : 43176 : 49.10 : 27.58
ASSIGNMENT : 58.68 : 223.29 : 57.92
IDEA : 5413.5 : 82.80 : 24.58
HUFFMAN : 3198.7 : 88.70 : 28.32
NEURAL NET : 58.29 : 93.64 : 39.39
LU DECOMPOSITION : 2380.6 : 123.33 : 89.05
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 112.599
FLOATING-POINT INDEX: 82.767
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Core(TM)2 Duo CPU E6750 @
2.66GHz 3200MHz
L2 Cache : 4096 KB
OS : Linux 2.6.25.6
C compiler : gcc version 3.4.6 (Gentoo 3.4.6-r2 p1.5, ssp-3.4.6-1.0,
pie-8.7.10)
libc :
MEMORY INDEX : 31.806
INTEGER INDEX : 25.604
FLOATING-POINT INDEX: 45.906
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.1/4.2 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really)
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (14 preceding siblings ...)
2008-06-15 13:03 ` [Bug tree-optimization/21485] [4.1/4.2 " wbrana at gmail dot com
@ 2008-06-29 10:56 ` wbrana at gmail dot com
2008-07-04 16:53 ` [Bug tree-optimization/21485] [4.2 " jsm28 at gcc dot gnu dot org
` (17 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: wbrana at gmail dot com @ 2008-06-29 10:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #25 from wbrana at gmail dot com 2008-06-29 10:55 -------
I think "4.3" is missing in "Summary" and "4.3.1" in "Known to fail".
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really)
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (15 preceding siblings ...)
2008-06-29 10:56 ` wbrana at gmail dot com
@ 2008-07-04 16:53 ` jsm28 at gcc dot gnu dot org
2008-09-28 18:01 ` wbrana at gmail dot com
` (16 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2008-07-04 16:53 UTC (permalink / raw)
To: gcc-bugs
------- Comment #26 from jsm28 at gcc dot gnu dot org 2008-07-04 16:52 -------
Closing 4.1 branch.
--
jsm28 at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.1/4.2 Regression] codegen|[4.2 Regression] codegen
|regression due to PRE |regression due to PRE
|increasing register pressure|increasing register pressure
|(missing load PRE really) |(missing load PRE really)
Target Milestone|4.1.3 |4.2.5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really)
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (16 preceding siblings ...)
2008-07-04 16:53 ` [Bug tree-optimization/21485] [4.2 " jsm28 at gcc dot gnu dot org
@ 2008-09-28 18:01 ` wbrana at gmail dot com
2008-10-03 23:19 ` [Bug tree-optimization/21485] [4.2/4.3/4.4 " rguenth at gcc dot gnu dot org
` (15 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: wbrana at gmail dot com @ 2008-09-28 18:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #27 from wbrana at gmail dot com 2008-09-28 18:00 -------
gcc 4.3.2, -march=core2 instead of -march=nocona
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 1516.3 : 38.89 : 12.77
STRING SORT : 381.24 : 170.35 : 26.37
BITFIELD : 6.0164e+08 : 103.20 : 21.56
FP EMULATION : 332 : 159.31 : 36.76
FOURIER : 32687 : 37.17 : 20.88
ASSIGNMENT : 57.93 : 220.44 : 57.18
IDEA : 9380 : 143.46 : 42.60
HUFFMAN : 3276.1 : 90.85 : 29.01
NEURAL NET : 72.251 : 116.07 : 48.82
LU DECOMPOSITION : 2641.5 : 136.84 : 98.81
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 117.698
FLOATING-POINT INDEX: 83.889
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Core(TM)2 Duo CPU E6750 @
2.66GHz 3200MHz
L2 Cache : 4096 KB
OS : Linux 2.6.26.5
C compiler : gcc-4.3.2
libc :
MEMORY INDEX : 31.912
INTEGER INDEX : 27.598
FLOATING-POINT INDEX: 46.528
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really)
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (17 preceding siblings ...)
2008-09-28 18:01 ` wbrana at gmail dot com
@ 2008-10-03 23:19 ` rguenth at gcc dot gnu dot org
2008-10-03 23:20 ` rguenth at gcc dot gnu dot org
` (14 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-10-03 23:19 UTC (permalink / raw)
To: gcc-bugs
------- Comment #28 from rguenth at gcc dot gnu dot org 2008-10-03 23:18 -------
nbench 2.2.3 numeric sort test executes 40% less iterations per second
when compiled with 4.4 snapshot than with 3.4.6
iterations/s - version
2439 - 3.4.6
1530 - 4.4.0 20080926 (experimental)
1526 - 4.3.2
1580 - 4.2.4
CFLAGS = -s -static -Wall -O3 -g0 -march=nocona -fomit-frame-pointer
-funroll-loops -ffast-math
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 2439.1 : 62.55 : 20.54
STRING SORT : 373.28 : 166.79 : 25.82
BITFIELD : 5.9879e+08 : 102.71 : 21.45
FP EMULATION : 267.84 : 128.52 : 29.66
FOURIER : 43702 : 49.70 : 27.92
ASSIGNMENT : 56.657 : 215.59 : 55.92
IDEA : 5407.7 : 82.71 : 24.56
HUFFMAN : 3204.3 : 88.86 : 28.37
NEURAL NET : 57.485 : 92.35 : 38.84
LU DECOMPOSITION : 2363.5 : 122.44 : 88.41
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 111.792
FLOATING-POINT INDEX: 82.519
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Core(TM)2 Duo CPU E6750 @
2.66GHz 3200MHz
L2 Cache : 4096 KB
OS : Linux 2.6.26.5
C compiler : gcc version 3.4.6 (Gentoo 3.4.6-r2 p1.5, ssp-3.4.6-1.0,
pie-8.7.10)
libc : libc-2.8.so
MEMORY INDEX : 31.404
INTEGER INDEX : 25.525
FLOATING-POINT INDEX: 45.768
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 1530.9 : 39.26 : 12.89
STRING SORT : 372.64 : 166.51 : 25.77
BITFIELD : 6.0348e+08 : 103.52 : 21.62
FP EMULATION : 310.4 : 148.94 : 34.37
FOURIER : 31760 : 36.12 : 20.29
ASSIGNMENT : 48.361 : 184.02 : 47.73
IDEA : 9204 : 140.77 : 41.80
HUFFMAN : 3554.3 : 98.56 : 31.47
NEURAL NET : 73.882 : 118.69 : 49.92
LU DECOMPOSITION : 2322.2 : 120.30 : 86.87
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 114.457
FLOATING-POINT INDEX: 80.190
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Core(TM)2 Duo CPU E6750 @
2.66GHz 3200MHz
L2 Cache : 4096 KB
OS : Linux 2.6.26.5
C compiler : gcc version 4.4.0 20080926 (experimental) (GCC)
libc : libc-2.8.so
MEMORY INDEX : 29.851
INTEGER INDEX : 27.632
FLOATING-POINT INDEX: 44.477
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 1526.2 : 39.14 : 12.85
STRING SORT : 374.28 : 167.24 : 25.89
BITFIELD : 6.0262e+08 : 103.37 : 21.59
FP EMULATION : 320.64 : 153.86 : 35.50
FOURIER : 33352 : 37.93 : 21.30
ASSIGNMENT : 57.834 : 220.07 : 57.08
IDEA : 9288 : 142.06 : 42.18
HUFFMAN : 3211 : 89.04 : 28.43
NEURAL NET : 74.69 : 119.98 : 50.47
LU DECOMPOSITION : 2655.7 : 137.58 : 99.35
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 116.415
FLOATING-POINT INDEX: 85.548
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Core(TM)2 Duo CPU E6750 @
2.66GHz 3200MHz
L2 Cache : 4096 KB
OS : Linux 2.6.26.5
C compiler : gcc-4.3.2
libc : libc-2.8.so
MEMORY INDEX : 31.716
INTEGER INDEX : 27.199
FLOATING-POINT INDEX: 47.448
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 1580.2 : 40.52 : 13.31
STRING SORT : 359.88 : 160.80 : 24.89
BITFIELD : 5.9462e+08 : 102.00 : 21.30
FP EMULATION : 291.28 : 139.77 : 32.25
FOURIER : 32567 : 37.04 : 20.80
ASSIGNMENT : 57.097 : 217.26 : 56.35
IDEA : 7876.8 : 120.47 : 35.77
HUFFMAN : 3433.1 : 95.20 : 30.40
NEURAL NET : 72.091 : 115.81 : 48.71
LU DECOMPOSITION : 2664.4 : 138.03 : 99.67
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 112.739
FLOATING-POINT INDEX: 83.966
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Core(TM)2 Duo CPU E6750 @
2.66GHz 3200MHz
L2 Cache : 4096 KB
OS : Linux 2.6.26.5
C compiler : gcc-4.2.4
libc : libc-2.8.so
MEMORY INDEX : 31.032
INTEGER INDEX : 26.138
FLOATING-POINT INDEX: 46.571
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to fail|4.0.0 4.0.4 4.1.0 |4.0.0 4.0.4 4.1.0 4.2.4
| |4.3.2 4.4.0
Summary|[4.2 Regression] codegen |[4.2/4.3/4.4 Regression]
|regression due to PRE |codegen regression due to
|increasing register pressure|PRE increasing register
|(missing load PRE really) |pressure (missing load PRE
| |really)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really)
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (18 preceding siblings ...)
2008-10-03 23:19 ` [Bug tree-optimization/21485] [4.2/4.3/4.4 " rguenth at gcc dot gnu dot org
@ 2008-10-03 23:20 ` rguenth at gcc dot gnu dot org
2008-10-03 23:56 ` [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE rguenth at gcc dot gnu dot org
` (13 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-10-03 23:20 UTC (permalink / raw)
To: gcc-bugs
------- Comment #29 from rguenth at gcc dot gnu dot org 2008-10-03 23:18 -------
*** Bug 37732 has been marked as a duplicate of this bug. ***
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (19 preceding siblings ...)
2008-10-03 23:20 ` rguenth at gcc dot gnu dot org
@ 2008-10-03 23:56 ` rguenth at gcc dot gnu dot org
2008-10-04 0:14 ` pinskia at gcc dot gnu dot org
` (12 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-10-03 23:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #30 from rguenth at gcc dot gnu dot org 2008-10-03 23:54 -------
Comment #6 still applies. On the trunk we do not fully exploit the partial
redundant load of array[k] in
if (k < j)
if (array[k] < array[k + 1L])
++k;
if (array[i] < array[k])
but we transform the above to
if (k < j)
{
tmp1 = array[k];
tmp2 = array[k+1];
if (tmp1 < tmp2)
++k;
else
tmp1 = array[k+1];
}
if (array[i] < tmp1)
missing the full redundancy of array[k+1].
Reduced testcase:
long
NumSift (long *array, unsigned long k)
{
if (array[k] < array[k + 1L])
++k;
return array[k];
}
with integer k it gets somewhat more complicated even.
The question is whether this explains the slowdown compared to GCC 3.4.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2006-07-05 09:23:46 |2008-10-03 23:54:40
date| |
Summary|[4.2/4.3/4.4 Regression] |[4.2/4.3/4.4 Regression]
|codegen regression due to |missed load PRE
|PRE increasing register |
|pressure (missing load PRE |
|really) |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (20 preceding siblings ...)
2008-10-03 23:56 ` [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE rguenth at gcc dot gnu dot org
@ 2008-10-04 0:14 ` pinskia at gcc dot gnu dot org
2008-10-04 0:59 ` [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE, PRE makes i?86 suck rguenth at gcc dot gnu dot org
` (11 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-10-04 0:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #31 from pinskia at gcc dot gnu dot org 2008-10-04 00:13 -------
the reduced testcase in comment #30 is optimized by DOM3 though not by PRE.
Running PRE again right after the first PRE still founds more PREable
expressions for this testcase ...
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE, PRE makes i?86 suck
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (21 preceding siblings ...)
2008-10-04 0:14 ` pinskia at gcc dot gnu dot org
@ 2008-10-04 0:59 ` rguenth at gcc dot gnu dot org
2008-10-04 9:23 ` wbrana at gmail dot com
` (10 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-10-04 0:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #32 from rguenth at gcc dot gnu dot org 2008-10-04 00:57 -------
As we PHI-translate k_1 * 4 we are not able to find D.1237_7 * 4 in the
SCCVN tables. So we allocate a new value-id for it. Oops. This is because
once we say its type is unsigned int and once it's unsigned long. And
we require types to be pointer-equal in the hash comparison fn.
I have a patch to fix this which makes this run even more slow. -fno-tree-pre
fixes the speed regression.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.2/4.3/4.4 Regression] |[4.2/4.3/4.4 Regression]
|missed load PRE |missed load PRE, PRE makes
| |i?86 suck
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE, PRE makes i?86 suck
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (22 preceding siblings ...)
2008-10-04 0:59 ` [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE, PRE makes i?86 suck rguenth at gcc dot gnu dot org
@ 2008-10-04 9:23 ` wbrana at gmail dot com
2008-10-04 15:13 ` rguenth at gcc dot gnu dot org
` (9 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: wbrana at gmail dot com @ 2008-10-04 9:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #33 from wbrana at gmail dot com 2008-10-04 09:22 -------
results with -fno-tree-pre
1749 - 4.4.0 20080926 (experimental)
1701 - 4.3.2
2476 - 4.2.4
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE, PRE makes i?86 suck
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (23 preceding siblings ...)
2008-10-04 9:23 ` wbrana at gmail dot com
@ 2008-10-04 15:13 ` rguenth at gcc dot gnu dot org
2008-10-04 16:00 ` rguenth at gcc dot gnu dot org
` (8 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-10-04 15:13 UTC (permalink / raw)
To: gcc-bugs
------- Comment #34 from rguenth at gcc dot gnu dot org 2008-10-04 15:11 -------
Fastest result on a Intel Core Duo with
gcc-4.1 -O3 -fomit-frame-pointer -fno-tree-pre -fno-inline -fschedule-insns:
1273
the interesting thing is that with the above we if-convert
if (array[k] < array[k + 1L])
++k;
using setl which reduces the burden of the branch predictor which in the worst
case (trunk) has quite a number of mispredicts. The following is branches
retired vs. mispredicted branches retired for trunk (with PRE enabled)
* CPU: Core Solo / Duo, speed 1833 MHz (estimated)
* Counted BR_INST_RETIRED events (number of branch instructions retired) with
a
unit mask of 0x00 (No unit mask) count 10000
* Counted BR_MISS_PRED_RETIRED events (number of mispredicted branches
retired)
with a unit mask of 0x00 (No unit mask) count 10000
080486d0 <NumSift>: /* NumSift total: 188708 95.2681 21424 99.9953 */
752 0.3796 0 0 : 80486d0: push %ebp
: 80486d1: push %edi
: 80486d2: push %esi
824 0.4160 0 0 : 80486d3: push %ebx
5 0.0025 0 0 : 80486d4: sub $0xc,%esp
: 80486d7: mov %ecx,(%esp)
1541 0.7780 0 0 : 80486da: add $0x1,%ecx
: 80486dd: mov %ecx,0x8(%esp)
: 80486e1: lea 0x0(%esi),%esi
709 0.3579 2 0.0093 : 80486e8: lea (%edx,%edx,1),%ecx
1706 0.8613 1 0.0047 : 80486eb: cmp (%esp),%ecx
3083 1.5564 924 4.3127 : 80486ee: mov %ecx,%edi
92 0.0464 0 0 : 80486f0: lea (%eax,%edx,8),%ebp
: 80486f3: mov %ebp,%ebx
868 0.4382 13 0.0607 : 80486f5: ja 804871d <NumSift+0x4d>
5732 2.8938 0 0 : 80486f7: jb 8048728 <NumSift+0x58>
2 0.0010 0 0 : 80486f9: mov (%ebx),%esi
7789 3.9322 162 0.7561 : 80486fb: lea (%eax,%edx,4),%ecx
34575 17.4550 6534 30.4971 : 80486fe: mov 0x8(%esp),%edx
3070 1.5499 2103 9.8156 : 8048702: mov (%ecx),%ebp
8244 4.1619 134 0.6254 : 8048704: cmp %esi,%ebp
2322 1.1722 155 0.7235 : 8048706: jge 80486e8 <NumSift+0x18>
1363 0.6881 236 1.1015 : 8048708: mov %edi,%edx
3578 1.8063 0 0 : 804870a: mov %ebp,(%ebx)
450 0.2272 367 1.7130 : 804870c: lea (%eax,%edx,8),%ebp
3797 1.9169 0 0 : 804870f: mov %esi,(%ecx)
5035 2.5419 22 0.1027 : 8048711: lea (%edx,%edx,1),%ecx
: 8048714: mov %ebp,%ebx
389 0.1964 0 0 : 8048716: cmp (%esp),%ecx
5885 2.9710 15 0.0700 : 8048719: mov %ecx,%edi
7 0.0035 0 0 : 804871b: jbe 80486f7 <NumSift+0x27>
416 0.2100 24 0.1120 : 804871d: add $0xc,%esp
5419 2.7357 1431 6.6791 : 8048720: pop %ebx
568 0.2868 275 1.2835 : 8048721: pop %esi
710 0.3584 24 0.1120 : 8048722: pop %edi
334 0.1686 12 0.0560 : 8048723: pop %ebp
146 0.0737 91 0.4247 : 8048724: ret
: 8048725: lea 0x0(%esi),%esi
8706 4.3952 0 0 : 8048728: mov 0x0(%ebp),%ebx
1536 0.7754 379 1.7690 : 804872b: lea 0x1(%ecx),%edi
: 804872e: mov %ebx,0x4(%esp)
14484 7.3122 9 0.0420 : 8048732: lea (%eax,%edi,4),%ebx
: 8048735: mov (%ebx),%esi
2165 1.0930 6 0.0280 : 8048737: cmp %esi,0x4(%esp)
19814 10.0030 1 0.0047 : 804873b: jl 80486fb <NumSift+0x2b>
2585 1.3050 0 0 : 804873d: mov 0x4(%esp),%esi
37728 19.0468 8504 39.6919 : 8048741: mov %ebp,%ebx
1511 0.7628 0 0 : 8048743: mov %ecx,%edi
768 0.3877 0 0 : 8048745: jmp 80486fb <NumSift+0x2b>
: 8048747: mov %esi,%esi
: 8048749: lea 0x0(%edi),%edi
while the following is what we get for the gcc 4.1 code w/o PRE:
08048670 <NumSift>: /* NumSift total: 200781 92.9938 4738 99.9156 */
1196 0.5539 0 0 : 8048670: push %ebp
1 4.6e-04 0 0 : 8048671: push %edi
2 9.3e-04 0 0 : 8048672: mov %eax,%edi
2084 0.9652 0 0 : 8048674: push %esi
9 0.0042 0 0 : 8048675: push %ebx
1 4.6e-04 0 0 : 8048676: mov %edx,%ebx
1162 0.5382 0 0 : 8048678: sub $0x4,%esp
6 0.0028 0 0 : 804867b: mov %ecx,(%esp)
3128 1.4488 0 0 : 804867e: xchg %ax,%ax
1078 0.4993 2 0.0422 : 8048680: lea (%ebx,%ebx,1),%edx
577 0.2672 0 0 : 8048683: cmp (%esp),%edx
202 0.0936 6 0.1265 : 8048686: lea (%edi,%ebx,4),%ebp
152 0.0704 1 0.0211 : 8048689: ja 80486ba <NumSift+0x4a>
44618 20.6653 0 0 : 804868b: jae 804869c <NumSift+0x2c>
2125 0.9842 62 1.3075 : 804868d: mov (%edi,%ebx,8),%eax
2932 1.3580 322 6.7904 : 8048690: cmp 0x4(%edi,%ebx,8),%eax
23392 10.8342 151 3.1843 : 8048694: setl %al
8331 3.8586 5 0.1054 : 8048697: movzbl %al,%eax
11420 5.2893 0 0 : 804869a: add %eax,%edx
15985 7.4036 1 0.0211 : 804869c: lea (%edi,%edx,4),%esi
5171 2.3950 6 0.1265 : 804869f: mov 0x0(%ebp),%ecx
109 0.0505 0 0 : 80486a2: mov %edx,%ebx
1129 0.5229 0 0 : 80486a4: mov (%esi),%eax
16905 7.8297 0 0 : 80486a6: cmp %eax,%ecx
2442 1.1310 0 0 : 80486a8: jge 80486c2 <NumSift+0x52>
18994 8.7973 1 0.0211 : 80486aa: lea (%ebx,%ebx,1),%edx
1134 0.5252 507 10.6917 : 80486ad: cmp (%esp),%edx
136 0.0630 4 0.0844 : 80486b0: mov %ecx,(%esi)
19141 8.8654 0 0 : 80486b2: mov %eax,0x0(%ebp)
1 4.6e-04 0 0 : 80486b5: lea (%edi,%ebx,4),%ebp
36 0.0167 0 0 : 80486b8: jbe 804868b <NumSift+0x1b>
3202 1.4830 0 0 : 80486ba: add $0x4,%esp
4369 2.0235 618 13.0325 : 80486bd: pop %ebx
1842 0.8531 680 14.3399 : 80486be: pop %esi
2309 1.0694 878 18.5154 : 80486bf: pop %edi
54 0.0250 1 0.0211 : 80486c0: pop %ebp
500 0.2316 5 0.1054 : 80486c1: ret
498 0.2307 0 0 : 80486c2: mov (%esp),%ebx
4407 2.0411 1487 31.3581 : 80486c5: add $0x1,%ebx
1 4.6e-04 1 0.0211 : 80486c8: jmp 8048680 <NumSift+0x10>
: 80486ca: lea 0x0(%esi),%esi
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE, PRE makes i?86 suck
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (24 preceding siblings ...)
2008-10-04 15:13 ` rguenth at gcc dot gnu dot org
@ 2008-10-04 16:00 ` rguenth at gcc dot gnu dot org
2008-10-04 16:17 ` rguenth at gcc dot gnu dot org
` (7 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-10-04 16:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #35 from rguenth at gcc dot gnu dot org 2008-10-04 15:58 -------
Another missed optimization on the tree level is hoisting of the load of
array[k*4] before the k < j condition which is possible after the PRE
insertion:
<bb 3>:
if (k_4 < j_5(D))
goto <bb 4>;
else
goto <bb 12>;
<bb 12>:
pretmp.12_31 = k_4 * 4;
pretmp.13_30 = array_8(D) + pretmp.12_31;
pretmp.14_25 = *pretmp.13_30;
goto <bb 6>;
<bb 4>:
D.1242_7 = k_4 * 4;
D.1243_9 = array_8(D) + D.1242_7;
D.1244_10 = *D.1243_9;
maybe PRE could somehow even insert on BB3 entry instead of BB3 exit edges in
this case. This would re-enable if-conversion of the second branch.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE, PRE makes i?86 suck
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (25 preceding siblings ...)
2008-10-04 16:00 ` rguenth at gcc dot gnu dot org
@ 2008-10-04 16:17 ` rguenth at gcc dot gnu dot org
2008-11-22 9:14 ` steven at gcc dot gnu dot org
` (6 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-10-04 16:17 UTC (permalink / raw)
To: gcc-bugs
------- Comment #36 from rguenth at gcc dot gnu dot org 2008-10-04 16:16 -------
Testcase for that:
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-pre" } */
long
NumSift (long *array, int b, unsigned long k)
{
if (b)
if (array[k] < array[k + 1L])
++k;
return array[k];
}
/* There should be two loads left. */
/* { dg-final { scan-tree-dump-times "= \\\*" 2 "pre" } } */
/* { dg-final { cleanup-tree-dump "pre" } } */
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE, PRE makes i?86 suck
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (26 preceding siblings ...)
2008-10-04 16:17 ` rguenth at gcc dot gnu dot org
@ 2008-11-22 9:14 ` steven at gcc dot gnu dot org
2009-02-16 9:01 ` bonzini at gnu dot org
` (5 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-11-22 9:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #37 from steven at gcc dot gnu dot org 2008-11-22 09:13 -------
Re: comment #35 and comment #36
That is code hoisting, again. See Bug 23286 and some the bugs closed as a
duplicate of Bug 23286. Looks like it's time to implement tree-level hoisting
:-)
--
steven at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn| |23286
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.2/4.3/4.4 Regression] missed load PRE, PRE makes i?86 suck
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (27 preceding siblings ...)
2008-11-22 9:14 ` steven at gcc dot gnu dot org
@ 2009-02-16 9:01 ` bonzini at gnu dot org
2009-03-31 18:46 ` [Bug tree-optimization/21485] [4.3/4.4/4.5 " jsm28 at gcc dot gnu dot org
` (4 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: bonzini at gnu dot org @ 2009-02-16 9:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #38 from bonzini at gnu dot org 2009-02-16 09:01 -------
Outlandish statement: maybe no-condexec if conversion should be moved to the
tree-level?!? Doing this kind of hoisting at the same time as if conversion is
simpler on GIMPLE...
--
bonzini at gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bonzini at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.3/4.4/4.5 Regression] missed load PRE, PRE makes i?86 suck
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (28 preceding siblings ...)
2009-02-16 9:01 ` bonzini at gnu dot org
@ 2009-03-31 18:46 ` jsm28 at gcc dot gnu dot org
2009-04-21 12:31 ` rob1weld at aol dot com
` (3 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2009-03-31 18:46 UTC (permalink / raw)
To: gcc-bugs
------- Comment #39 from jsm28 at gcc dot gnu dot org 2009-03-31 18:46 -------
Closing 4.2 branch.
--
jsm28 at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.2/4.3/4.4/4.5 Regression]|[4.3/4.4/4.5 Regression]
|missed load PRE, PRE makes |missed load PRE, PRE makes
|i?86 suck |i?86 suck
Target Milestone|4.2.5 |4.3.4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.3/4.4/4.5 Regression] missed load PRE, PRE makes i?86 suck
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (29 preceding siblings ...)
2009-03-31 18:46 ` [Bug tree-optimization/21485] [4.3/4.4/4.5 " jsm28 at gcc dot gnu dot org
@ 2009-04-21 12:31 ` rob1weld at aol dot com
2009-04-21 13:24 ` dberlin at dberlin dot org
` (2 subsequent siblings)
33 siblings, 0 replies; 35+ messages in thread
From: rob1weld at aol dot com @ 2009-04-21 12:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #40 from rob1weld at aol dot com 2009-04-21 12:30 -------
(In reply to comment #0)
> I've found a major performance regression in gcc 4.0.0's optimization ...
(In reply to comment #11)
> We need more analysis on these kinds of issues.
> So, we're doing a worse job on register allocation. Is that because
> the register allocator got worse, or because we're giving it a harder
> problem to solve?
(In reply to comment #12)
> It is the latter and the pass which is responsible ...
(In reply to comment #13)
> Tough, there is really not much that can be done about this ...
Is this of any use ?
Register Allocation by Puzzle Solving
http://compilers.cs.ucla.edu/fernando/projects/puzzles/
"This project consists in developing a new model for register allocation that
has fast compilation time, produces code of good quality, is able to address
the many irregularities found in common computer architectures and is intuitive
enough to be easily understood."
"Our implementation produces Pentium code that is of similar quality to the
code produced by the slower, state-of-the-art iterated register coalescing
algorithm of George and Appel augmented with extensions by Smith, Ramsey, and
Holloway."
YT,
Rob
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.3/4.4/4.5 Regression] missed load PRE, PRE makes i?86 suck
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (30 preceding siblings ...)
2009-04-21 12:31 ` rob1weld at aol dot com
@ 2009-04-21 13:24 ` dberlin at dberlin dot org
2009-08-04 12:29 ` rguenth at gcc dot gnu dot org
2010-05-22 19:01 ` [Bug tree-optimization/21485] [4.3/4.4/4.5/4.6 " rguenth at gcc dot gnu dot org
33 siblings, 0 replies; 35+ messages in thread
From: dberlin at dberlin dot org @ 2009-04-21 13:24 UTC (permalink / raw)
To: gcc-bugs
------- Comment #41 from dberlin at gcc dot gnu dot org 2009-04-21 13:24 -------
Subject: Re: [4.3/4.4/4.5 Regression] missed
load PRE, PRE makes i?86 suck
Fernando was an intern of mine, and while his algorithm is a great
algorithm, AFAIK he hasn't gotten better code out of it than we get
with IRA right now.
I'll check with him again and see if this has changed.
On Tue, Apr 21, 2009 at 8:31 AM, rob1weld at aol dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> Is this of any use ?
>
> Register Allocation by Puzzle Solving
> http://compilers.cs.ucla.edu/fernando/projects/puzzles/
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.3/4.4/4.5 Regression] missed load PRE, PRE makes i?86 suck
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (31 preceding siblings ...)
2009-04-21 13:24 ` dberlin at dberlin dot org
@ 2009-08-04 12:29 ` rguenth at gcc dot gnu dot org
2010-05-22 19:01 ` [Bug tree-optimization/21485] [4.3/4.4/4.5/4.6 " rguenth at gcc dot gnu dot org
33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-08-04 12:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #42 from rguenth at gcc dot gnu dot org 2009-08-04 12:26 -------
GCC 4.3.4 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.4 |4.3.5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug tree-optimization/21485] [4.3/4.4/4.5/4.6 Regression] missed load PRE, PRE makes i?86 suck
[not found] <bug-21485-10607@http.gcc.gnu.org/bugzilla/>
` (32 preceding siblings ...)
2009-08-04 12:29 ` rguenth at gcc dot gnu dot org
@ 2010-05-22 19:01 ` rguenth at gcc dot gnu dot org
33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-05-22 19:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #43 from rguenth at gcc dot gnu dot org 2010-05-22 18:10 -------
GCC 4.3.5 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.5 |4.3.6
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
^ permalink raw reply [flat|nested] 35+ messages in thread