public inbox for gcc-bugs@sourceware.org
* [Bug target/17647] New: [4.0 regression] Missing i386 addressing modes
@ 2004-09-24 7:03 uros at kss-loka dot si
2004-09-24 13:19 ` [Bug tree-optimization/17647] " pinskia at gcc dot gnu dot org
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: uros at kss-loka dot si @ 2004-09-24 7:03 UTC (permalink / raw)
To: gcc-bugs
This testcase (from scimark2.0, LU.c) shows missing i386 addressing modes:
void LU_copy_matrix(int M, int N, double **lu, double **A)
{
    int i;
    int j;
    for (i=0; i<M; i++)
        for (j=0; j<N; j++)
            lu[i][j] = A[i][j];
}
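To build and run the testcase outside scimark2, a small harness is needed. The following is only a sketch under stated assumptions: new_matrix and free_matrix are hypothetical helpers that are not part of LU.c, and the row-pointer layout is inferred from the double ** parameters.

```c
#include <stdlib.h>

/* The testcase from the report, repeated so the block compiles on its own. */
void LU_copy_matrix(int M, int N, double **lu, double **A)
{
    int i;
    int j;
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            lu[i][j] = A[i][j];
}

/* Hypothetical helper: release a matrix built by new_matrix. */
void free_matrix(double **rows, int M)
{
    if (!rows)
        return;
    for (int i = 0; i < M; i++)
        free(rows[i]);
    free(rows);
}

/* Hypothetical helper: allocate an M x N matrix as an array of row
   pointers, zero-filled. Returns NULL if any allocation fails. */
double **new_matrix(int M, int N)
{
    double **rows = calloc((size_t)M, sizeof *rows);
    if (!rows)
        return NULL;
    for (int i = 0; i < M; i++) {
        rows[i] = calloc((size_t)N, sizeof **rows);
        if (!rows[i]) {
            free_matrix(rows, i);   /* frees only the rows allocated so far */
            return NULL;
        }
    }
    return rows;
}
```

Calling LU_copy_matrix(M, N, dst, src) on two matrices from new_matrix exercises both loops; the outer loop indexes the row-pointer arrays (scale 4 on i386) and the inner loop indexes the doubles (scale 8).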
With gcc -O2 -ffast-math -fomit-frame-pointer -march=pentium4 LU.c, the
following asm code is produced:
GCC: (GNU) 3.2 20020903 (Red Hat Linux 8.0 3.2-7):
LU_copy_matrix:
        pushl %ebp
        pushl %edi
        pushl %esi
        pushl %ebx
        xorl %ebx, %ebx
        movl 20(%esp), %edi
        movl 24(%esp), %esi
        movl 32(%esp), %ebp
        cmpl %edi, %ebx
        jge .L13
.L11:
        xorl %eax, %eax
        cmpl %esi, %eax
        jge .L15
        movl 28(%esp), %edx
        movl (%edx,%ebx,4), %ecx
        movl (%ebp,%ebx,4), %edx
.L10:
        fldl (%edx,%eax,8)
        fstpl (%ecx,%eax,8)
        addl $1, %eax
        cmpl %esi, %eax
        jl .L10
.L15:
        addl $1, %ebx
        cmpl %edi, %ebx
        jl .L11
.L13:
        popl %ebx
        popl %esi
        popl %edi
        popl %ebp
        ret
gcc 4.0 regressed in this area and produces:
LU_copy_matrix:
        pushl %ebp
        pushl %edi
        pushl %esi
        pushl %ebx
        movl 24(%esp), %ebp
        movl 20(%esp), %eax
        testl %eax, %eax
        jle .L8
        movl 32(%esp), %esi
        movl 28(%esp), %ebx
        xorl %edi, %edi
.L4:
        testl %ebp, %ebp
        jle .L6
        movl (%esi), %ecx
        movl (%ebx), %edx
        xorl %eax, %eax
.L5:
        fldl (%ecx)
        fstpl (%edx)
        addl $1, %eax
        addl $8, %ecx
        addl $8, %edx
        cmpl %eax, %ebp
        jg .L5
.L6:
        addl $1, %edi
        addl $4, %esi
        addl $4, %ebx
        cmpl %edi, 20(%esp)
        jg .L4
.L8:
        popl %ebx
        popl %esi
        popl %edi
        popl %ebp
        ret
The problem is in the .L5 loop. gcc-4.0 uses simple register-indirect
addressing with a separate increment for each pointer, whereas gcc-3.2 uses
scaled-index (SIB) addressing, as shown in the corresponding .L10 loop. The
same problem is also present with integer instructions.
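As an illustration of the two addressing schemes, the same row copy can be written in C in two ways. This is only a sketch: the function names are made up for this note, and a compiler is free to translate either source form into either instruction sequence.

```c
/* Indexed form: a single counter j; each address is base + j*8, which
   maps onto scaled-index operands such as (%edx,%eax,8). */
void copy_row_indexed(int n, double *dst, const double *src)
{
    for (int j = 0; j < n; j++)
        dst[j] = src[j];
}

/* Pointer-bumping form: each pointer is its own induction variable, so
   the loop body updates the counter plus two pointers, matching the
   three addl instructions in the gcc-4.0 .L5 loop. */
void copy_row_bumped(int n, double *dst, const double *src)
{
    for (int j = 0; j < n; j++) {
        *dst = *src;
        dst++;
        src++;
    }
}
```

Both functions copy the same n elements; they differ only in how many values must be updated per iteration and therefore in how many registers the loop keeps live.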
IMHO this could be the cause of the unbelievably bad benchmark results in
scimark-2.0 (http://www.coyotegulch.com/reviews/linux_compilers/)
Uros.
--
Summary: [4.0 regression] Missing i386 addressing modes
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-linux-gnu
GCC host triplet: i686-linux-gnu
GCC target triplet: i686-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17647
* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
2004-09-24 7:03 [Bug target/17647] New: [4.0 regression] Missing i386 addressing modes uros at kss-loka dot si
@ 2004-09-24 13:19 ` pinskia at gcc dot gnu dot org
2004-09-27 8:52 ` bonzini at gcc dot gnu dot org
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-09-24 13:19 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-09-24 13:19 -------
I think the simple one is faster, but I could be wrong (the reason I say that
is that the multiply by 8 is not needed on every iteration).
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization
           Keywords|                            |missed-optimization
            Summary|[4.0 regression] Missing    |[4.0 regression] Missing
                   |i386 addressing modes       |i386 addressing modes
   Target Milestone|---                         |4.0.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17647
* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
2004-09-24 7:03 [Bug target/17647] New: [4.0 regression] Missing i386 addressing modes uros at kss-loka dot si
2004-09-24 13:19 ` [Bug tree-optimization/17647] " pinskia at gcc dot gnu dot org
@ 2004-09-27 8:52 ` bonzini at gcc dot gnu dot org
2004-11-04 9:33 ` uros at kss-loka dot si
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: bonzini at gcc dot gnu dot org @ 2004-09-27 8:52 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From bonzini at gcc dot gnu dot org 2004-09-27 08:52 -------
Confirmed, this is caused by ivopts. In principle 4.0.0's optimization should
be a good thing, but once more register pressure makes the code worse because N
is not kept in a register.
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2004-09-27 08:52:03
               date|                            |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17647
* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
2004-09-24 7:03 [Bug target/17647] New: [4.0 regression] Missing i386 addressing modes uros at kss-loka dot si
2004-09-24 13:19 ` [Bug tree-optimization/17647] " pinskia at gcc dot gnu dot org
2004-09-27 8:52 ` bonzini at gcc dot gnu dot org
@ 2004-11-04 9:33 ` uros at kss-loka dot si
2004-11-05 8:00 ` uros at kss-loka dot si
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: uros at kss-loka dot si @ 2004-11-04 9:33 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2004-11-04 09:33 -------
The asm code produced with CVS gcc dated 4 Nov 2004 looks much better, but it
is still not as good as 3.2:
LU_copy_matrix:
        pushl %ebp
        pushl %edi
        pushl %esi
        pushl %ebx
        movl 24(%esp), %ebp
        movl 20(%esp), %eax
        testl %eax, %eax
        jle .L8
        movl 32(%esp), %esi
        xorl %edi, %edi
.L4:
        testl %ebp, %ebp
        jle .L6
        movl 28(%esp), %eax
        movl (%eax,%edi,4), %ebx
        movl (%esi), %ecx               <= (*1)
        xorl %edx, %edx
.L5:
        leal 0(,%edx,8), %eax           |<= (*2)
        fldl (%ecx,%eax)                |
        fstpl (%ebx,%eax)               |
        addl $1, %edx
        cmpl %edx, %ebp
        jg .L5
.L6:
        addl $1, %edi
        addl $4, %esi                   <= (*1)
        cmpl %edi, 20(%esp)
        jg .L4
.L8:
        popl %ebx
        popl %esi
        popl %edi
        popl %ebp
        ret
(*1): "movl (%esi,%edi,4), %ecx" could be used here. The second addl in .L4
could be eliminated in this case.
(*2): Why not use:
        fldl (%ecx,%edx,8)
        fstpl (%ebx,%edx,8)
directly? The lea instruction would be eliminated, together with the use of
the %eax register.
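The equivalence behind (*2) can be checked with a little arithmetic. The sketch below models an i386 memory operand disp(base,index,scale) as base + index*scale + disp in 32-bit arithmetic; the function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Effective address of an operand disp(base,index,scale), modulo 2^32. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, uint32_t disp)
{
    return (uint32_t)(base + index * scale + disp);
}

/* Address as formed in the .L5 loop above:
   leal 0(,%edx,8), %eax  followed by an access to (%ecx,%eax). */
uint32_t address_via_lea(uint32_t ecx, uint32_t edx)
{
    uint32_t eax = effective_address(0, edx, 8, 0);
    return effective_address(ecx, eax, 1, 0);
}

/* Address as formed by the suggested operand (%ecx,%edx,8). */
uint32_t address_direct(uint32_t ecx, uint32_t edx)
{
    return effective_address(ecx, edx, 8, 0);
}
```

For every pair of register values the two functions return the same address, so the separate lea only costs an instruction and a register.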
Uros.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17647
* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
2004-09-24 7:03 [Bug target/17647] New: [4.0 regression] Missing i386 addressing modes uros at kss-loka dot si
` (2 preceding siblings ...)
2004-11-04 9:33 ` uros at kss-loka dot si
@ 2004-11-05 8:00 ` uros at kss-loka dot si
2004-11-13 20:15 ` [Bug rtl-optimization/17647] " pinskia at gcc dot gnu dot org
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: uros at kss-loka dot si @ 2004-11-05 8:00 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2004-11-05 08:00 -------
Another comment on code in comment #3:
An LEA with a scaled index and no base register can only be encoded with a
32-bit displacement, so the inner loop is considerably bigger in gcc-4.0
compiled code. (Note that an LEA with a scale factor should be replaced by a
shift in the P4 case...)
Another improvement would be to use %ecx as the count register in inner loops.
TARGET_USE_LOOP architectures (such as K6) could then use a loop insn for
inner loops.
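The i386 loop instruction decrements %ecx and branches while the result is non-zero, so it fits a loop whose counter runs down to zero. A minimal C sketch of that loop shape follows; the function name is hypothetical, and nothing here guarantees that a given GCC version will actually emit loop for it.

```c
/* Count-down form of the inner copy loop. The counter is only compared
   against zero, which is the shape a decrement-and-branch instruction
   such as loop expects. */
void copy_row_countdown(int n, double *dst, const double *src)
{
    if (n <= 0)
        return;                 /* the do-while body must run at least once */
    unsigned int count = (unsigned int)n;
    do {
        *dst++ = *src++;
    } while (--count != 0);
}
```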
gcc-4.0 (21 bytes)
23: 8d 04 d5 00 00 00 00 lea 0x0(,%edx,8),%eax
2a: dd 04 01 fldl (%ecx,%eax,1)
2d: dd 1c 03 fstpl (%ebx,%eax,1)
30: 83 c2 01 add $0x1,%edx
33: 39 55 0c cmp %edx,0xc(%ebp)
36: 7f eb jg 23 <LU_copy_matrix+0x23>
gcc-3.2 (13 bytes):
22: dd 04 c2 fldl (%edx,%eax,8)
25: dd 1c c1 fstpl (%ecx,%eax,8)
28: 83 c0 01 add $0x1,%eax
2b: 39 f0 cmp %esi,%eax
2d: 7c f3 jl 22 <LU_copy_matrix+0x22>
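The byte counts can be re-derived from the two dumps above. The sketch below simply stores the encoded bytes as listed and lets sizeof count them; the array and function names are made up for this note.

```c
#include <stddef.h>

/* Inner loop bytes as listed for gcc-4.0. */
static const unsigned char gcc40_loop[] = {
    0x8d, 0x04, 0xd5, 0x00, 0x00, 0x00, 0x00,  /* lea   0x0(,%edx,8),%eax */
    0xdd, 0x04, 0x01,                          /* fldl  (%ecx,%eax,1)     */
    0xdd, 0x1c, 0x03,                          /* fstpl (%ebx,%eax,1)     */
    0x83, 0xc2, 0x01,                          /* add   $0x1,%edx         */
    0x39, 0x55, 0x0c,                          /* cmp   %edx,0xc(%ebp)    */
    0x7f, 0xeb                                 /* jg    23                */
};

/* Inner loop bytes as listed for gcc-3.2. */
static const unsigned char gcc32_loop[] = {
    0xdd, 0x04, 0xc2,                          /* fldl  (%edx,%eax,8)     */
    0xdd, 0x1c, 0xc1,                          /* fstpl (%ecx,%eax,8)     */
    0x83, 0xc0, 0x01,                          /* add   $0x1,%eax         */
    0x39, 0xf0,                                /* cmp   %esi,%eax         */
    0x7c, 0xf3                                 /* jl    22                */
};

size_t gcc40_loop_size(void) { return sizeof gcc40_loop; }
size_t gcc32_loop_size(void) { return sizeof gcc32_loop; }
```

The 8-byte difference consists of the 7-byte lea (4 of those bytes are the zero displacement) plus one extra byte for the cmp against a memory operand instead of a register.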
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17647
* [Bug rtl-optimization/17647] [4.0 regression] Missing i386 addressing modes
2004-09-24 7:03 [Bug target/17647] New: [4.0 regression] Missing i386 addressing modes uros at kss-loka dot si
` (3 preceding siblings ...)
2004-11-05 8:00 ` uros at kss-loka dot si
@ 2004-11-13 20:15 ` pinskia at gcc dot gnu dot org
2004-11-25 23:37 ` [Bug tree-optimization/17647] " pinskia at gcc dot gnu dot org
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-13 20:15 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-13 20:14 -------
Actually, I was wrong about IV-OPTS being the problem. This is the same problem
as PR 18463: CSE does not recombine to form the addressing mode at all.
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |18463
          Component|tree-optimization           |rtl-optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17647
* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
2004-09-24 7:03 [Bug target/17647] New: [4.0 regression] Missing i386 addressing modes uros at kss-loka dot si
` (4 preceding siblings ...)
2004-11-13 20:15 ` [Bug rtl-optimization/17647] " pinskia at gcc dot gnu dot org
@ 2004-11-25 23:37 ` pinskia at gcc dot gnu dot org
2004-11-26 8:50 ` uros at gcc dot gnu dot org
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-25 23:37 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-25 23:37 -------
This is fully an IV-OPTS problem now.
With -fno-ivopts, I get
.L5:
        fldl (%edx,%eax,8)
        fstpl (%ecx,%eax,8)
        incl %eax
        cmpl %eax, %esi
        jg .L5
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |tree-optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17647
* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
2004-09-24 7:03 [Bug target/17647] New: [4.0 regression] Missing i386 addressing modes uros at kss-loka dot si
` (5 preceding siblings ...)
2004-11-25 23:37 ` [Bug tree-optimization/17647] " pinskia at gcc dot gnu dot org
@ 2004-11-26 8:50 ` uros at gcc dot gnu dot org
2004-11-26 15:40 ` pinskia at gcc dot gnu dot org
2004-12-05 4:29 ` pinskia at gcc dot gnu dot org
8 siblings, 0 replies; 10+ messages in thread
From: uros at gcc dot gnu dot org @ 2004-11-26 8:50 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at gcc dot gnu dot org 2004-11-26 08:50 -------
(In reply to comment #6)
> This is fully a IV-OPTS problem now.
> With -fno-ivopts, I get
I can't get the same code as you. With mainline gcc (gcc version 4.0.0 20041126
(experimental)) and 'gcc -O2 -S -fno-ivopts LU.c' I got:
        xorl %edx, %edx
        .p2align 4,,15
.L5:
        leal 0(,%edx,8), %eax
        incl %edx
        fldl (%eax,%ecx)
        cmpl %edx, %edi
        fstpl (%ebx,%eax)
        jg .L5
leal is redundant in this case.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17647
* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
2004-09-24 7:03 [Bug target/17647] New: [4.0 regression] Missing i386 addressing modes uros at kss-loka dot si
` (6 preceding siblings ...)
2004-11-26 8:50 ` uros at gcc dot gnu dot org
@ 2004-11-26 15:40 ` pinskia at gcc dot gnu dot org
2004-12-05 4:29 ` pinskia at gcc dot gnu dot org
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-26 15:40 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-26 15:40 -------
(In reply to comment #7)
> (In reply to comment #6)
> > This is fully a IV-OPTS problem now.
> > With -fno-ivopts, I get
>
> I can't get the same code as you. With mainline gcc (gcc version 4.0.0 20041126
> (experimental)) and 'gcc -O2 -S -fno-ivopts LU.c' I got:
Now at -O1, I do get the leal:
        leal 0(,%ebx,8), %edx
        movl (%edi), %ecx
        movl (%esi), %eax
        fldl (%edx,%eax)
        fstpl (%ecx,%edx)
Likewise at -Os (which seems wrong):
        movl -16(%ebp), %eax
        leal 0(,%ebx,8), %edx
        incl %ebx
        movl (%eax), %ecx
        movl (%edi), %eax
        fldl (%eax,%edx)
        fstpl (%ecx,%edx)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17647
* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
2004-09-24 7:03 [Bug target/17647] New: [4.0 regression] Missing i386 addressing modes uros at kss-loka dot si
` (7 preceding siblings ...)
2004-11-26 15:40 ` pinskia at gcc dot gnu dot org
@ 2004-12-05 4:29 ` pinskia at gcc dot gnu dot org
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-12-05 4:29 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-12-05 04:29 -------
This is a dup of bug 18463.
*** This bug has been marked as a duplicate of 18463 ***
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17647