public inbox for gcc-bugs@sourceware.org
* [Bug target/17647] New: [4.0 regression] Missing i386 addressing modes
From: uros at kss-loka dot si @ 2004-09-24  7:03 UTC (permalink / raw)
  To: gcc-bugs

This testcase (from scimark2.0, LU.c) shows missing i386 addressing modes:

void LU_copy_matrix(int M, int N, double **lu, double **A)
{
    int i;
    int j;

    for (i=0; i<M; i++)
        for (j=0; j<N; j++)
            lu[i][j] = A[i][j];
}

With gcc -O2 -ffast-math -fomit-frame-pointer -march=pentium4 LU.c, the following
ASM code is produced:

GCC: (GNU) 3.2 20020903 (Red Hat Linux 8.0 3.2-7):
LU_copy_matrix:
        pushl   %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        xorl    %ebx, %ebx
        movl    20(%esp), %edi
        movl    24(%esp), %esi
        movl    32(%esp), %ebp
        cmpl    %edi, %ebx
        jge     .L13
.L11:
        xorl    %eax, %eax
        cmpl    %esi, %eax
        jge     .L15
        movl    28(%esp), %edx
        movl    (%edx,%ebx,4), %ecx
        movl    (%ebp,%ebx,4), %edx
.L10:
        fldl    (%edx,%eax,8)
        fstpl   (%ecx,%eax,8)
        addl    $1, %eax
        cmpl    %esi, %eax
        jl      .L10
.L15:
        addl    $1, %ebx
        cmpl    %edi, %ebx
        jl      .L11
.L13:
        popl    %ebx
        popl    %esi
        popl    %edi
        popl    %ebp
        ret

gcc 4.0 regressed in this area and produces:
LU_copy_matrix:
      pushl    %ebp
      pushl    %edi
      pushl    %esi
      pushl    %ebx
      movl     24(%esp), %ebp
      movl     20(%esp), %eax
      testl    %eax, %eax
      jle      .L8
      movl     32(%esp), %esi
      movl     28(%esp), %ebx
      xorl     %edi, %edi
.L4:
      testl    %ebp, %ebp
      jle      .L6
      movl     (%esi), %ecx
      movl     (%ebx), %edx
      xorl     %eax, %eax
.L5:
      fldl     (%ecx)
      fstpl    (%edx)
      addl     $1, %eax
      addl     $8, %ecx
      addl     $8, %edx
      cmpl     %eax, %ebp
      jg       .L5
.L6:
      addl     $1, %edi
      addl     $4, %esi
      addl     $4, %ebx
      cmpl     %edi, 20(%esp)
      jg       .L4
.L8:
      popl     %ebx
      popl     %esi
      popl     %edi
      popl     %ebp
      ret

The problem is in the .L5 loop. gcc-4.0 uses simple register-indirect addressing
and advances both pointers with separate addl instructions, while gcc-3.2 uses
the scaled-index (SIB) addressing mode, as shown in the corresponding .L10 loop.
The same problem also affects integer instructions.

IMHO this could be the cause of the unbelievably bad results in the scimark-2.0
benchmark (http://www.coyotegulch.com/reviews/linux_compilers/).

Uros.

-- 
           Summary: [4.0 regression] Missing i386 addressing modes
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at kss-loka dot si
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-linux-gnu
  GCC host triplet: i686-linux-gnu
GCC target triplet: i686-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17647

* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
From: pinskia at gcc dot gnu dot org @ 2004-09-24 13:19 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-09-24 13:19 -------
I think the simple one is faster, but I could be wrong (the reason I say that is
that the multiply by 8 is not needed every time).

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization
           Keywords|                            |missed-optimization
            Summary|[4.0 regression] Missing    |[4.0 regression] Missing
                   |i386 addressing modes       |i386 addressing modes
   Target Milestone|---                         |4.0.0



* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
From: bonzini at gcc dot gnu dot org @ 2004-09-27  8:52 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From bonzini at gcc dot gnu dot org  2004-09-27 08:52 -------
Confirmed, this is caused by ivopts. In principle, 4.0.0's optimization should
be a good thing, but once again register pressure makes the code worse, because
N is not kept in a register.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2004-09-27 08:52:03
               date|                            |



* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
From: uros at kss-loka dot si @ 2004-11-04  9:33 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From uros at kss-loka dot si  2004-11-04 09:33 -------
The ASM code produced by CVS gcc dated 4 Nov 2004 looks much better, but it is
still not as good as that of 3.2:

LU_copy_matrix:
        pushl   %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        movl    24(%esp), %ebp
        movl    20(%esp), %eax
        testl   %eax, %eax
        jle     .L8
        movl    32(%esp), %esi
        xorl    %edi, %edi
.L4:
        testl   %ebp, %ebp
        jle     .L6
        movl    28(%esp), %eax
        movl    (%eax,%edi,4), %ebx
        movl    (%esi), %ecx          <= (*1)
        xorl    %edx, %edx
.L5:
        leal    0(,%edx,8), %eax     |<= (*2)
        fldl    (%ecx,%eax)          |
        fstpl   (%ebx,%eax)          |
        addl    $1, %edx
        cmpl    %edx, %ebp
        jg      .L5
.L6:
        addl    $1, %edi
        addl    $4, %esi              <= (*1)
        cmpl    %edi, 20(%esp)
        jg      .L4
.L8:
        popl    %ebx
        popl    %esi
        popl    %edi
        popl    %ebp
        ret

(*1):  "movl    (%esi,%edi,4), %ecx" could be used here. The second addl in the
outer (.L4) loop, also marked (*1), could then be eliminated.

(*2): Why not use:

       fldl    (%ecx,%edx,8)
       fstpl   (%ebx,%edx,8)

directly? This would eliminate the lea instruction, together with the use of the
%eax register.

Uros.


* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
From: uros at kss-loka dot si @ 2004-11-05  8:00 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From uros at kss-loka dot si  2004-11-05 08:00 -------
Another comment on the code in comment #3:

This form of LEA (scaled index, no base register) can only encode a 32-bit
constant displacement, so the inner loop of the gcc-4.0 code is considerably
bigger. (Note that on the P4, an LEA with a scale factor should be replaced by
a shift...)

Another improvement would be to use %ecx as the count register in inner loops.
In that case, TARGET_USE_LOOP architectures (such as K6) could use the loop
instruction for inner loops.

gcc-4.0 (21 bytes):
  23:	8d 04 d5 00 00 00 00 	lea    0x0(,%edx,8),%eax
  2a:	dd 04 01             	fldl   (%ecx,%eax,1)
  2d:	dd 1c 03             	fstpl  (%ebx,%eax,1)
  30:	83 c2 01             	add    $0x1,%edx
  33:	39 55 0c             	cmp    %edx,0xc(%ebp)
  36:	7f eb                	jg     23 <LU_copy_matrix+0x23>

gcc-3.2 (13 bytes):
  22:	dd 04 c2             	fldl   (%edx,%eax,8)
  25:	dd 1c c1             	fstpl  (%ecx,%eax,8)
  28:	83 c0 01             	add    $0x1,%eax
  2b:	39 f0                	cmp    %esi,%eax
  2d:	7c f3                	jl     22 <LU_copy_matrix+0x22>



* [Bug rtl-optimization/17647] [4.0 regression] Missing i386 addressing modes
From: pinskia at gcc dot gnu dot org @ 2004-11-13 20:15 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-13 20:14 -------
Actually, I was wrong about IV-OPTS being the problem. This is the same problem
as PR 18463: CSE does not recombine the address computation to form the
addressing mode at all.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |18463
          Component|tree-optimization           |rtl-optimization



* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
From: pinskia at gcc dot gnu dot org @ 2004-11-25 23:37 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-25 23:37 -------
This is fully an IV-OPTS problem now.
With -fno-ivopts, I get:
.L5:
        fldl    (%edx,%eax,8)
        fstpl   (%ecx,%eax,8)
        incl    %eax
        cmpl    %eax, %esi
        jg      .L5


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |tree-optimization



* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
From: uros at gcc dot gnu dot org @ 2004-11-26  8:50 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From uros at gcc dot gnu dot org  2004-11-26 08:50 -------
(In reply to comment #6)
> This is fully a IV-OPTS problem now.
> With -fno-ivopts, I get

I can't get the same code as you. With mainline gcc (gcc version 4.0.0 20041126
(experimental)) and 'gcc -O2 -S -fno-ivopts LU.c' I got:

        xorl    %edx, %edx
        .p2align 4,,15
.L5:
        leal    0(,%edx,8), %eax
        incl    %edx
        fldl    (%eax,%ecx)
        cmpl    %edx, %edi
        fstpl   (%ebx,%eax)
        jg      .L5

The leal is redundant in this case.


* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
From: pinskia at gcc dot gnu dot org @ 2004-11-26 15:40 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-26 15:40 -------
(In reply to comment #7)
> (In reply to comment #6)
> > This is fully a IV-OPTS problem now.
> > With -fno-ivopts, I get
> 
> I can't get the same code as you. With mainline gcc (gcc version 4.0.0 20041126
> (experimental)) and 'gcc -O2 -S -fno-ivopts LU.c' I got:

Now at -O1, I do get the leal:
        leal    0(,%ebx,8), %edx
        movl    (%edi), %ecx
        movl    (%esi), %eax
        fldl    (%edx,%eax)
        fstpl   (%ecx,%edx)

Likewise at -Os (which seems wrong):
        movl    -16(%ebp), %eax
        leal    0(,%ebx,8), %edx
        incl    %ebx
        movl    (%eax), %ecx
        movl    (%edi), %eax
        fldl    (%eax,%edx)
        fstpl   (%ecx,%edx)



* [Bug tree-optimization/17647] [4.0 regression] Missing i386 addressing modes
From: pinskia at gcc dot gnu dot org @ 2004-12-05  4:29 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-05 04:29 -------
This is a dup of bug 18463.

*** This bug has been marked as a duplicate of 18463 ***

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE

