public inbox for gcc-bugs@sourceware.org
* [Bug optimization/12771] New: Weak loop optimizer, significant performance regression
@ 2003-10-25  0:54 tm at kloo dot net
  2003-10-25  1:03 ` [Bug optimization/12771] " tm at kloo dot net
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: tm at kloo dot net @ 2003-10-25  0:54 UTC (permalink / raw)
  To: gcc-bugs

PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12771

           Summary: Weak loop optimizer, significant performance regression
           Product: gcc
           Version: 3.4
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tm at kloo dot net
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i386-linux
  GCC host triplet: i386-linux
GCC target triplet: i386-linux

This is based on Scott Robert Ladd's lpbench benchmark, which is derived from
linpack. He found a significant performance improvement on linpack when
-freduce-all-givs was used. This is the analysis of his situation using
gcc-3.4-20031024.

The majority of the time in Linpack is spent in the second loop in daxpy(). This
is compiled using "-O2 -S" to the following code:

.L98:
        movl    20(%ebp), %edx		<- memory ref #1
        flds    (%edx,%eax,4)		<- memory ref #2
        movl    12(%ebp), %edx		<- memory ref #3
        fmuls   (%edx,%eax,4)		<- memory ref #4
        incl    %eax
        faddp   %st, %st(1)

Here is the code as compiled with -freduce-all-givs:

.L85:
        flds    (%ecx)			<- memory ref #1
        addl    $4, %ecx
        fmuls   (%edx)			<- memory ref #2
        addl    $4, %edx
        decl    %eax
        faddp   %st, %st(1)
        jne     .L85

Basically, by default the loop optimizer chooses to optimize:

        for (i = 0;i < n; i++) {
                dy[i] = dy[i] + da*dx[i];
        }

using a base-plus-scaled-index addressing mode, (%edx,%eax,4). This is bad
because it ties up an extra register, which forces the register allocator to
reload dx and dy from the stack on every iteration, resulting in two extra
memory loads in the inner loop.

The -freduce-all-givs version eliminates the biv (basic induction variable),
which frees up a register and removes the two extra memory loads from the
inner loop.

The loop optimizer should be able to estimate register pressure and should
eliminate the biv (that is, perform giv reduction) automatically whenever doing
so reduces register pressure.
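
For illustration, the two loop shapes can be sketched by hand in C. This is
only a sketch, not compiler output: the function names daxpy_indexed and
daxpy_reduced are hypothetical, and unit stride is assumed, as in the loop
quoted above. The first form keeps the counter i and derives both addresses
from it; the second steps the two addresses directly and counts n down, which
is roughly what the -freduce-all-givs code does.

```c
/* Indexed form: i is the basic induction variable (biv); the addresses
   dx + 4*i and dy + 4*i are general induction variables (givs) derived
   from it.  Kept in this shape, the loop has i, n, dx and dy all live. */
void daxpy_indexed(int n, float da, const float *dx, float *dy)
{
    for (int i = 0; i < n; i++)
        dy[i] = dy[i] + da * dx[i];
}

/* Hand-reduced form: each address is advanced by one element per
   iteration and n itself serves as a down-counter, so no separate
   index is needed.  Only dx, dy and n are live in the loop. */
void daxpy_reduced(int n, float da, const float *dx, float *dy)
{
    for (; n > 0; n--) {
        *dy = *dy + da * *dx;
        dx++;
        dy++;
    }
}
```

Both functions compute the same result; the difference is only in how many
values must stay in registers across iterations, which matters on a target
with as few registers as i386.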



* [Bug optimization/12771] Weak loop optimizer, significant performance regression
  2003-10-25  0:54 [Bug optimization/12771] New: Weak loop optimizer, significant performance regression tm at kloo dot net
@ 2003-10-25  1:03 ` tm at kloo dot net
  2003-10-25  1:31 ` pinskia at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: tm at kloo dot net @ 2003-10-25  1:03 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12771



------- Additional Comments From tm at kloo dot net  2003-10-25 00:55 -------
Created an attachment (id=4992)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=4992&action=view)
preprocessed source for linpack

preprocessed source for linpack for this PR



* [Bug optimization/12771] Weak loop optimizer, significant performance regression
  2003-10-25  0:54 [Bug optimization/12771] New: Weak loop optimizer, significant performance regression tm at kloo dot net
  2003-10-25  1:03 ` [Bug optimization/12771] " tm at kloo dot net
@ 2003-10-25  1:31 ` pinskia at gcc dot gnu dot org
  2003-10-27 11:33 ` tm at kloo dot net
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2003-10-25  1:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12771


pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |pessimizes-code


------- Additional Comments From pinskia at gcc dot gnu dot org  2003-10-25 01:14 -------
What is this a regression from? I get about the same code from 2.95.3, 3.2.3, and 3.3.1.



* [Bug optimization/12771] Weak loop optimizer, significant performance regression
  2003-10-25  0:54 [Bug optimization/12771] New: Weak loop optimizer, significant performance regression tm at kloo dot net
  2003-10-25  1:03 ` [Bug optimization/12771] " tm at kloo dot net
  2003-10-25  1:31 ` pinskia at gcc dot gnu dot org
@ 2003-10-27 11:33 ` tm at kloo dot net
  2003-10-28 15:42 ` [Bug optimization/12771] Weak loop optimizer pinskia at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: tm at kloo dot net @ 2003-10-27 11:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12771



------- Additional Comments From tm at kloo dot net  2003-10-27 10:01 -------
Subject: Re: Weak loop optimizer, significant performance regression

On 25 Oct 2003, pinskia at gcc dot gnu dot org wrote:
> What is this a regression from? I get about the same code from 2.95.3, 3.2.3, and 3.3.1.

Regression from 2.7.2.

Toshi



* [Bug optimization/12771] Weak loop optimizer
  2003-10-25  0:54 [Bug optimization/12771] New: Weak loop optimizer, significant performance regression tm at kloo dot net
                   ` (2 preceding siblings ...)
  2003-10-27 11:33 ` tm at kloo dot net
@ 2003-10-28 15:42 ` pinskia at gcc dot gnu dot org
  2003-10-28 19:22 ` tm at kloo dot net
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2003-10-28 15:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12771


pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2003-10-28 15:38:57
               date|                            |
            Summary|Weak loop optimizer,        |Weak loop optimizer
                   |significant performance     |
                   |regression                  |


------- Additional Comments From pinskia at gcc dot gnu dot org  2003-10-28 15:38 -------
Since this is a regression from 2.7.2, which was released on November 26, 1995, almost 8 years ago,
we cannot really count it as a regression; too much has changed since then.



* [Bug optimization/12771] Weak loop optimizer
  2003-10-25  0:54 [Bug optimization/12771] New: Weak loop optimizer, significant performance regression tm at kloo dot net
                   ` (3 preceding siblings ...)
  2003-10-28 15:42 ` [Bug optimization/12771] Weak loop optimizer pinskia at gcc dot gnu dot org
@ 2003-10-28 19:22 ` tm at kloo dot net
  2004-03-07  1:34 ` pinskia at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: tm at kloo dot net @ 2003-10-28 19:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12771



------- Additional Comments From tm at kloo dot net  2003-10-28 19:21 -------
Subject: Re:  Weak loop optimizer

On 28 Oct 2003, pinskia at gcc dot gnu dot org wrote:
> Since this is a regression from 2.7.2, which was released on November 26, 1995, almost 8 years ago,
> we cannot really count it as a regression; too much has changed since then.

It's still an optimization weakness.

Toshi



* [Bug optimization/12771] Weak loop optimizer
  2003-10-25  0:54 [Bug optimization/12771] New: Weak loop optimizer, significant performance regression tm at kloo dot net
                   ` (4 preceding siblings ...)
  2003-10-28 19:22 ` tm at kloo dot net
@ 2004-03-07  1:34 ` pinskia at gcc dot gnu dot org
  2005-03-23  3:07 ` [Bug rtl-optimization/12771] " pinskia at gcc dot gnu dot org
  2005-05-07 20:47 ` pinskia at gcc dot gnu dot org
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-03-07  1:34 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-03-07 01:34 -------
The mainline has changed, but the loop still has the same number of memory accesses:
        movl    16(%ebp), %edx
        flds    (%edx,%eax,4)
        fmul    %st(1), %st
        fadds   (%esi,%eax,4)
        fstps   (%esi,%eax,4)
        incl    %eax
With -freduce-all-givs, on the other hand, the number of memory accesses has increased by one:
.L15:
        flds    (%ecx)
        addl    $4, %ecx
        fmul    %st(1), %st
        fadds   (%eax)
        fstps   (%eax)
        addl    $4, %eax
        decl    %edx
        jne     .L15
Here is the reduced source:
void daxpy(int n, float  da, float  dx[], int incx, float  dy[], int incy)
{
        int i,ix,iy,m,mp1;

        mp1 = 0;
        m = 0;

        if(n <= 0) return;
        if (da == 0.0 ) return;

        if(incx != 1 || incy != 1) {
                ix = 0;
                iy = 0;
                if(incx < 0) ix = (-n+1)*incx;
                if(incy < 0)iy = (-n+1)*incy;
                for (i = 0;i < n; i++) {
                        dy[iy] = dy[iy] + da*dx[ix];
                        ix = ix + incx;
                        iy = iy + incy;
                }
                return;
        }
        for (i = 0;i < n; i++) {
                dy[i] = dy[i] + da*dx[i];
        }
return;
}

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2003-10-28 15:38:57         |2004-03-07 01:34:09
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12771



* [Bug rtl-optimization/12771] Weak loop optimizer
  2003-10-25  0:54 [Bug optimization/12771] New: Weak loop optimizer, significant performance regression tm at kloo dot net
                   ` (5 preceding siblings ...)
  2004-03-07  1:34 ` pinskia at gcc dot gnu dot org
@ 2005-03-23  3:07 ` pinskia at gcc dot gnu dot org
  2005-05-07 20:47 ` pinskia at gcc dot gnu dot org
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-03-23  3:07 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-03-23 03:07 -------
We are better on the mainline, but still not as good as the old code with -freduce-all-givs.
Here is what we get with the mainline:
.L14:
        fld     %st(0)
        fmuls   (%ecx,%edx,4)
        incl    %edx
        fadds   (%eax)
        fstps   (%eax)
        addl    $4, %eax
        cmpl    %edx, %esi
        jne     .L14
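
Read back into C, the loop above appears to step dy as a pointer while still
indexing dx by the counter. A rough hand-written equivalent follows; it is my
reading of the assembly, not compiler output, and the function name
daxpy_mixed and the variable py are hypothetical.

```c
/* Mixed form: dy has been reduced to a stepped pointer (py), but dx is
   still addressed as base + 4*i, so the counter i and the bound n both
   stay live alongside the two addresses. */
void daxpy_mixed(int n, float da, const float *dx, float *dy)
{
    float *py = dy;
    int i = 0;

    if (n <= 0)
        return;

    do {
        *py = da * dx[i] + *py;  /* fmuls (%ecx,%edx,4); fadds/fstps (%eax) */
        i++;                     /* incl  %edx */
        py++;                    /* addl  $4, %eax */
    } while (i != n);            /* cmpl  %edx, %esi; jne .L14 */
}
```

Compared with the fully reduced loop, this shape still needs the index and
the loop bound in registers in addition to the two addresses.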

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2004-12-22 04:37:34         |2005-03-23 03:07:20
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12771



* [Bug rtl-optimization/12771] Weak loop optimizer
  2003-10-25  0:54 [Bug optimization/12771] New: Weak loop optimizer, significant performance regression tm at kloo dot net
                   ` (6 preceding siblings ...)
  2005-03-23  3:07 ` [Bug rtl-optimization/12771] " pinskia at gcc dot gnu dot org
@ 2005-05-07 20:47 ` pinskia at gcc dot gnu dot org
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-05-07 20:47 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-05-07 20:47 -------
On the mainline, we get:
.L13:
        fld     %st(0)
        incl    %ecx
        fmuls   (%edx)
        addl    %ebx, %edx
        fadds   (%eax)
        fstps   (%eax)
        addl    %esi, %eax
        cmpl    %ecx, %edi
        jne     .L13

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12771

