public inbox for gcc-prs@sourceware.org
* Re: optimization/6883: Fails to optimize temporary objects.
@ 2003-05-10 23:36 Dara Hazeghi
  0 siblings, 0 replies; 4+ messages in thread
From: Dara Hazeghi @ 2003-05-10 23:36 UTC (permalink / raw)
  To: nobody; +Cc: gcc-prs

The following reply was made to PR optimization/6883; it has been noted by GNATS.

From: Dara Hazeghi <dhazeghi@yahoo.com>
To: gcc-gnats@gcc.gnu.org, rguenth@tat.physik.uni-tuebingen.de
Cc:  
Subject: Re: optimization/6883: Fails to optimize temporary objects.
Date: Sat, 10 May 2003 16:29:36 -0700

 http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=6883
 
 Hello,
 
 this bug was reported against gcc 3.1. Would it be possible to test
 your testcase against a more current version of gcc (i.e. 3.2.3 or a 3.3
 prerelease) and report back on the results? Thanks,
 
 Dara
 



* Re: optimization/6883: Fails to optimize temporary objects.
@ 2003-05-11 12:06 Richard Guenther
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Guenther @ 2003-05-11 12:06 UTC (permalink / raw)
  To: nobody; +Cc: gcc-prs

The following reply was made to PR optimization/6883; it has been noted by GNATS.

From: Richard Guenther <rguenth@tat.physik.uni-tuebingen.de>
To: Dara Hazeghi <dhazeghi@yahoo.com>
Cc: gcc-gnats@gcc.gnu.org,  <rguenth@tat.physik.uni-tuebingen.de>
Subject: Re: optimization/6883: Fails to optimize temporary objects.
Date: Sun, 11 May 2003 14:00:37 +0200 (CEST)

 On Sat, 10 May 2003, Dara Hazeghi wrote:
 
 > http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=6883
 >
 > Hello,
 >
 > this bug was reported against gcc 3.1. Would it be possible to test
 > your testcase against a more current version of gcc (i.e. 3.2.3 or a 3.3
 > prerelease) and report back on the results? Thanks,
 
 Some more information, following up on my last reply: the Intel compiler
 (Intel(R) C++ Compiler for 32-bit applications, Version 7.1   Build
 20030424Z) produces the following with -O2:
 
 iterators with temporaries
 
 ..B1.18:                        # Preds ..B1.18 ..B1.17
         lea       1(%esi), %edi                                 #65.24
         cmpl      $254, %edi                                    #66.16
         fldl      -8(%edx,%esi,8)                               #65.21
         faddl     8(%edx,%esi,8)                                #65.28
         fmull     (%eax,%esi,8)                                 #65.32
         fstpl     (%ecx,%esi,8)                                 #65.9
         movl      %edi, %esi                                    #65.24
         jle       ..B1.18       # Prob 97%                      #66.16
 
 iterators without temporaries
 
 ..B1.21:                        # Preds ..B1.21 ..B1.20         # Infreq
         fldl      -8(%edx,%esi,8)                               #74.17
         faddl     8(%edx,%esi,8)                                #74.25
         fmull     (%eax,%esi,8)                                 #74.34
         fstpl     (%ecx,%esi,8)                                 #74.9
         incl      %esi                                          #75.16
         cmpl      $254, %esi                                    #75.16
         jle       ..B1.21       # Prob 97%                      #75.16
 
 no iterators, just int
 
 ..B1.24:                        # Preds ..B1.24 ..B1.23         # Infreq
         lea       1(%edi), %esi                                 #83.24
         fldl      -8(%eax,%edi,8)                               #83.17
         faddl     8(%eax,%edi,8)                                #83.24
         fmull     (%edx,%edi,8)                                 #83.32
         fstpl     (%ecx,%edi,8)                                 #83.9
         addl      $2, %edi                                      #83.24
         cmpl      $254, %edi                                    #84.16
         fldl      -8(%eax,%esi,8)                               #83.17
         faddl     8(%eax,%esi,8)                                #83.24
         fmull     (%edx,%esi,8)                                 #83.32
         fstpl     (%ecx,%esi,8)                                 #83.9
         jle       ..B1.24       # Prob 97%                      #84.16
 
 It even detects that the loop runs an even number of times and unrolls it
 by a factor of two ;) This is what I would like the code to be optimized
 to; at least all three versions look nearly the same.
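
As a rough illustration of what the unrolled Intel listing corresponds to at the source level, here is a sketch in C++ that uses plain std::vector instead of the testcase's Array class. The names stencil_rolled, stencil_unrolled2 and unrolled_matches_rolled are made up for this example and appear nowhere in the PR. Unrolling by two needs no remainder loop here because i runs from 1 to 254, which is 254 iterations.

```cpp
#include <cassert>
#include <vector>

// Rolled reference loop: a[i] = (b[i-1] + b[i+1]) * c[i] for i in [first, last].
void stencil_rolled(std::vector<double>& a, const std::vector<double>& b,
                    const std::vector<double>& c, int first, int last) {
  for (int i = first; i <= last; ++i)
    a[i] = (b[i - 1] + b[i + 1]) * c[i];
}

// Hand-unrolled by two: each pass handles elements i and i+1.
// Only valid when the trip count (last - first + 1) is even.
void stencil_unrolled2(std::vector<double>& a, const std::vector<double>& b,
                       const std::vector<double>& c, int first, int last) {
  assert((last - first + 1) % 2 == 0 && "trip count must be even");
  for (int i = first; i <= last; i += 2) {
    a[i]     = (b[i - 1] + b[i + 1]) * c[i];
    a[i + 1] = (b[i]     + b[i + 2]) * c[i + 1];
  }
}

// Hypothetical check: run both forms on the testcase's bounds (1..254 in
// arrays of 256 doubles) and compare the results element by element.
bool unrolled_matches_rolled() {
  const int n = 256;
  std::vector<double> b(n), c(n), a1(n, 0.0), a2(n, 0.0);
  for (int i = 0; i < n; ++i) {
    b[i] = 0.5 * i;
    c[i] = 1.0 + i % 7;
  }
  stencil_rolled(a1, b, c, 1, 254);
  stencil_unrolled2(a2, b, c, 1, 254);
  return a1 == a2;
}
```

Calling unrolled_matches_rolled() only confirms that both forms compute the same values; it says nothing about which one is faster on a given machine.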
 
 Richard.
 



* Re: optimization/6883: Fails to optimize temporary objects.
@ 2003-05-11 11:16 Richard Guenther
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Guenther @ 2003-05-11 11:16 UTC (permalink / raw)
  To: nobody; +Cc: gcc-prs

The following reply was made to PR optimization/6883; it has been noted by GNATS.

From: Richard Guenther <rguenth@tat.physik.uni-tuebingen.de>
To: Dara Hazeghi <dhazeghi@yahoo.com>
Cc: gcc-gnats@gcc.gnu.org,  <rguenth@tat.physik.uni-tuebingen.de>
Subject: Re: optimization/6883: Fails to optimize temporary objects.
Date: Sun, 11 May 2003 13:13:57 +0200 (CEST)

 On Sat, 10 May 2003, Dara Hazeghi wrote:
 
 > http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=6883
 >
 > Hello,
 >
 > this bug was reported against gcc 3.1. Would it be possible to test
 > your testcase against a more current version of gcc (i.e. 3.2.3 or a 3.3
 > prerelease) and report back on the results? Thanks,
 
 The problem is the same with g++-3.3 (GCC) 3.3 20030505 (prerelease).
 
 The first loop body now compiles to
 
 .L6:
         movl    $1, -104(%ebp)
         movl    -80(%ebp), %eax #  <variable>.m_i,  <anonymous>
         movl    %esi, -100(%ebp)
         movl    $1, -120(%ebp)
         leal    -1(%eax), %edx
         movl    %esi, -116(%ebp)
         leal    0(,%eax,8), %ecx
         incl    %eax
         fldl    (%ebx,%eax,8)
         cmpl    %esi, %eax
         faddl   (%ebx,%edx,8)
         movl    %edx, -96(%ebp) #  it.m_i
         movl    -140(%ebp), %edx
         fmull   (%ecx,%edi)
         movl    %eax, -112(%ebp)        #  it.m_i
         movl    %eax, -80(%ebp) #  <variable>.m_i
         fstpl   (%ecx,%edx)
         jle     .L6
 
 while the second, manually optimized version results in the much better
 
 .L23:
         movl    -128(%ebp), %eax        #  <variable>.m_i,  <anonymous>
         leal    1(%eax), %edx
         leal    0(,%eax,8), %ecx
         fldl    (%ebx,%edx,8)
         cmpl    %esi, %edx
         faddl   -8(%ebx,%eax,8)
         movl    -140(%ebp), %eax
         movl    %edx, -128(%ebp)        #  <variable>.m_i
         fmull   (%ecx,%edi)
         fstpl   (%ecx,%eax)
         jle     .L23
 
 Of course this is still not optimal, as the loop iterator itself can be
 optimized away and turned into a completely int-driven loop. Adding an
 operator()(int) to the Array class and using
 
     int i = 1;
     do {
       a(i) = (b(i-1)+b(i+1))*c(i);
     } while (++i <= 254);
 
 to iterate gives
 
 .L37:
         fldl    8(%ebx,%edx)
         incl    %eax    #  i
         faddl   -8(%ebx,%edx)
         fmull   (%edi,%edx)
         fstpl   (%esi,%edx)
         addl    $8, %edx
         cmpl    $254, %eax      #  i
         jle     .L37
 
 which seems nearly optimal here (one might use a byte for the loop
 counter here, or unify it with the array offset %edx).
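
For readers without the modified testcase, the int-driven variant might look roughly like the sketch below. The operator()(int) overload is the one proposed in this message, but its exact form is an assumption since it was never attached. std::vector replaces the raw pointer for brevity, and stencil_int and stencil_int_sample are hypothetical helper names.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for the testcase's Array class, reduced to the
// operator()(int) overload suggested in the message (assumed form).
class Array {
public:
  explicit Array(int size) : m_v(size, 0.0) {}

  double& operator()(int i) {
    assert(i >= 0 && i < static_cast<int>(m_v.size()));
    return m_v[i];
  }

private:
  std::vector<double> m_v;  // std::vector instead of the original raw pointer
};

// The int-driven loop from the message: no Iterator object is involved,
// so there is no temporary for the compiler to eliminate.
void stencil_int(Array& a, Array& b, Array& c) {
  int i = 1;
  do {
    a(i) = (b(i - 1) + b(i + 1)) * c(i);
  } while (++i <= 254);
}

// Hypothetical check: with b(i) = i and c(i) = 2, element 10 of the result
// is (9 + 11) * 2.
double stencil_int_sample() {
  Array a(256), b(256), c(256);
  for (int i = 0; i < 256; ++i) {
    b(i) = i;
    c(i) = 2.0;
  }
  stencil_int(a, b, c);
  return a(10);
}
```

The bounds assertion is an addition for safety in this sketch; building with -DNDEBUG removes it, which is closer to the unchecked access in the original class.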
 
 g++-3.4 (GCC) 3.4 20030505 (experimental) generates the same code as 3.3
 for the second and third cases (except that for the second case it is able
 to hoist one more movl out of the loop). For the first, inadequately
 optimized version it does slightly better than 3.3, probably due to the
 new loop optimizer:
 
 .L6:
         movl    -80(%ebp), %eax # <variable>.m_i, <anonymous>
         leal    0(,%eax,8), %edx        #, tmp88
         leal    -1(%eax), %ecx  #, tmp98
         movl    %ecx, -112(%ebp)        # tmp98, it.m_i
         incl    %eax    # tmp116
         cmpl    $254, %eax      #, tmp116
         fldl    (%ebx,%eax,8)   #* <anonymous>
         faddl   (%ebx,%ecx,8)   #* <anonymous>
         movl    %eax, -144(%ebp)        # tmp116, it.m_i
         movl    %eax, -80(%ebp) # tmp116, <variable>.m_i
         fmull   (%edx,%edi)     #
         fstpl   (%edx,%esi)     #
         jle     .L6     #,
 
 but this is still far from the optimal and the manually optimized
 versions, i.e. it still doesn't avoid creating the temporary iterator
 object to hold i-1/i+1.
 
 So all in all, while it's getting better, it's still nowhere near
 satisfactory.
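
To make the missing transformation concrete, the sketch below spells out by hand what b(i-1) involves after inlining, and what it could reduce to. The Iterator here is a trimmed copy of the one in the attached testcase, with members made public for the illustration. index_via_temporary and index_scalarized are hypothetical names, and the second function shows the desired result of an optimization commonly called scalar replacement of aggregates, not anything the compilers above are confirmed to do.

```cpp
#include <cassert>

// Trimmed copy of the testcase's Iterator (members public for illustration).
struct Iterator {
  Iterator(int i0, int i1) : m_i0(i0), m_i1(i1), m_i(i0) {}

  Iterator operator-(int d) const {
    Iterator it(*this);  // copies all three members
    it.m_i -= d;
    return it;
  }

  int getIndex() const { return m_i; }

  const int m_i0, m_i1;
  int m_i;
};

// What b(i-1) does as written: build a three-word temporary, then read
// a single field from it. The copies of m_i0 and m_i1 are never used.
int index_via_temporary(const Iterator& i) {
  Iterator tmp(i - 1);
  return tmp.getIndex();
}

// What it reduces to once the temporary is split into scalars and the
// dead copies are removed: one subtraction, no stores to the stack.
int index_scalarized(const Iterator& i) {
  return i.m_i - 1;
}
```

The stores annotated it.m_i in the listings above are consistent with the first form: the temporary is materialized on the stack even though only its index is ever read.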
 
 Richard.
 



* optimization/6883: Fails to optimize temporary objects.
@ 2002-05-31  3:46 rguenth
  0 siblings, 0 replies; 4+ messages in thread
From: rguenth @ 2002-05-31  3:46 UTC (permalink / raw)
  To: gcc-gnats


>Number:         6883
>Category:       optimization
>Synopsis:       Fails to optimize temporary objects.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          pessimizes-code
>Submitter-Id:   net
>Arrival-Date:   Fri May 31 03:06:02 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     Richard Guenther <rguenth@tat.physik.uni-tuebingen.de>
>Release:        gcc version 3.1.1 20020516 (prerelease)
>Organization:
>Environment:
Linux bellatrix 2.4.18-rc4-TATUP #2 Tue Mar 19 16:30:05 CET 2002 i686 unknown
>Description:
Temporary objects are not optimized away, which is a serious problem for medium-complex iterators. See attachment for an example.

The first loop body which uses temporary objects compiles (ix86, -O3 -fno-exceptions) to

        movl    -88(%ebp), %ecx
        movl    -80(%ebp), %eax
        movl    -140(%ebp), %esi
        movl    %ecx, -104(%ebp)
        movl    -56(%ebp), %ebx
        leal    0(,%eax,8), %edx
        movl    %ecx, -120(%ebp)
        movl    -40(%ebp), %edi
        movl    -140(%ebp), %ecx
        movl    %esi, -100(%ebp)
        leal    -1(%eax), %esi
        incl    %eax
        movl    %eax, -112(%ebp)
        addl    %edx, %edi
        movl    %esi, -96(%ebp)
        movl    %ecx, -116(%ebp)
        fldl    (%ebx,%eax,8)
        movl    -72(%ebp), %eax
        faddl   (%ebx,%esi,8)
        addl    %eax, %edx
        fmull   (%edx)
        fstpl   (%edi)

which fails to optimize away the temporary objects it creates. It should compile to code equivalent to the second implementation, which performs this optimization by hand:

        movl    -128(%ebp), %edi
        movl    -56(%ebp), %eax
        movl    -40(%ebp), %esi
        movl    -72(%ebp), %ecx
        leal    0(,%edi,8), %ebx
        addl    %ebx, %esi
        fldl    8(%eax,%edi,8)
        addl    %ecx, %ebx
        faddl   -8(%eax,%edi,8)
        fmull   (%ebx)
        fstpl   (%esi)

This pessimizes code that uses such iterators by about a factor of 3 compared to equivalent C code, which is a serious problem.
>How-To-Repeat:

>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:
----gnatsweb-attachment----
Content-Type: text/plain; name="iterator-gnats.cpp"
Content-Disposition: inline; filename="iterator-gnats.cpp"


class Iterator {
public:
  Iterator(int i0, int i1)
    : m_i0(i0), m_i1(i1)
  {
    m_i = i0;
  }

  Iterator operator+(int i) const {
    Iterator it(*this);
    it.m_i += i;
    return it;
  }
  Iterator operator-(int i) const {
    Iterator it(*this);
    it.m_i -= i;
    return it;
  }

  bool operator++()
  {
    m_i++;
    if (m_i > m_i1)
      return false;
    return true;
  }

  int getIndex() const { return m_i; }

private:
  const int m_i0, m_i1;
  int m_i;
};

class Array {
public:
  Array(int size)
    : m_v(new double[size]()) {}  // allocate (and zero) size elements, not a single double
  ~Array() { delete[] m_v; }

  double& operator()(const Iterator& i) { return m_v[i.getIndex()]; }
  double& operator()(const Iterator& i, int d) { return m_v[i.getIndex()+d]; }

private:
  double * const m_v;
};


int main(int argc, char **argv)
{
  Array a(256);
  Array b(256);
  Array c(256);

  // temporary objects not optimized
  {
    Iterator i(1, 254);
    do {
      a(i) = (b(i-1)+b(i+1))*c(i);
    } while (++i);
  }

  // optimized ok
  {
    Iterator i(1, 254);
    do {
      a(i) = (b(i,-1)+b(i,+1))*c(i);
    } while (++i);
  }
}


