[Bug c++/46284] New: Lack of proper optimization for certain SSE operations, and weird behavior with similar source codes

public inbox for gcc-bugs@sourceware.org

* [Bug c++/46284] New: Lack of proper optimization for certain SSE operations, and weird behavior with similar source codes
@ 2010-11-02 23:48 rydencillo at gmail dot com
  2010-11-02 23:55 ` [Bug tree-optimization/46284] " pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: rydencillo at gmail dot com @ 2010-11-02 23:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46284

           Summary: Lack of proper optimization for certain SSE
                    operations, and weird behavior with similar source
                    codes
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: rydencillo@gmail.com


I am using this source code: http://pastebin.com/tMpQ2Bzv
Compile it with -O3 -march=core2 -std=c++0x.
Note that line 178 is commented out.
With that line commented out, GCC emits the following assembly in the final
binary to initialize v1 and v2:
mov     dword ptr [esp+60h+var_30], 3F800000h
mov     dword ptr [esp+60h+var_30+4], 40000000h
mov     dword ptr [esp+60h+var_30+8], 40400000h
mov     dword ptr [esp+60h+var_30+0Ch], 40800000h
mov     dword ptr [esp+60h+var_20], 41000000h
mov     dword ptr [esp+60h+var_20+4], 40E00000h
mov     dword ptr [esp+60h+var_20+8], 40C00000h
mov     dword ptr [esp+60h+var_20+0Ch], 40A00000h
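
The eight immediates in the listing above are IEEE-754 single-precision bit
patterns. The following is a minimal sketch, assuming a 32-bit float, that
decodes them and checks them against the constructor arguments of the
testcase, (1, 2, 3, 4) and (8, 7, 6, 5). The names float_from_bits,
kImmediates and check_immediates are invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.
// std::memcpy avoids the aliasing problems of a pointer cast.
float float_from_bits(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// The eight immediates from the listing, in store order.
const std::uint32_t kImmediates[8] = {
    0x3F800000u, 0x40000000u, 0x40400000u, 0x40800000u,  // var_30 .. var_30+0Ch
    0x41000000u, 0x40E00000u, 0x40C00000u, 0x40A00000u   // var_20 .. var_20+0Ch
};

// Check that the stores spell out v1 ( 1, 2, 3, 4 ) and v2 ( 8, 7, 6, 5 ).
void check_immediates()
{
    const float expected[8] = { 1, 2, 3, 4, 8, 7, 6, 5 };
    for (int i = 0; i < 8; ++i)
        assert(float_from_bits(kImmediates[i]) == expected[i]);
}
```

Calling check_immediates() from main should pass silently, which suggests
that the compiler already knows both vectors as constants but still
materializes them on the stack one element at a time.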

Uncommenting that line changes the assembly, and the initialization becomes:
movaps  xmm1, oword ptr ds:oword_47D090
movaps  xmm0, oword ptr ds:oword_47D0A0
movaps  oword ptr [esp+80h+var_50], xmm1
movaps  oword ptr [esp+80h+var_40], xmm0

This difference seems to make no sense.

Also, the full assembly for the first case looks like this:
mov     dword ptr [esp+60h+var_30], 3F800000h
mov     dword ptr [esp+60h+var_30+4], 40000000h
mov     dword ptr [esp+60h+var_30+8], 40400000h
mov     dword ptr [esp+60h+var_30+0Ch], 40800000h
mov     dword ptr [esp+60h+var_20], 41000000h
mov     dword ptr [esp+60h+var_20+4], 40E00000h
mov     dword ptr [esp+60h+var_20+8], 40C00000h
mov     dword ptr [esp+60h+var_20+0Ch], 40A00000h
movaps  xmm0, oword ptr [esp+60h+var_30]
mov     [esp+60h+var_60], offset aResultadoFFFF ; "Resultado: %f %f %f %f\n"
addps   xmm0, oword ptr [esp+60h+var_20]
movaps  oword ptr [esp+60h+var_10], xmm0
fld     dword ptr [esp+60h+var_10+0Ch]
fstp    [esp+60h+var_44]
fld     dword ptr [esp+60h+var_10+8]
fstp    [esp+60h+var_4C]
fld     dword ptr [esp+60h+var_10+4]
fstp    [esp+60h+var_54]
fld     dword ptr [esp+60h+var_10]
fstp    [esp+60h+var_5C]
call    printf
xor     eax, eax

But the creation of those vector objects should be eliminated entirely, and
the code should operate on SSE registers wherever possible.



* [Bug tree-optimization/46284] Lack of proper optimization for certain SSE operations, and weird behavior with similar source codes
From: pinskia at gcc dot gnu.org @ 2010-11-02 23:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46284

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|win32                       |i?86-*-*
          Component|c++                         |tree-optimization
               Host|win32                       |

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2010-11-02 23:54:54 UTC ---
There are multiple reasons why GCC does not optimize this well.  First, the
union does not cause a bit-field reference to be generated; see PR 28367.
Second, the bit-field references are not combined together.
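
The source pattern at issue can be reduced to a minimal sketch: scalars are
stored through the array member of a union, the addition reads the vector
member, and the result is read back through the array member. Float4, make4,
add4, lane and check_union_sum are hypothetical names that are not part of
the testcase. Reading a union member other than the one last written relies
on the type punning that GCC documents as an extension.

```cpp
#include <cassert>

// GCC/Clang vector extension: four floats in one 16-byte value.
typedef float v4sf __attribute__ ((vector_size (16)));

// Hypothetical reduction of the testcase's VectorBase<float, 4>.
union Float4
{
    float v[4];  // scalar view, written by the constructor
    v4sf  vec;   // vector view, used by operator+
};

// Store four scalars through the array member, as Vector4's constructor does.
inline Float4 make4(float x, float y, float z, float w)
{
    Float4 r;
    r.v[0] = x; r.v[1] = y; r.v[2] = z; r.v[3] = w;
    return r;
}

// Add through the vector member, as Vector::operator+ does.
inline Float4 add4(const Float4& a, const Float4& b)
{
    Float4 r;
    r.vec = a.vec + b.vec;
    return r;
}

// Read one lane back through the array member, as x()/y()/z()/w() do.
inline float lane(const Float4& a, int i) { return a.v[i]; }

// Same operands as the testcase: every lane of the sum should be 9.
void check_union_sum()
{
    const Float4 s = add4(make4(1, 2, 3, 4), make4(8, 7, 6, 5));
    for (int i = 0; i < 4; ++i)
        assert(lane(s, i) == 9.0f);
}
```

Each scalar store and each scalar load here touches a piece of the same
16-byte object that the addition treats as a whole, which is the mix of
accesses the comment above describes.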



* [Bug tree-optimization/46284] Lack of proper optimization for certain SSE operations, and weird behavior with similar source codes
From: pinskia at gcc dot gnu.org @ 2010-11-02 23:56 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46284

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> 2010-11-02 23:56:05 UTC ---
Testcase copied here:
/* 
 * File:   Common/Range.h
 * Author: ryden
 * Based on code found at:
 * http://gcc.gnu.org/ml/gcc-patches/2010-07/msg01854.html
 *
 */

#ifndef RANGE_H
#define    RANGE_H

template < int iMin, int iMax >
struct Range
{
    struct iterator
    {
        iterator ( int val ) : v ( val ) {}

        iterator operator++ () { iterator it ( v ); ++v; return it; }
        iterator& operator++ (int) { ++v; return *this; }
        bool operator!= ( const iterator& other ) const { return other.v != v; }
        int operator* () const { return v; }

    private:
        int v;
    };

    iterator begin () const { return iterator ( iMin ); }
    iterator end () const { return iterator ( iMax + 1 ); }
};

#define range(min,max) Range<min,max>()
#define until(max) Range<0,(max)-1>()

#endif    /* RANGE_H */



/* 
 * File:   Common/Vector.h
 * Author: ryden
 *
 */

#ifndef VECTOR_H
#define    VECTOR_H

#include <string>
#include <sstream>

template < typename T, unsigned int uiNumComponents >
struct VectorBase
{
    typedef T __attribute__ ((vector_size(uiNumComponents*sizeof(T)))) vectorType;
    union
    {
        T           v [ uiNumComponents ];
        vectorType  vec;
    };
};

template < typename T, unsigned int uiNumComponents >
class Vector : protected VectorBase < T, uiNumComponents >
{
public:
    Vector ( )
    {
        for ( auto i : until(uiNumComponents) )
        {
            Vector::v[i] = 0;
        }
    }

    Vector ( const T values[uiNumComponents] )
    : Vector::v ( values )
    {
    }

    Vector ( const std::initializer_list<T>& list )
    {
        for ( auto i : until(uiNumComponents) )
        {
            Vector::v[i] = list.begin ()[i];
        }
    }

    Vector& operator= ( const Vector& Right )
    {
        Vector::vec = Right.vec;
        return *this;
    }

    operator std::string () const
    {
        return *(*this);
    }

    std::string operator* () const
    {
        std::ostringstream ostr;

        ostr << "( ";
        for ( const T& elem : Vector::v )
        {
            ostr << elem << ", ";
        }
        ostr.seekp( -2, std::ios_base::cur );
        ostr << " )";

        return ostr.str();
    }

    Vector operator+ ( const Vector& Right ) const
    {
        Vector vecRet;
        vecRet.vec = Vector::vec + Right.vec;
        return vecRet;
    }

    Vector& operator+= ( const Vector& Right )
    {
        Vector::vec = Vector::vec + Right.vec;
        return *this;
    }
};

template < typename T >
class Vector4 : public Vector < T, 4 >
{
    typedef Vector < T, 4 > parent;
public:

    Vector4 ()
    : parent ()
    {
    }

    Vector4 ( const T& x, const T& y = 0, const T& z = 0, const T& w = 0 )
    {
        parent::v [ 0 ] = x;
        parent::v [ 1 ] = y;
        parent::v [ 2 ] = z;
        parent::v [ 3 ] = w;
    }

    Vector4 ( const parent& Right )
    : parent ( Right )
    {
    }

    Vector4& operator= ( const parent& Right )
    {
        parent::operator= ( Right );
        return *this;
    }

    T&          x ()        { return parent::v[0]; }
    const T&    x () const  { return parent::v[0]; }
    T&          y ()        { return parent::v[1]; }
    const T&    y () const  { return parent::v[1]; }
    T&          z ()        { return parent::v[2]; }
    const T&    z () const  { return parent::v[2]; }
    T&          w ()        { return parent::v[3]; }
    const T&    w () const  { return parent::v[3]; }
};

typedef Vector4<short> Vector4s, vec4s;
typedef Vector4<float> Vector4f, vec4f;
typedef Vector4<double> Vector4d, vec4d;
#endif    /* VECTOR_H */

#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv)
{
    vec4f v1 ( 1, 2, 3, 4 ), v2 ( 8, 7, 6, 5 ), v3;
    v3 = v1 + v2;
//    v3 += { 1, 0, 1, 0 };

    printf ( "Result: %f %f %f %f\n", v3.x(), v3.y(), v3.z(), v3.w() );
    return 0;
}



* [Bug tree-optimization/46284] Lack of proper optimization for certain SSE operations, and weird behavior with similar source codes
From: pinskia at gcc dot gnu.org @ 2023-05-15  6:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46284

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |10.1.0, 11.3.0, 12.1.0,
                   |                            |13.1.0
   Target Milestone|---                         |10.0
             Status|UNCONFIRMED                 |RESOLVED
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=28367,
                   |                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=83518
         Resolution|---                         |FIXED

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Fully fixed in GCC 10 (by r10-1692-g38988cbf9ebaa9):
main:
        subq    $8, %rsp
        movl    $.LC1, %edi
        movl    $4, %eax
        movsd   .LC0(%rip), %xmm0
        movapd  %xmm0, %xmm3
        movapd  %xmm0, %xmm2
        movapd  %xmm0, %xmm1
        call    printf

This just loads the constants and calls printf.
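
A single constant can feed all four argument registers because every lane of
v1 + v2 has the same value. The following is a minimal sketch of that
arithmetic, evaluated at compile time. kV1, kV2 and promoted_sum are names
invented for this illustration, and the assumption that .LC0 holds 9.0 as a
double is an inference, since the listing does not show the constant pool.

```cpp
// Operands of the testcase: v1 ( 1, 2, 3, 4 ) and v2 ( 8, 7, 6, 5 ).
constexpr float kV1[4] = { 1, 2, 3, 4 };
constexpr float kV2[4] = { 8, 7, 6, 5 };

// Lane-wise sum, widened to double the way a variadic call to printf
// promotes each float argument.
constexpr double promoted_sum(int lane)
{
    return static_cast<double>(kV1[lane] + kV2[lane]);
}

// All four arguments are the same double, so one load can be copied
// into the other three registers.
static_assert(promoted_sum(0) == 9.0, "x lane");
static_assert(promoted_sum(1) == 9.0, "y lane");
static_assert(promoted_sum(2) == 9.0, "z lane");
static_assert(promoted_sum(3) == 9.0, "w lane");
```

The movl $4, %eax in the listing is consistent with the x86-64 System V
convention of passing in %al an upper bound on the number of vector
registers used by a variadic call.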


In GCC 7-9 (improved most likely by PR 28367), GCC produces:
        movl    $.LC0, %edi
        movl    $4, %eax
        movdqa  .LC2(%rip), %xmm1
        movdqa  .LC1(%rip), %xmm0
        addps   %xmm1, %xmm0
        movaps  %xmm0, %xmm3
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        shufps  $255, %xmm0, %xmm3
        unpckhps        %xmm0, %xmm2
        shufps  $85, %xmm0, %xmm1
        cvtss2sd        %xmm0, %xmm0
        cvtss2sd        %xmm3, %xmm3
        cvtss2sd        %xmm2, %xmm2
        cvtss2sd        %xmm1, %xmm1
        call    printf

This is not bad, and much better than GCC 6 (whose output is the code listed
in comment #0).
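
The GCC 7-9 sequence moves each lane of the sum into the low element of its
own register before cvtss2sd widens it. The following is a scalar emulation
of those three instructions, a sketch rather than a definition of the
instruction set. Lanes and the function names are invented for this
illustration, the operand order follows the AT&T listing (source first), and
the lane values 10, 20, 30, 40 are used instead of the testcase's so that
the extraction is visible.

```cpp
#include <array>
#include <cassert>

typedef std::array<float, 4> Lanes;

// shufps $imm, src, dst: the two low result lanes are picked from dst and
// the two high result lanes from src, each by a 2-bit field of imm.
inline Lanes shufps(unsigned imm, const Lanes& src, const Lanes& dst)
{
    return Lanes{{ dst[imm & 3], dst[(imm >> 2) & 3],
                   src[(imm >> 4) & 3], src[(imm >> 6) & 3] }};
}

// unpckhps src, dst: interleave the two high lanes of dst and src.
inline Lanes unpckhps(const Lanes& src, const Lanes& dst)
{
    return Lanes{{ dst[2], src[2], dst[3], src[3] }};
}

// cvtss2sd: widen lane 0 to double, which is what printf receives.
inline double cvtss2sd(const Lanes& r) { return static_cast<double>(r[0]); }

// Mirror the listing: xmm1, xmm2 and xmm3 start as copies of xmm0.
void check_lane_extraction()
{
    const Lanes xmm0 = {{ 10, 20, 30, 40 }};
    assert(cvtss2sd(xmm0) == 10.0);                      // x
    assert(cvtss2sd(shufps(85, xmm0, xmm0)) == 20.0);    // y, imm 0b01010101
    assert(cvtss2sd(unpckhps(xmm0, xmm0)) == 30.0);      // z
    assert(cvtss2sd(shufps(255, xmm0, xmm0)) == 40.0);   // w, imm 0b11111111
}
```

In the real testcase all four lanes are 9, so this shuffling is redundant
work that constant folding removes.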
