public inbox for gcc-bugs@sourceware.org
* [Bug c++/46284] New: Lack of proper optimization for certain SSE operations, and weird behavior with similar source codes
@ 2010-11-02 23:48 rydencillo at gmail dot com
2010-11-02 23:55 ` [Bug tree-optimization/46284] " pinskia at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: rydencillo at gmail dot com @ 2010-11-02 23:48 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46284
Summary: Lack of proper optimization for certain SSE
operations, and weird behavior with similar source
codes
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: rydencillo@gmail.com
I am using this source code: http://pastebin.com/tMpQ2Bzv
Compile with -O3 -march=core2 -std=c++0x.
Note that line 178 is commented out (the "v3 += { 1, 0, 1, 0 };" statement in
main).
GCC produces the following assembly in the final binary to initialize v1
and v2:
mov dword ptr [esp+60h+var_30], 3F800000h
mov dword ptr [esp+60h+var_30+4], 40000000h
mov dword ptr [esp+60h+var_30+8], 40400000h
mov dword ptr [esp+60h+var_30+0Ch], 40800000h
mov dword ptr [esp+60h+var_20], 41000000h
mov dword ptr [esp+60h+var_20+4], 40E00000h
mov dword ptr [esp+60h+var_20+8], 40C00000h
mov dword ptr [esp+60h+var_20+0Ch], 40A00000h
Uncommenting that line changes the assembly, and the initialization
becomes:
movaps xmm1, oword ptr ds:oword_47D090
movaps xmm0, oword ptr ds:oword_47D0A0
movaps oword ptr [esp+80h+var_50], xmm1
movaps oword ptr [esp+80h+var_40], xmm0
This seems to make no sense, since the uncommented statement only runs after
v1 and v2 have been initialized.
Also, the full assembly for the first case looks like this:
mov dword ptr [esp+60h+var_30], 3F800000h
mov dword ptr [esp+60h+var_30+4], 40000000h
mov dword ptr [esp+60h+var_30+8], 40400000h
mov dword ptr [esp+60h+var_30+0Ch], 40800000h
mov dword ptr [esp+60h+var_20], 41000000h
mov dword ptr [esp+60h+var_20+4], 40E00000h
mov dword ptr [esp+60h+var_20+8], 40C00000h
mov dword ptr [esp+60h+var_20+0Ch], 40A00000h
movaps xmm0, oword ptr [esp+60h+var_30]
mov [esp+60h+var_60], offset aResultadoFFFF ; "Resultado: %f %f %f %f\n"
addps xmm0, oword ptr [esp+60h+var_20]
movaps oword ptr [esp+60h+var_10], xmm0
fld dword ptr [esp+60h+var_10+0Ch]
fstp [esp+60h+var_44]
fld dword ptr [esp+60h+var_10+8]
fstp [esp+60h+var_4C]
fld dword ptr [esp+60h+var_10+4]
fstp [esp+60h+var_54]
fld dword ptr [esp+60h+var_10]
fstp [esp+60h+var_5C]
call printf
xor eax, eax
But the in-memory construction of those vectors should be dropped entirely,
and the code should operate on SSE registers when possible.
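For comparison, here is a minimal sketch of the same v1 + v2 computation written directly with GCC's vector extension, without the union or the element array. This is not the reporter's code. The names v4sf and add_direct are hypothetical. The sketch only illustrates building the operands as whole vectors instead of storing them element by element. Code of this shape is what one would expect to stay in SSE registers.

```cpp
#include <cassert>

// GCC/Clang vector extension: four floats in one 16-byte vector, the same
// kind of type the testcase wraps in a union (hypothetical name v4sf).
typedef float v4sf __attribute__ ((vector_size (16)));

// Both operands are vector initializers and the sum is one vector addition;
// there are no per-element stores through an array member.
inline v4sf add_direct ()
{
  const v4sf a = { 1.0f, 2.0f, 3.0f, 4.0f };
  const v4sf b = { 8.0f, 7.0f, 6.0f, 5.0f };
  return a + b;
}

// Usage: v4sf r = add_direct (); assert (r[0] == 9.0f);
```

Every lane of the result is 9.0f. Individual lanes are read with GCC's vector subscript extension (r[i]).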
* [Bug tree-optimization/46284] Lack of proper optimization for certain SSE operations, and weird behavior with similar source codes
From: pinskia at gcc dot gnu.org @ 2010-11-02 23:55 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46284
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Target|win32 |i?86-*-*
Component|c++ |tree-optimization
Host|win32 |
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2010-11-02 23:54:54 UTC ---
There are multiple reasons why GCC does not optimize this correctly. First,
the accesses through the union do not get turned into a bit-field reference
(see PR 28367). Second, the resulting bit-field references are not combined
together.
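The sketch below shows the two ways of reading a component that this comment contrasts, assuming GCC's vector extension:

- through the array member of a union, as VectorBase does;
- with the vector subscript operator.

V4Union, lane_via_union and lane_via_subscript are hypothetical names. Both forms return the same value. The PR concerns how well GCC optimizes the union form, not what it returns.

```cpp
#include <cassert>

typedef float v4sf __attribute__ ((vector_size (16)));

// Same layout idea as VectorBase<float, 4>: an element array overlaid on a
// 16-byte vector (hypothetical name).
union V4Union
{
  float v[4];
  v4sf vec;
};

static_assert ( sizeof ( V4Union ) == 16, "array and vector overlap exactly" );

// Write the vector member, then read the array member. This is undefined in
// ISO C++, but GCC documents type punning through a union as supported.
inline float lane_via_union ( v4sf x, int i )
{
  V4Union u;
  u.vec = x;
  return u.v[i];
}

// Read the component with GCC's vector subscript extension instead.
inline float lane_via_subscript ( v4sf x, int i )
{
  return x[i];
}

// Usage: assert (lane_via_union (x, 2) == lane_via_subscript (x, 2));
```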
* [Bug tree-optimization/46284] Lack of proper optimization for certain SSE operations, and weird behavior with similar source codes
From: pinskia at gcc dot gnu.org @ 2010-11-02 23:56 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46284
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> 2010-11-02 23:56:05 UTC ---
Testcase copied here:
/*
* File: Common/Range.h
* Author: ryden
* Based on code found at:
http://gcc.gnu.org/ml/gcc-patches/2010-07/msg01854.html
*
*/
#ifndef RANGE_H
#define RANGE_H
template < int iMin, int iMax >
struct Range
{
struct iterator
{
iterator ( int val ) : v ( val ) {}
iterator& operator++ () { ++v; return *this; }
iterator operator++ (int) { iterator it ( v ); ++v; return it; }
bool operator!= ( const iterator& other ) const { return other.v != v; }
int operator* () const { return v; }
private:
int v;
};
iterator begin () const { return iterator ( iMin ); }
iterator end () const { return iterator ( iMax + 1 ); }
};
#define range(min,max) Range<min,max>()
#define until(max) Range<0,(max)-1>()
#endif /* RANGE_H */
/*
* File: Common/Vector.h
* Author: ryden
*
*/
#ifndef VECTOR_H
#define VECTOR_H
#include <initializer_list>
#include <string>
#include <sstream>
template < typename T, unsigned int uiNumComponents >
struct VectorBase
{
typedef T __attribute__ ((vector_size(uiNumComponents*sizeof(T))))
vectorType;
union
{
T v [ uiNumComponents ];
vectorType vec;
};
};
template < typename T, unsigned int uiNumComponents >
class Vector : protected VectorBase < T, uiNumComponents >
{
public:
Vector ( )
{
for ( auto i : until(uiNumComponents) )
{
Vector::v[i] = 0;
}
}
Vector ( const T values[uiNumComponents] )
{
for ( auto i : until(uiNumComponents) )
{
Vector::v[i] = values[i];
}
}
Vector ( const std::initializer_list<T>& list )
{
for ( auto i : until(uiNumComponents) )
{
Vector::v[i] = list.begin ()[i];
}
}
Vector& operator= ( const Vector& Right )
{
Vector::vec = Right.vec;
return *this;
}
operator std::string () const
{
return *(*this);
}
std::string operator* () const
{
std::ostringstream ostr;
ostr << "( ";
for ( const T& elem : Vector::v )
{
ostr << elem << ", ";
}
ostr.seekp( -2, std::ios_base::cur );
ostr << " )";
return ostr.str();
}
Vector operator+ ( const Vector& Right ) const
{
Vector vecRet;
vecRet.vec = Vector::vec + Right.vec;
return vecRet;
}
Vector& operator+= ( const Vector& Right )
{
Vector::vec = Vector::vec + Right.vec;
return *this;
}
};
template < typename T >
class Vector4 : public Vector < T, 4 >
{
typedef Vector < T, 4 > parent;
public:
Vector4 ()
: parent ()
{
}
Vector4 ( const T& x, const T& y = 0, const T& z = 0, const T& w = 0 )
{
parent::v [ 0 ] = x;
parent::v [ 1 ] = y;
parent::v [ 2 ] = z;
parent::v [ 3 ] = w;
}
Vector4 ( const parent& Right )
: parent ( Right )
{
}
Vector4& operator= ( const parent& Right )
{
parent::operator= ( Right );
return *this;
}
T& x () { return parent::v[0]; }
const T& x () const { return parent::v[0]; }
T& y () { return parent::v[1]; }
const T& y () const { return parent::v[1]; }
T& z () { return parent::v[2]; }
const T& z () const { return parent::v[2]; }
T& w () { return parent::v[3]; }
const T& w () const { return parent::v[3]; }
};
typedef Vector4<short> Vector4s, vec4s;
typedef Vector4<float> Vector4f, vec4f;
typedef Vector4<double> Vector4d, vec4d;
#endif /* VECTOR_H */
#include <cstdio>
#include <cstdlib>
int main(int argc, char** argv)
{
vec4f v1 ( 1, 2, 3, 4 ), v2 ( 8, 7, 6, 5 ), v3;
v3 = v1 + v2;
// v3 += { 1, 0, 1, 0 };
printf ( "Result: %f %f %f %f\n", v3.x(), v3.y(), v3.z(), v3.w() );
return 0;
}
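The testcase's loops use until(uiNumComponents), which expands to a Range<0, N-1> object. Below is a minimal sketch, assuming C++11 range-based for semantics, of roughly what such a loop turns into. The reduced Range copy and the function names are hypothetical. The loop uses only these members:

- begin() and end();
- operator!=;
- prefix operator++ (its return value is discarded);
- operator*.

```cpp
#include <cassert>

// Hypothetical reduced copy of the testcase's Range<iMin, iMax>.
template < int iMin, int iMax >
struct Range
{
  struct iterator
  {
    explicit iterator ( int val ) : v ( val ) {}
    iterator& operator++ () { ++v; return *this; }
    bool operator!= ( const iterator& other ) const { return other.v != v; }
    int operator* () const { return v; }
  private:
    int v;
  };
  iterator begin () const { return iterator ( iMin ); }
  iterator end () const { return iterator ( iMax + 1 ); }  // one past iMax
};

// Sum of 0..3 with a range-based for loop, like until(4) in the testcase.
inline int sum_with_range_for ()
{
  int sum = 0;
  for ( auto i : Range<0, 3>() )
    sum += i;
  return sum;
}

// Approximately the C++11 expansion of the loop above.
inline int sum_desugared ()
{
  int sum = 0;
  auto&& range = Range<0, 3>();
  for ( auto it = range.begin(), end = range.end(); it != end; ++it )
  {
    auto i = *it;
    sum += i;
  }
  return sum;
}

// Usage: assert (sum_with_range_for () == sum_desugared ());
```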
* [Bug tree-optimization/46284] Lack of proper optimization for certain SSE operations, and weird behavior with similar source codes
From: pinskia at gcc dot gnu.org @ 2023-05-15 6:07 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46284
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to work| |10.1.0, 11.3.0, 12.1.0, 13.1.0
Target Milestone|--- |10.0
Status|UNCONFIRMED |RESOLVED
See Also| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28367,
| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518
Resolution|--- |FIXED
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Fully fixed in GCC 10 (by r10-1692-g38988cbf9ebaa9):
main:
subq $8, %rsp
movl $.LC1, %edi
movl $4, %eax
movsd .LC0(%rip), %xmm0
movapd %xmm0, %xmm3
movapd %xmm0, %xmm2
movapd %xmm0, %xmm1
call printf
GCC now just loads the constants and calls printf.
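One constant is enough because every lane of v1 + v2 is 9. Also, printf's variadic float arguments are promoted to double. So after constant folding, all four arguments are the same double, which .LC0 presumably holds. Under the x86-64 System V calling convention, "movl $4, %eax" passes an upper bound on the number of vector registers that carry arguments to the variadic call.

The sketch below redoes the arithmetic on plain arrays. Its function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// The element-wise v1 + v2 from main(), on plain arrays for easy checking.
inline std::array<float, 4> testcase_sum ()
{
  const std::array<float, 4> v1 = { 1, 2, 3, 4 };
  const std::array<float, 4> v2 = { 8, 7, 6, 5 };
  std::array<float, 4> v3{};
  for ( int i = 0; i < 4; ++i )
    v3[i] = v1[i] + v2[i];
  return v3;
}

// Each component reaches printf promoted to double (default argument
// promotions), so these are the four values passed in xmm0..xmm3.
inline std::array<double, 4> printf_args ()
{
  const std::array<float, 4> v3 = testcase_sum ();
  return { v3[0], v3[1], v3[2], v3[3] };
}

// Usage: assert (printf_args ()[0] == 9.0);
```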
In GCC 7-9 (most likely improved by the fix for PR 28367), GCC produces:
movl $.LC0, %edi
movl $4, %eax
movdqa .LC2(%rip), %xmm1
movdqa .LC1(%rip), %xmm0
addps %xmm1, %xmm0
movaps %xmm0, %xmm3
movaps %xmm0, %xmm2
movaps %xmm0, %xmm1
shufps $255, %xmm0, %xmm3
unpckhps %xmm0, %xmm2
shufps $85, %xmm0, %xmm1
cvtss2sd %xmm0, %xmm0
cvtss2sd %xmm3, %xmm3
cvtss2sd %xmm2, %xmm2
cvtss2sd %xmm1, %xmm1
call printf
This is not bad, and much better than what GCC 6 produces, which is the code
listed in comment #0.
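In the GCC 7-9 sequence, each of xmm1..xmm3 starts as a copy of the sum in xmm0. Then shufps and unpckhps move one component into the low lane of each register:

- shufps $255 selects element 3 in every lane;
- unpckhps interleaves the high halves, so element 2 becomes lane 0;
- shufps $85 (binary 01010101) selects element 1.

Finally, cvtss2sd converts each low lane to the double that printf expects. The sketch below replays the sequence in portable C++ on a hypothetical sum with distinct lane values, so that the lane selection is visible. Xmm, shufps, unpckhps and replay are illustrative names, not real intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of a 128-bit XMM register holding four floats, lane 0 first.
using Xmm = std::array<float, 4>;

// SHUFPS in Intel operand order (src1 is also the destination): the two low
// result lanes come from src1, the two high ones from src2, 2 bits of imm each.
inline Xmm shufps ( const Xmm& src1, const Xmm& src2, std::uint8_t imm )
{
  return { src1[imm & 3], src1[(imm >> 2) & 3],
           src2[(imm >> 4) & 3], src2[(imm >> 6) & 3] };
}

// UNPCKHPS: interleave the high halves of src1 and src2.
inline Xmm unpckhps ( const Xmm& src1, const Xmm& src2 )
{
  return { src1[2], src2[2], src1[3], src2[3] };
}

// Replays the GCC 7-9 sequence and returns the low lanes of xmm0..xmm3
// converted to double, as cvtss2sd does: printf's four float arguments.
inline std::array<double, 4> replay ( const Xmm& sum )
{
  const Xmm xmm0 = sum;
  const Xmm xmm3 = shufps ( xmm0, xmm0, 255 );  // shufps $255, %xmm0, %xmm3
  const Xmm xmm2 = unpckhps ( xmm0, xmm0 );     // unpckhps %xmm0, %xmm2
  const Xmm xmm1 = shufps ( xmm0, xmm0, 85 );   // shufps $85, %xmm0, %xmm1
  return { double ( xmm0[0] ), double ( xmm1[0] ),
           double ( xmm2[0] ), double ( xmm3[0] ) };
}

// Usage: replay ({ 10, 20, 30, 40 }) yields 10, 20, 30, 40 in that order.
```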