[Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug middle-end/39840]  New: Non-optimal (or wrong) implementation of SSE intrinsics
@ 2009-04-21 19:05 drepper at redhat dot com
  2009-04-21 19:08 ` [Bug middle-end/39840] " hjl dot tools at gmail dot com
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: drepper at redhat dot com @ 2009-04-21 19:05 UTC (permalink / raw)
  To: gcc-bugs

The implementations of the SSE intrinsics for x86 and x86-64 in gcc is tied to
the use of an appropriate -m option, such as -mssse3 or -mavx.  This is
different from what icc does and it prevents code from being written in the
most natural form.  This is nothing new in gcc 4.4, it has been the behavior of
gcc forever, as far as I can see.  But especially the introduction of AVX
brings this problem to the foreground.

As an example, assume I want to write a vector class with the usual operations.
 I can write code like this:

#ifdef __AVX__
vec<float,N> operator+(vec<float,N> &a, vec<float,N> &b) {
  ... use AVX intrinsics ...
}
#elif defined __SSE4__
vec<float,N> operator+(vec<float,N> &a, vec<float,N> &b) {
  ... use SSE4 intrinsics ...
}
#elif defined __SSE2__
vec<float,N> operator+(vec<float,N> &a, vec<float,N> &b) {
  ... use SSE2 intrinsics ...
}
#else
vec<float,N> operator+(vec<float,N> &a, vec<float,N> &b) {
  ... generic implementation ...
}
#endif

But this means, of course, that the binary has to be compiled for every single
target and the correct one has to be chosen.  This is not attractive or
practical.  Chances are that only a generic implementation will be available.

It would be better to have a self-optimizing implementation:

vec<float,N> operator+(vec<float,N> &a, vec<float,N> &b) {
  if (AVX is available)
    ... use AVX intrinsics ...
  else if (SSE4 is available)
    ... use SSE4 intrinsics ...
  else if (SSE2 is available)
    ... use SSE2 intrinsics ...
  else
    ... generic implementation ...
}

This is possible with icc.  It is not possible with gcc in the moment.  For gcc
I would have to split the implementation of all the variants in individual
files and then, in the template function as seen above, these implementations
would have to be called.  Even if as in this case it might be doable (but
terribly inconvenient) there are situations where this is really impractical or
impossible.

The problem is that to be able to use the AVX intrinsics the compiler has to be
passed -mavx (all other extensions are implied in -mavx).   But this flag has
another consequence: the compiler will now take advantage of the new
instructions in AVX and generate for unrelated code not associated with
intrinsics (e.g., an inlined memset implementation).  The result is that such a
binary will fail to run on anything but an AVX-enabled machine.

In icc the -mavx flag exclusively controls the code generation (i.e., whether
AVX is used in inlined memset etc).  The SSE intrinsics and all the associated
data types are _always_ defined as soon as <immintrin.h> is included.

This means the exmaple code above would be compiled with an -m parameter for
the minimum ISA to support and still the AVX, SSE4, ... intrinsics are
available.

gcc should follow icc's way of handling the intrinsics.  Since all this
intrinsic business comes from icc I consider this a bug in gcc's implementation
instead of an enhancement request.

-- 
           Summary: Non-optimal (or wrong) implementation of SSE intrinsics
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: drepper at redhat dot com
GCC target triplet: i?86-* x86_64-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
  2009-04-21 19:05 [Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics drepper at redhat dot com
@ 2009-04-21 19:08 ` hjl dot tools at gmail dot com
  2009-04-21 19:38 ` drepper at redhat dot com
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: hjl dot tools at gmail dot com @ 2009-04-21 19:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from hjl dot tools at gmail dot com  2009-04-21 19:07 -------
Please provide some sample code which can be compiled.


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
  2009-04-21 19:05 [Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics drepper at redhat dot com
  2009-04-21 19:08 ` [Bug middle-end/39840] " hjl dot tools at gmail dot com
@ 2009-04-21 19:38 ` drepper at redhat dot com
  2009-04-21 19:41   ` Andrew Thomas Pinski
  2009-04-21 19:42 ` pinskia at gmail dot com
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 10+ messages in thread
From: drepper at redhat dot com @ 2009-04-21 19:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from drepper at redhat dot com  2009-04-21 19:37 -------
[I couldn't attach the code as an attachment, bugzilla has a bug.]

The program below has to be compiled with -mavx to allow the AVX intrinsics
being used.  But this also triggers using the use of the vmovss instruction to
load the parameter for the sin() call from memory.

(Forget the reference to memset in the original report, it's as simple as
passing floating point parameters that triggers the problem.)

#include <math.h>
#include <stdio.h>
#include <immintrin.h>


static unsigned int eax, ebx, ecx, edx;


static int
has_avx (void)
{
  if ((ecx & (1 << 27)) == 0)
    /* No OSXSAVE.  */
    return 0;

  unsigned int feat_eax, feat_edx;
  asm ("xgetbv" : "=a" (feat_eax), "=d" (feat_edx) : "c" (0));
  if ((feat_eax & 6) != 6)
    return 0;

  return (ecx & (1 << 28)) != 0;
}


template <typename T, int N>
struct vec {
  union {
    T n[N];
    __v4sf f[N / (sizeof (__v4sf) / sizeof (T))];
    __v8sf fa[N / (sizeof (__v8sf) / sizeof (T))];
  };
};


template <typename T, int N>
T
optscalar(const vec<T,N> &src1, const vec<T,N> &src2)
{
  T r = 0;
  for (int i = 0; i < N; ++i)
    r += src1[i] * src2[i];
  return r;
}


template <int N>
float
optscalar(const vec<float,N> &src1, const vec<float,N> &src2)
{
  if (has_avx ())
    {
      __m256 tmp = _mm256_setzero_ps ();
      for (int i = 0; i < N / 8; ++i)
        tmp = _mm256_add_ps (tmp, _mm256_mul_ps (src1.fa[i], src2.fa[i]));
      tmp = _mm256_hadd_ps (tmp, tmp);
      tmp = _mm256_hadd_ps (tmp, tmp);
      tmp = _mm256_hadd_ps (tmp, tmp);
      union
      {
        __m256 v;
        float a[8];
      } cvt = { tmp };
      return cvt.a[0];
    }
  else
    {
      __m128 tmp = _mm_setzero_ps ();
      for (int i = 0; i < N / 4; ++i)
        tmp = _mm_add_ps (tmp, _mm_mul_ps (src1.f[i], src2.f[i]));
      tmp = _mm_hadd_ps (tmp, tmp);
      tmp = _mm_hadd_ps (tmp, tmp);
      return __builtin_ia32_vec_ext_v4sf (tmp, 0);
    }
}


#define N 100000
#define DEF(type) vec<type,N> v##type##1, v##type##2; type type##res, type##cmp
DEF(float);

float g;

int
main ()
{
  float f = sinf  (g);
  printf ("%g\n", f);

  asm volatile ("cpuid"
                : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
                : "0" (1));

  float floatres = optscalar (vfloat1, vfloat2);
  printf ("%g\n", floatres);

  return 0;
}


-- 

drepper at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |UNCONFIRMED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
  2009-04-21 19:38 ` drepper at redhat dot com
@ 2009-04-21 19:41   ` Andrew Thomas Pinski
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Thomas Pinski @ 2009-04-21 19:41 UTC (permalink / raw)
  To: gcc-bugzilla; +Cc: gcc-bugs

Gcc 4.4 and above supports different target options on the function  
level but not on a basic block level. So you can create an interneral  
version for AVX.

Sent from my iPhone

On Apr 21, 2009, at 12:37 PM, "drepper at redhat dot com" <gcc-bugzilla@gcc.gnu.org 
 > wrote:

>
>
> ------- Comment #2 from drepper at redhat dot com  2009-04-21 19:37  
> -------
> [I couldn't attach the code as an attachment, bugzilla has a bug.]
>
> The program below has to be compiled with -mavx to allow the AVX  
> intrinsics
> being used.  But this also triggers using the use of the vmovss  
> instruction to
> load the parameter for the sin() call from memory.
>
> (Forget the reference to memset in the original report, it's as  
> simple as
> passing floating point parameters that triggers the problem.)
>
> #include <math.h>
> #include <stdio.h>
> #include <immintrin.h>
>
>
> static unsigned int eax, ebx, ecx, edx;
>
>
> static int
> has_avx (void)
> {
>  if ((ecx & (1 << 27)) == 0)
>    /* No OSXSAVE.  */
>    return 0;
>
>  unsigned int feat_eax, feat_edx;
>  asm ("xgetbv" : "=a" (feat_eax), "=d" (feat_edx) : "c" (0));
>  if ((feat_eax & 6) != 6)
>    return 0;
>
>  return (ecx & (1 << 28)) != 0;
> }
>
>
> template <typename T, int N>
> struct vec {
>  union {
>    T n[N];
>    __v4sf f[N / (sizeof (__v4sf) / sizeof (T))];
>    __v8sf fa[N / (sizeof (__v8sf) / sizeof (T))];
>  };
> };
>
>
> template <typename T, int N>
> T
> optscalar(const vec<T,N> &src1, const vec<T,N> &src2)
> {
>  T r = 0;
>  for (int i = 0; i < N; ++i)
>    r += src1[i] * src2[i];
>  return r;
> }
>
>
> template <int N>
> float
> optscalar(const vec<float,N> &src1, const vec<float,N> &src2)
> {
>  if (has_avx ())
>    {
>      __m256 tmp = _mm256_setzero_ps ();
>      for (int i = 0; i < N / 8; ++i)
>        tmp = _mm256_add_ps (tmp, _mm256_mul_ps (src1.fa[i],  
> src2.fa[i]));
>      tmp = _mm256_hadd_ps (tmp, tmp);
>      tmp = _mm256_hadd_ps (tmp, tmp);
>      tmp = _mm256_hadd_ps (tmp, tmp);
>      union
>      {
>        __m256 v;
>        float a[8];
>      } cvt = { tmp };
>      return cvt.a[0];
>    }
>  else
>    {
>      __m128 tmp = _mm_setzero_ps ();
>      for (int i = 0; i < N / 4; ++i)
>        tmp = _mm_add_ps (tmp, _mm_mul_ps (src1.f[i], src2.f[i]));
>      tmp = _mm_hadd_ps (tmp, tmp);
>      tmp = _mm_hadd_ps (tmp, tmp);
>      return __builtin_ia32_vec_ext_v4sf (tmp, 0);
>    }
> }
>
>
> #define N 100000
> #define DEF(type) vec<type,N> v##type##1, v##type##2; type  
> type##res, type##cmp
> DEF(float);
>
> float g;
>
> int
> main ()
> {
>  float f = sinf  (g);
>  printf ("%g\n", f);
>
>  asm volatile ("cpuid"
>                : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
>                : "0" (1));
>
>  float floatres = optscalar (vfloat1, vfloat2);
>  printf ("%g\n", floatres);
>
>  return 0;
> }
>
>
> -- 
>
> drepper at redhat dot com changed:
>
>           What    |Removed                     |Added
> --- 
> --- 
> ----------------------------------------------------------------------
>             Status|WAITING                     |UNCONFIRMED
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
  2009-04-21 19:05 [Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics drepper at redhat dot com
  2009-04-21 19:08 ` [Bug middle-end/39840] " hjl dot tools at gmail dot com
  2009-04-21 19:38 ` drepper at redhat dot com
@ 2009-04-21 19:42 ` pinskia at gmail dot com
  2009-04-21 19:51 ` drepper at redhat dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: pinskia at gmail dot com @ 2009-04-21 19:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from pinskia at gmail dot com  2009-04-21 19:41 -------
Subject: Re:  Non-optimal (or wrong) implementation of SSE intrinsics

Gcc 4.4 and above supports different target options on the function  
level but not on a basic block level. So you can create an interneral  
version for AVX.

Sent from my iPhone

On Apr 21, 2009, at 12:37 PM, "drepper at redhat dot com"
<gcc-bugzilla@gcc.gnu.org 
 > wrote:

>
>
> ------- Comment #2 from drepper at redhat dot com  2009-04-21 19:37  
> -------
> [I couldn't attach the code as an attachment, bugzilla has a bug.]
>
> The program below has to be compiled with -mavx to allow the AVX  
> intrinsics
> being used.  But this also triggers using the use of the vmovss  
> instruction to
> load the parameter for the sin() call from memory.
>
> (Forget the reference to memset in the original report, it's as  
> simple as
> passing floating point parameters that triggers the problem.)
>
> #include <math.h>
> #include <stdio.h>
> #include <immintrin.h>
>
>
> static unsigned int eax, ebx, ecx, edx;
>
>
> static int
> has_avx (void)
> {
>  if ((ecx & (1 << 27)) == 0)
>    /* No OSXSAVE.  */
>    return 0;
>
>  unsigned int feat_eax, feat_edx;
>  asm ("xgetbv" : "=a" (feat_eax), "=d" (feat_edx) : "c" (0));
>  if ((feat_eax & 6) != 6)
>    return 0;
>
>  return (ecx & (1 << 28)) != 0;
> }
>
>
> template <typename T, int N>
> struct vec {
>  union {
>    T n[N];
>    __v4sf f[N / (sizeof (__v4sf) / sizeof (T))];
>    __v8sf fa[N / (sizeof (__v8sf) / sizeof (T))];
>  };
> };
>
>
> template <typename T, int N>
> T
> optscalar(const vec<T,N> &src1, const vec<T,N> &src2)
> {
>  T r = 0;
>  for (int i = 0; i < N; ++i)
>    r += src1[i] * src2[i];
>  return r;
> }
>
>
> template <int N>
> float
> optscalar(const vec<float,N> &src1, const vec<float,N> &src2)
> {
>  if (has_avx ())
>    {
>      __m256 tmp = _mm256_setzero_ps ();
>      for (int i = 0; i < N / 8; ++i)
>        tmp = _mm256_add_ps (tmp, _mm256_mul_ps (src1.fa[i],  
> src2.fa[i]));
>      tmp = _mm256_hadd_ps (tmp, tmp);
>      tmp = _mm256_hadd_ps (tmp, tmp);
>      tmp = _mm256_hadd_ps (tmp, tmp);
>      union
>      {
>        __m256 v;
>        float a[8];
>      } cvt = { tmp };
>      return cvt.a[0];
>    }
>  else
>    {
>      __m128 tmp = _mm_setzero_ps ();
>      for (int i = 0; i < N / 4; ++i)
>        tmp = _mm_add_ps (tmp, _mm_mul_ps (src1.f[i], src2.f[i]));
>      tmp = _mm_hadd_ps (tmp, tmp);
>      tmp = _mm_hadd_ps (tmp, tmp);
>      return __builtin_ia32_vec_ext_v4sf (tmp, 0);
>    }
> }
>
>
> #define N 100000
> #define DEF(type) vec<type,N> v##type##1, v##type##2; type  
> type##res, type##cmp
> DEF(float);
>
> float g;
>
> int
> main ()
> {
>  float f = sinf  (g);
>  printf ("%g\n", f);
>
>  asm volatile ("cpuid"
>                : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
>                : "0" (1));
>
>  float floatres = optscalar (vfloat1, vfloat2);
>  printf ("%g\n", floatres);
>
>  return 0;
> }
>
>
> -- 
>
> drepper at redhat dot com changed:
>
>           What    |Removed                     |Added
> --- 
> --- 
> ----------------------------------------------------------------------
>             Status|WAITING                     |UNCONFIRMED
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840
>


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
  2009-04-21 19:05 [Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics drepper at redhat dot com
                   ` (2 preceding siblings ...)
  2009-04-21 19:42 ` pinskia at gmail dot com
@ 2009-04-21 19:51 ` drepper at redhat dot com
  2009-04-21 20:34 ` hjl dot tools at gmail dot com
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: drepper at redhat dot com @ 2009-04-21 19:51 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #4 from drepper at redhat dot com  2009-04-21 19:51 -------
(In reply to comment #3)
> Gcc 4.4 and above supports different target options on the function  
> level but not on a basic block level. So you can create an interneral  
> version for AVX.

This doesn't work either.  Aside from being also impractical.

First, you'd have to switch to AVX mode, in this case, to include
<immintrin.h>.  How do you switch back to what was used before?  How to even
determine it?

Even if you can, try it, and you'll see that gcc is horribly broken when it
comes to the target("...") attributes.  In the current Fedora 11 compiler (4.4)
all target options are apparently turned off and none of the intrinsics work at
all.

Even if the necessary support would be added and the bugs fixed it still
differs from icc (where all this comes from) and not in a nice way.  To the
contrary, it's much, much more complicated.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
  2009-04-21 19:05 [Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics drepper at redhat dot com
                   ` (3 preceding siblings ...)
  2009-04-21 19:51 ` drepper at redhat dot com
@ 2009-04-21 20:34 ` hjl dot tools at gmail dot com
  2009-04-21 21:57 ` hjl dot tools at gmail dot com
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: hjl dot tools at gmail dot com @ 2009-04-21 20:34 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #5 from hjl dot tools at gmail dot com  2009-04-21 20:34 -------
Created an attachment (id=17667)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17667&action=view)
An example

I am enclosing a modified example which can be compiled with both
icc and gcc. I also included assembly codes generated by "icc -O2"
and "gcc -avx -O2". Icc generates:

  54:   c5 ff 7c c8             vhaddps %ymm0,%ymm0,%ymm1
  58:   c5 f7 7c d1             vhaddps %ymm1,%ymm1,%ymm2
  5c:   c5 ef 7c da             vhaddps %ymm2,%ymm2,%ymm3
  60:   c5 fc 29 5c 24 e0       vmovaps %ymm3,-0x20(%rsp)
  66:   f3 0f 10 44 24 e0       movss  -0x20(%rsp),%xmm0

for

if (has_avx ())
 {
   ...
 }

There is

f3 0f 10 44 24 e0       movss  -0x20(%rsp),%xmm0

although this code will only run on AVX targets. Since we don't
support basic block optimization, I don't see how we can avoid
SSE instructions in AVX code path. The best option I can think
of is function level optimization. But as we all know, function
level optimization isn't usable, as least in this context. I
think we should go back and another look at function level
optimization. We should do it right this time. I have some
ideas in

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37565

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
  2009-04-21 19:05 [Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics drepper at redhat dot com
                   ` (4 preceding siblings ...)
  2009-04-21 20:34 ` hjl dot tools at gmail dot com
@ 2009-04-21 21:57 ` hjl dot tools at gmail dot com
  2009-04-22  9:37 ` rguenth at gcc dot gnu dot org
  2009-04-22 13:58 ` hjl dot tools at gmail dot com
  7 siblings, 0 replies; 10+ messages in thread
From: hjl dot tools at gmail dot com @ 2009-04-21 21:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from hjl dot tools at gmail dot com  2009-04-21 21:56 -------
Created an attachment (id=17668)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17668&action=view)
An eample

Here is an example for gcc 4.4. If function level optimization works,
we don't need separate files for AVX and SSE3.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
  2009-04-21 19:05 [Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics drepper at redhat dot com
                   ` (5 preceding siblings ...)
  2009-04-21 21:57 ` hjl dot tools at gmail dot com
@ 2009-04-22  9:37 ` rguenth at gcc dot gnu dot org
  2009-04-22 13:58 ` hjl dot tools at gmail dot com
  7 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-04-22  9:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from rguenth at gcc dot gnu dot org  2009-04-22 09:36 -------
The problem with different instruction sets in different BBs is also how to
avoid code motion across them.  IMNSHO this is a bad idea.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
  2009-04-21 19:05 [Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics drepper at redhat dot com
                   ` (6 preceding siblings ...)
  2009-04-22  9:37 ` rguenth at gcc dot gnu dot org
@ 2009-04-22 13:58 ` hjl dot tools at gmail dot com
  7 siblings, 0 replies; 10+ messages in thread
From: hjl dot tools at gmail dot com @ 2009-04-22 13:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from hjl dot tools at gmail dot com  2009-04-22 13:58 -------
(In reply to comment #7)
> The problem with different instruction sets in different BBs is also how to
> avoid code motion across them.  IMNSHO this is a bad idea.
> 

I agree. There are too many issues with it. I'd like to see
function level optimization work properly for all cases.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2009-04-22 13:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-21 19:05 [Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics drepper at redhat dot com
2009-04-21 19:08 ` [Bug middle-end/39840] " hjl dot tools at gmail dot com
2009-04-21 19:38 ` drepper at redhat dot com
2009-04-21 19:41   ` Andrew Thomas Pinski
2009-04-21 19:42 ` pinskia at gmail dot com
2009-04-21 19:51 ` drepper at redhat dot com
2009-04-21 20:34 ` hjl dot tools at gmail dot com
2009-04-21 21:57 ` hjl dot tools at gmail dot com
2009-04-22  9:37 ` rguenth at gcc dot gnu dot org
2009-04-22 13:58 ` hjl dot tools at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).