public inbox for gcc-bugs@sourceware.org
* [Bug inline-asm/38671]  New: [4.4 Regression] speed regression with sse intrinsics
@ 2008-12-30 12:58 tim at klingt dot org
  2008-12-30 12:59 ` [Bug inline-asm/38671] " tim at klingt dot org
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: tim at klingt dot org @ 2008-12-30 12:58 UTC (permalink / raw)
  To: gcc-bugs

i am seeing a speed regression with gcc-4.4 when using sse intrinsics on a
core2 (x86_64). the code is:

namespace detail
{
/** compute x1 * (1 + x2 * amount)  */
__m128 inline amp_mod4_loop(__m128 x1, __m128 x2, __m128 amount, __m128 one)
{
    return _mm_mul_ps(x1,
                      _mm_add_ps(one,
                                 _mm_mul_ps(x2, amount)));
}
} /* namespace detail */

template <>
inline void amp_mod4(float * out, const float * in1, const float * in2,
                     const float amount, unsigned int n)
{
    n = n >> 2;
    const __m128 one = detail::gen_one();
    const __m128 amnt = _mm_set_ps1(amount);

    do
    {
        const __m128 x1 = _mm_load_ps(in1);
        in1 += 4;
        const __m128 x2 = _mm_load_ps(in2);
        in2 += 4;

        const __m128 result = detail::amp_mod4_loop(x1, x2, amnt, one);

        _mm_store_ps(out, result);
        out += 4;
    }
    while (--n);
}
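
For reference, the arithmetic that the loop above performs on four floats at a time can be written in scalar form. This is a minimal sketch and not part of the original report: the name amp_mod_scalar is hypothetical, and unlike the SSE version it accepts any n, including 0 and values that are not a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>

// Scalar reference for out[i] = in1[i] * (1 + in2[i] * amount).
// Hypothetical helper, e.g. for checking the output of the SIMD version.
void amp_mod_scalar(float* out, const float* in1, const float* in2,
                    float amount, std::size_t n)
{
    // Null pointers are only acceptable when there is nothing to process.
    assert(n == 0 || (out && in1 && in2));
    for (std::size_t i = 0; i != n; ++i)
        out[i] = in1[i] * (1.0f + in2[i] * amount);
}
```

Comparing this against amp_mod4 on the same buffers is one way to confirm that a change to the loop has not altered the results.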

the results for different compilers (using hardware performance counters) are:
gcc-4.4:
cycles: 1416276094
branch misses: 425897

gcc-4.4 -march=core2:
cycles: 1520034636
branch misses: 3263912

gcc-4.3:
cycles: 1548838336
branch misses: 5990424

gcc-4.3 -march=core2:
cycles: 1386605444
branch misses: 5609

gcc-4.2:
cycles: 1321697674
branch misses: 3682

it seems that gcc-4.3 with -march=core2 and gcc-4.2 generate code that is
friendlier to the branch predictor. tuning for core2 on gcc-4.4 actually
seems to generate worse code.

the best code (gcc-4.2) is:
0000000000400de0 <bench_1_simd(unsigned int)>:
  400de0:       66 0f ef c0             pxor   %xmm0,%xmm0
  400de4:       c1 ef 02                shr    $0x2,%edi
  400de7:       0f 28 15 32 0f 00 00    movaps 0xf32(%rip),%xmm2        # 401d20 <_IO_stdin_used+0xb0>
  400dee:       31 c0                   xor    %eax,%eax
  400df0:       66 0f 76 c0             pcmpeqd %xmm0,%xmm0
  400df4:       66 0f 72 d0 19          psrld  $0x19,%xmm0
  400df9:       66 0f 72 f0 17          pslld  $0x17,%xmm0
  400dfe:       0f 28 c8                movaps %xmm0,%xmm1
  400e01:       0f 28 80 e0 26 60 00    movaps 0x6026e0(%rax),%xmm0
  400e08:       0f 59 c2                mulps  %xmm2,%xmm0
  400e0b:       0f 58 c1                addps  %xmm1,%xmm0
  400e0e:       0f 59 80 e0 25 60 00    mulps  0x6025e0(%rax),%xmm0
  400e15:       0f 29 80 e0 24 60 00    movaps %xmm0,0x6024e0(%rax)
  400e1c:       48 83 c0 10             add    $0x10,%rax
  400e20:       83 ef 01                sub    $0x1,%edi
  400e23:       75 dc                   jne    400e01 <bench_1_simd(unsigned int)+0x21>
  400e25:       f3 c3                   repz retq 
  400e27:       90                      nop    
  400e28:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)

the slowest gcc-4.4 code (-march=core2) is about 15% slower than the gcc-4.2 code:
0000000000400e70 <bench_1_simd(unsigned int)>:
  400e70:       66 0f ef d2             pxor   %xmm2,%xmm2
  400e74:       89 fa                   mov    %edi,%edx
  400e76:       66 0f 76 d2             pcmpeqd %xmm2,%xmm2
  400e7a:       c1 ea 02                shr    $0x2,%edx
  400e7d:       66 0f 72 d2 19          psrld  $0x19,%xmm2
  400e82:       ff ca                   dec    %edx
  400e84:       66 0f 72 f2 17          pslld  $0x17,%xmm2
  400e89:       48 ff c2                inc    %rdx
  400e8c:       0f 28 0d 7d 17 00 00    movaps 0x177d(%rip),%xmm1        # 402610 <_IO_stdin_used+0xb0>
  400e93:       48 c1 e2 04             shl    $0x4,%rdx
  400e97:       31 c0                   xor    %eax,%eax
  400e99:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  400ea0:       0f 28 c1                movaps %xmm1,%xmm0
  400ea3:       0f 59 80 e0 36 60 00    mulps  0x6036e0(%rax),%xmm0
  400eaa:       0f 58 c2                addps  %xmm2,%xmm0
  400ead:       0f 59 80 e0 35 60 00    mulps  0x6035e0(%rax),%xmm0
  400eb4:       0f 29 80 e0 34 60 00    movaps %xmm0,0x6034e0(%rax)
  400ebb:       48 83 c0 10             add    $0x10,%rax
  400ebf:       48 39 d0                cmp    %rdx,%rax
  400ec2:       75 dc                   jne    400ea0 <bench_1_simd(unsigned int)+0x30>
  400ec4:       f3 c3                   repz retq 
  400ec6:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  400ecd:       00 00 00


-- 
           Summary: [4.4 Regression] speed regression with sse intrinsics
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: inline-asm
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tim at klingt dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug inline-asm/38671] [4.4 Regression] speed regression with sse intrinsics
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
@ 2008-12-30 12:59 ` tim at klingt dot org
  2008-12-30 16:23 ` [Bug target/38671] " pinskia at gcc dot gnu dot org
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: tim at klingt dot org @ 2008-12-30 12:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from tim at klingt dot org  2008-12-30 12:58 -------
Created an attachment (id=17013)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17013&action=view)
preprocessed source (gcc-4.4)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/38671] [4.4 Regression] speed regression with sse intrinsics
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
  2008-12-30 12:59 ` [Bug inline-asm/38671] " tim at klingt dot org
@ 2008-12-30 16:23 ` pinskia at gcc dot gnu dot org
  2008-12-31  7:50 ` pinskia at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-12-30 16:23 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|inline-asm                  |target
 GCC target triplet|                            |i?86-*-*
           Keywords|                            |missed-optimization
   Target Milestone|---                         |4.4.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/38671] [4.4 Regression] speed regression with sse intrinsics
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
  2008-12-30 12:59 ` [Bug inline-asm/38671] " tim at klingt dot org
  2008-12-30 16:23 ` [Bug target/38671] " pinskia at gcc dot gnu dot org
@ 2008-12-31  7:50 ` pinskia at gcc dot gnu dot org
  2008-12-31  7:57 ` pinskia at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-12-31  7:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from pinskia at gcc dot gnu dot org  2008-12-31 07:47 -------
sys_perf_counter_open always returns less than zero for me. 
This is with:
Linux gcc13 2.6.18-6-vserver-amd64 #1 SMP Sun Feb 10 17:55:04 UTC 2008 x86_64
GNU/Linux

What system call is it trying to do and why?


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 GCC target triplet|i?86-*-*                    |x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/38671] [4.4 Regression] speed regression with sse intrinsics
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
                   ` (2 preceding siblings ...)
  2008-12-31  7:50 ` pinskia at gcc dot gnu dot org
@ 2008-12-31  7:57 ` pinskia at gcc dot gnu dot org
  2008-12-31  8:11 ` [Bug middle-end/38671] [4.4 Regression] extra code for setting up loops pinskia at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-12-31  7:57 UTC (permalink / raw)
  To: gcc-bugs




------- Comment #3 from pinskia at gcc dot gnu dot org  2008-12-31 07:56 -------
t.cc: In function ‘float __vector__ nova::detail::gen_one()’:
t.cc:34160: warning: ‘x’ is used uninitialized in this function

inline __m128 gen_one(void)
{
    __m128i x;
    __m128i ones = _mm_cmpeq_epi32(x, x);
    return (__m128)_mm_slli_epi32 (_mm_srli_epi32(ones, 25), 23);
}

Is undefined code I think.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/38671] [4.4 Regression] extra code for setting up loops
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
                   ` (3 preceding siblings ...)
  2008-12-31  7:57 ` pinskia at gcc dot gnu dot org
@ 2008-12-31  8:11 ` pinskia at gcc dot gnu dot org
  2008-12-31  8:14 ` [Bug middle-end/38671] [4.4 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits) pinskia at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-12-31  8:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from pinskia at gcc dot gnu dot org  2008-12-31 08:10 -------
  D.45587 = VIEW_CONVERT_EXPR<__v4si>(x);
  D.45589 = __builtin_ia32_pcmpeqd128 (D.45587, D.45587);
  D.45591 = __builtin_ia32_psrldi128 (D.45589, 25);
  D.45594 = __builtin_ia32_pslldi128 (D.45591, 23);
  one = VIEW_CONVERT_EXPR<__m128>(VIEW_CONVERT_EXPR<__m128i>(D.45594));
  D.45644 = (long unsigned int) ((n >> 2) + 4294967295) + 1 * 16;
  ivtmp.516 = 0;


So the inner loop is not the issue, only the setup code.

The extra subtract/add comes from D.45644.
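
One plausible reading of why the subtract/add pair survives (an interpretation, not something stated in the dump): the count is a 32-bit unsigned value, so the decrement wraps modulo 2^32 before it is widened to 64 bits, and the result then differs from a plain widening when the count is 0. The sketch below assumes the dumped expression is meant as ((count - 1) widened, plus 1) times 16; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bound as it appears in the dump: decrement in 32 bits, widen to 64 bits,
// add 1, then scale by the 16-byte step.
std::uint64_t bound_as_dumped(std::uint32_t count)
{
    std::uint32_t minus_one = count + 4294967295u;  // count - 1, wraps mod 2^32
    return (static_cast<std::uint64_t>(minus_one) + 1) * 16;
}

// Naively simplified bound: widen first, then scale.
std::uint64_t bound_simplified(std::uint32_t count)
{
    return static_cast<std::uint64_t>(count) * 16;
}
```

For every nonzero count the two functions agree. For count == 0 the first yields 2^32 * 16, which matches a do/while loop whose unsigned counter wraps and runs 2^32 times, so the two forms can only be merged if the count is known to be nonzero.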


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |middle-end
            Summary|[4.4 Regression] speed      |[4.4 Regression] extra code
                   |regression with sse         |for setting up loops
                   |intrinsics                  |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/38671] [4.4 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits)
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
                   ` (4 preceding siblings ...)
  2008-12-31  8:11 ` [Bug middle-end/38671] [4.4 Regression] extra code for setting up loops pinskia at gcc dot gnu dot org
@ 2008-12-31  8:14 ` pinskia at gcc dot gnu dot org
  2008-12-31  9:21 ` tim at klingt dot org
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-12-31  8:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from pinskia at gcc dot gnu dot org  2008-12-31 08:12 -------
Confirmed, though I don't have a fully reduced testcase yet.  Basically it
comes down to using unsigned int rather than size_t.  If you had used size_t as
the index, the code would have worked correctly.
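
A sketch of what the size_t suggestion might look like, using a scalar stand-in for the SSE body so that it compiles on any target. The function name amp_mod4_sizet is hypothetical, and whether this avoids the extra setup code on a given GCC version is an assumption that should be checked against the generated assembly.

```cpp
#include <cassert>
#include <cstddef>

// Same do/while shape as amp_mod4, but the trip count is a size_t, which has
// the same width as a pointer on x86_64 and therefore needs no widening.
// Precondition (as in the original loop): n is a nonzero multiple of 4.
void amp_mod4_sizet(float* out, const float* in1, const float* in2,
                    float amount, std::size_t n)
{
    assert(n != 0 && n % 4 == 0);
    n >>= 2;  // number of 4-float blocks
    do {
        for (int k = 0; k < 4; ++k)  // stands in for one __m128 operation
            out[k] = in1[k] * (1.0f + in2[k] * amount);
        in1 += 4;
        in2 += 4;
        out += 4;
    } while (--n);
}
```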


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-12-31 08:12:50
               date|                            |
            Summary|[4.4 Regression] extra code |[4.4 Regression] extra code
                   |for setting up loops        |for setting up loops (IV-
                   |                            |opts and 32bits vs 64bits)


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/38671] [4.4 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits)
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
                   ` (5 preceding siblings ...)
  2008-12-31  8:14 ` [Bug middle-end/38671] [4.4 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits) pinskia at gcc dot gnu dot org
@ 2008-12-31  9:21 ` tim at klingt dot org
  2009-01-05 11:28 ` rguenth at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: tim at klingt dot org @ 2008-12-31  9:21 UTC (permalink / raw)
  To: gcc-bugs




------- Comment #6 from tim at klingt dot org  2008-12-31 09:20 -------
> sys_perf_counter_open always returns less than zero for me. 
> This is with:
> Linux gcc13 2.6.18-6-vserver-amd64 #1 SMP Sun Feb 10 17:55:04 UTC 2008 x86_64
> GNU/Linux
> 
> What system call is it trying to do and why?
> 

it is trying to open the performance counters
(http://lwn.net/Articles/310176/). it requires a patched kernel, though ...


(In reply to comment #3)
> t.cc: In function ‘float __vector__ nova::detail::gen_one()’:
> t.cc:34160: warning: ‘x’ is used uninitialized in this function
> 
> inline __m128 gen_one(void)
> {
>     __m128i x;
>     __m128i ones = _mm_cmpeq_epi32(x, x);
>     return (__m128)_mm_slli_epi32 (_mm_srli_epi32(ones, 25), 23);
> }
> 
> Is undefined code I think.

this code is valid. the uninitialized xmm register x is compared with itself in
order to set the register ones to ffffffffffffffffffffffffffffffff.
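
The shift arithmetic behind gen_one can be checked without SSE. This is a minimal sketch for a single 32-bit lane, assuming IEEE 754 single precision; the function names are hypothetical. Starting from all ones, a logical right shift by 25 leaves 0x7F, and a left shift by 23 moves that value into the exponent field, giving 0x3F800000, the bit pattern of 1.0f.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// One lane of gen_one: all ones, shifted right by 25, then left by 23.
std::uint32_t one_bits_from_all_ones()
{
    std::uint32_t ones = 0xFFFFFFFFu;  // what comparing a lane with itself yields
    return (ones >> 25) << 23;         // 0x7F << 23
}

// Reinterpret a 32-bit pattern as a float without violating aliasing rules.
float bits_to_float(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float assumed");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Whether reading an uninitialized __m128i is well defined at the language level is the separate question raised in comment #3; initializing x with _mm_setzero_si128() before the comparison would sidestep it at the cost of one extra instruction.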


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/38671] [4.4 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits)
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
                   ` (6 preceding siblings ...)
  2008-12-31  9:21 ` tim at klingt dot org
@ 2009-01-05 11:28 ` rguenth at gcc dot gnu dot org
  2009-04-21 16:00 ` [Bug middle-end/38671] [4.4/4.5 " jakub at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-01-05 11:28 UTC (permalink / raw)
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/38671] [4.4/4.5 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits)
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
                   ` (7 preceding siblings ...)
  2009-01-05 11:28 ` rguenth at gcc dot gnu dot org
@ 2009-04-21 16:00 ` jakub at gcc dot gnu dot org
  2009-07-22 10:35 ` jakub at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-04-21 16:00 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.0                       |4.4.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/38671] [4.4/4.5 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits)
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
                   ` (8 preceding siblings ...)
  2009-04-21 16:00 ` [Bug middle-end/38671] [4.4/4.5 " jakub at gcc dot gnu dot org
@ 2009-07-22 10:35 ` jakub at gcc dot gnu dot org
  2009-10-15 12:54 ` jakub at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-07-22 10:35 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.1                       |4.4.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/38671] [4.4/4.5 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits)
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
                   ` (9 preceding siblings ...)
  2009-07-22 10:35 ` jakub at gcc dot gnu dot org
@ 2009-10-15 12:54 ` jakub at gcc dot gnu dot org
  2010-01-21 13:16 ` jakub at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-10-15 12:54 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.2                       |4.4.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/38671] [4.4/4.5 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits)
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
                   ` (10 preceding siblings ...)
  2009-10-15 12:54 ` jakub at gcc dot gnu dot org
@ 2010-01-21 13:16 ` jakub at gcc dot gnu dot org
  2010-03-01 23:32 ` [Bug middle-end/38671] [4.3/4.4/4.5 " pinskia at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-01-21 13:16 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.3                       |4.4.4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/38671] [4.3/4.4/4.5 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits)
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
                   ` (11 preceding siblings ...)
  2010-01-21 13:16 ` jakub at gcc dot gnu dot org
@ 2010-03-01 23:32 ` pinskia at gcc dot gnu dot org
  2010-03-01 23:35 ` [Bug middle-end/38671] [4.3/4.4/4.5 Regression] selecting one IV instead of three pinskia at gcc dot gnu dot org
  2010-04-30  9:25 ` [Bug middle-end/38671] [4.3/4.4/4.5/4.6 " jakub at gcc dot gnu dot org
  14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2010-03-01 23:32 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from pinskia at gcc dot gnu dot org  2010-03-01 23:32 -------
Still getting:
  D.41749_31 = n_13 + 4294967295;
  D.41750_32 = (long unsigned int) D.41749_31;
  D.41751_49 = D.41750_32 + 1;

Reduced testcase:
int f(int *a, int n, int *b)
{
  n = n >> 2;
  do {
    *b = *a;
    a += 4;
    b += 4;
  } while (--n);
}

--- CUT ---
I want to say this was introduced by POINTER plus work :(.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |4.3.2 4.5.0
      Known to work|                            |4.2.4
            Summary|[4.4/4.5 Regression] extra  |[4.3/4.4/4.5 Regression]
                   |code for setting up loops   |extra code for setting up
                   |(IV-opts and 32bits vs      |loops (IV-opts and 32bits vs
                   |64bits)                     |64bits)


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/38671] [4.3/4.4/4.5 Regression] selecting one IV instead of three
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
                   ` (12 preceding siblings ...)
  2010-03-01 23:32 ` [Bug middle-end/38671] [4.3/4.4/4.5 " pinskia at gcc dot gnu dot org
@ 2010-03-01 23:35 ` pinskia at gcc dot gnu dot org
  2010-04-30  9:25 ` [Bug middle-end/38671] [4.3/4.4/4.5/4.6 " jakub at gcc dot gnu dot org
  14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2010-03-01 23:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from pinskia at gcc dot gnu dot org  2010-03-01 23:34 -------
For 4.2, we use three IVs; while from 4.3 and above, we use one IV.
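
To illustrate the terminology at source level (a sketch only: the actual selection is made by the IV-opts pass on the intermediate representation, and the function names here are hypothetical). With three induction variables, both pointers and the counter are stepped separately. With one, a single offset indexes both arrays and is compared against a bound computed before the loop.

```cpp
#include <cassert>
#include <cstddef>

// Three induction variables: a, b and n each change on every iteration.
void copy_three_ivs(const int* a, int* b, unsigned n)
{
    assert(n != 0);
    do {
        *b = *a;
        a += 4;
        b += 4;
    } while (--n);
}

// One induction variable: i indexes both arrays and also drives the exit test.
void copy_one_iv(const int* a, int* b, unsigned n)
{
    assert(n != 0);
    const std::size_t end = static_cast<std::size_t>(n) * 4;  // computed once
    std::size_t i = 0;
    do {
        b[i] = a[i];
        i += 4;
    } while (i != end);
}
```

Both functions copy every fourth element for n iterations. The second form needs the precomputed bound, which is where the extra setup code discussed in comment #4 comes from.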


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.3/4.4/4.5 Regression]    |[4.3/4.4/4.5 Regression]
                   |extra code for setting up   |selecting one IV instead of
                   |loops (IV-opts and 32bits vs|three
                   |64bits)                     |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/38671] [4.3/4.4/4.5/4.6 Regression] selecting one IV instead of three
  2008-12-30 12:58 [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics tim at klingt dot org
                   ` (13 preceding siblings ...)
  2010-03-01 23:35 ` [Bug middle-end/38671] [4.3/4.4/4.5 Regression] selecting one IV instead of three pinskia at gcc dot gnu dot org
@ 2010-04-30  9:25 ` jakub at gcc dot gnu dot org
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-30  9:25 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.4                       |4.4.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671


^ permalink raw reply	[flat|nested] 16+ messages in thread
