[Bug target/38825] New: missed optimization: register renaming in unrolled loop

public inbox for gcc-bugs@sourceware.org

* [Bug target/38825]  New: missed optimization: register renaming in unrolled loop
From: tim at klingt dot org @ 2009-01-13 11:40 UTC
  To: gcc-bugs

The following two functions are equivalent: each adds a scalar to a vector,
manually unrolled to 8 floats (2 SSE vectors) per iteration.

The first function performs the two load/add/store sequences one after the
other, while the second function interleaves the instructions of the two
sequences:

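#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load_ps, _mm_add_ps, ... */

/* in and out must be 16-byte aligned and n >= 8; a tail of n % 8 floats is ignored */
void bench_3(float * out, float * in, float f, unsigned int n)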
{
    n /= 8;
    __m128 scalar = _mm_set_ps1(f);
    do
    {
        __m128 arg = _mm_load_ps(in);
        __m128 result = _mm_add_ps(arg, scalar);
        _mm_store_ps(out, result);

        arg = _mm_load_ps(in+4);
        result = _mm_add_ps(arg, scalar);
        _mm_store_ps(out+4, result);
        in += 8;
        out += 8;
    }
    while (--n);
}

with the generated code:
.L13:
        movaps  (%rsi,%rax), %xmm0
        addps   %xmm1, %xmm0
        movaps  %xmm0, (%rdi,%rax)
        movaps  16(%rsi,%rax), %xmm0
        addps   %xmm1, %xmm0
        movaps  %xmm0, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L13


/* same contract as bench_3, but both loads are issued before the adds and stores */
void bench_4(float * out, float * in, float f, unsigned int n)
{
    n /= 8;
    __m128 scalar = _mm_set_ps1(f);
    do
    {
        __m128 arg  = _mm_load_ps(in);
        __m128 arg2 = _mm_load_ps(in+4);
        __m128 result  = _mm_add_ps(arg, scalar);
        __m128 result2 = _mm_add_ps(arg2, scalar);
        _mm_store_ps(out, result);
        _mm_store_ps(out+4, result2);
        in += 8;
        out += 8;
    }
    while (--n);
}

generated code:
.L9:
        movaps  (%rsi,%rax), %xmm0
        movaps  16(%rsi,%rax), %xmm1
        addps   %xmm2, %xmm0
        addps   %xmm2, %xmm1
        movaps  %xmm0, (%rdi,%rax)
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L9

The interleaved code outperforms the sequential code by about 12% on
x86_64/Core 2, possibly because the two load/add/store sequences don't
have any data dependencies on each other.
It would be nice if gcc could do register renaming and instruction
reordering on the first function to generate the same instructions as the
second function.
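
The report does not show how the 12% figure was measured. As a rough sketch
only, one way to time such kernels is shown below. The names kernel_fn,
add_scalar_ref and time_kernel are hypothetical and not part of the report;
add_scalar_ref is a scalar stand-in so that the sketch compiles on its own,
and bench_3 or bench_4 would be passed in its place when measuring.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Signature shared by bench_3 and bench_4 in the report. */
typedef void (*kernel_fn)(float *out, float *in, float f, unsigned int n);

/* Hypothetical scalar stand-in; pass bench_3 or bench_4 instead when measuring. */
void add_scalar_ref(float *out, float *in, float f, unsigned int n)
{
    for (unsigned int i = 0; i < n; ++i)
        out[i] = in[i] + f;
}

/* Returns the average CPU seconds per call of fn over reps calls on n floats,
   or -1.0 if allocation fails. n must be a nonzero multiple of 8. */
double time_kernel(kernel_fn fn, unsigned int n, unsigned int reps)
{
    assert(n != 0 && n % 8 == 0 && reps != 0);

    /* 16-byte alignment is what _mm_load_ps / _mm_store_ps require. */
    float *in  = aligned_alloc(16, n * sizeof(float));
    float *out = aligned_alloc(16, n * sizeof(float));
    if (!in || !out) {
        free(in);
        free(out);
        return -1.0;
    }
    for (unsigned int i = 0; i < n; ++i)
        in[i] = (float)i;

    clock_t start = clock();
    for (unsigned int r = 0; r < reps; ++r)
        fn(out, in, 1.0f, n);
    clock_t stop = clock();

    /* Read one result so the stores are observably used. */
    volatile float sink = out[n - 1];
    (void)sink;

    free(in);
    free(out);
    return (double)(stop - start) / CLOCKS_PER_SEC / reps;
}
```

Comparing time_kernel(bench_3, n, reps) against time_kernel(bench_4, n, reps)
for a buffer that fits in cache would give a figure comparable to the one
quoted above. clock() is coarse, so reps needs to be large.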


-- 
           Summary: missed optimization: register renaming in unrolled loop
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tim at klingt dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825



* [Bug target/38825] missed optimization: register renaming in unrolled loop
From: rguenth at gcc dot gnu dot org @ 2009-01-13 15:09 UTC
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2009-01-13 15:08 -------
Try -frename-registers.



* [Bug target/38825] missed optimization: register renaming in unrolled loop
From: rguenth at gcc dot gnu dot org @ 2009-01-13 15:16 UTC
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2009-01-13 15:15 -------
Note that your testcase has moved the load _mm_load_ps(in+4); before the
store _mm_store_ps(out, result); which the compiler cannot do itself because
they may alias.
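
To illustrate why the reordering is only valid without aliasing, here is a
small sketch using scalar stand-ins for the 4-float SSE operations. The names
vec4, load4, add4, store4, step_sequential, step_interleaved and orders_agree
are hypothetical. When out points 4 floats past in, the first store overwrites
the data read by the second load, so the two orders give different results.

```c
#include <string.h>

/* Scalar stand-ins for _mm_load_ps / _mm_add_ps / _mm_store_ps on 4 floats. */
typedef struct { float v[4]; } vec4;

static vec4 load4(const float *p)
{
    vec4 r;
    memcpy(r.v, p, sizeof r.v);
    return r;
}

static vec4 add4(vec4 a, float f)
{
    for (int i = 0; i < 4; ++i)
        a.v[i] += f;
    return a;
}

static void store4(float *p, vec4 a)
{
    memcpy(p, a.v, sizeof a.v);
}

/* Order used by bench_3: the first store precedes the second load. */
void step_sequential(float *out, const float *in, float f)
{
    store4(out, add4(load4(in), f));
    store4(out + 4, add4(load4(in + 4), f));
}

/* Order used by bench_4: both loads precede both stores. */
void step_interleaved(float *out, const float *in, float f)
{
    vec4 a = load4(in);
    vec4 b = load4(in + 4);
    store4(out, add4(a, f));
    store4(out + 4, add4(b, f));
}

/* Returns 1 if both orders leave memory identical, 0 otherwise.
   With overlap != 0, out points 4 floats past in, inside the same buffer. */
int orders_agree(int overlap)
{
    float a[20], b[20];
    for (int i = 0; i < 20; ++i)
        a[i] = b[i] = (float)i;

    float *out_a = overlap ? a + 4 : a + 12;
    float *out_b = overlap ? b + 4 : b + 12;

    step_sequential(out_a, a, 1.0f);
    step_interleaved(out_b, b, 1.0f);
    return memcmp(a, b, sizeof a) == 0;
}
```

orders_agree(0) uses disjoint input and output ranges and returns 1, while
orders_agree(1) uses the overlapping layout and returns 0. The compiler has
to assume the overlapping case unless it is told otherwise.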



* [Bug target/38825] missed optimization: register renaming in unrolled loop
From: tim at klingt dot org @ 2009-01-13 15:26 UTC
  To: gcc-bugs



------- Comment #3 from tim at klingt dot org  2009-01-13 15:26 -------
(In reply to comment #1)
> Try -frename-registers.

I forgot to mention: the binaries are compiled with -O3 -mfpmath=sse -msse
(gcc 4.2, 4.3 and 4.4).

-frename-registers is enabled by -O3.

(In reply to comment #2)
> Note that your testcase has moved the load _mm_load_ps(in+4); before the
> store _mm_store_ps(out, result); which the compiler cannot do itself because
> they may alias.

I see. However, the generated code is the same when using restrict-qualified
pointers to inform the compiler that there is no aliasing problem.
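
The restrict-qualified variant is not shown in the thread. A minimal sketch of
what it might look like, assuming C99 restrict on both pointer parameters
(GCC's C++ front end spells the qualifier __restrict__). The names
bench_3_restrict and bench_3_restrict_selfcheck are hypothetical.

```c
#include <assert.h>
#include <xmmintrin.h>

/* bench_3 from the report with restrict-qualified pointers. The qualifier
   promises that out and in never refer to overlapping memory. */
void bench_3_restrict(float *restrict out, float *restrict in,
                      float f, unsigned int n)
{
    assert(n >= 8);  /* the do/while body runs at least once */
    n /= 8;
    __m128 scalar = _mm_set_ps1(f);
    do
    {
        __m128 arg = _mm_load_ps(in);
        __m128 result = _mm_add_ps(arg, scalar);
        _mm_store_ps(out, result);

        arg = _mm_load_ps(in + 4);
        result = _mm_add_ps(arg, scalar);
        _mm_store_ps(out + 4, result);
        in += 8;
        out += 8;
    }
    while (--n);
}

/* Hypothetical self-check: returns 1 if every output equals input + f. */
int bench_3_restrict_selfcheck(void)
{
    _Alignas(16) float in[16];
    _Alignas(16) float out[16];
    for (int i = 0; i < 16; ++i) {
        in[i] = (float)i;
        out[i] = 0.0f;
    }
    bench_3_restrict(out, in, 0.5f, 16);
    for (int i = 0; i < 16; ++i)
        if (out[i] != in[i] + 0.5f)
            return 0;
    return 1;
}
```

Calling this function with overlapping buffers would be undefined behavior,
which is what in principle allows the compiler to move the second load ahead
of the first store.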



* [Bug target/38825] missed optimization: register renaming in unrolled loop
From: rguenth at gcc dot gnu dot org @ 2009-01-13 15:45 UTC
  To: gcc-bugs



------- Comment #4 from rguenth at gcc dot gnu dot org  2009-01-13 15:44 -------
-frename-registers does make a difference for me,

.L2:
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm2
        movaps  %xmm2, (%rdi,%rax)
        addps   16(%rsi,%rax), %xmm1
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L2

vs.

.L2:
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm1
        movaps  %xmm1, (%rdi,%rax)
        movaps  %xmm0, %xmm1
        addps   16(%rsi,%rax), %xmm1
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L2

Both listings: x86_64, -O3 -fschedule-insns, with restrict added; the first
listing additionally uses -frename-registers.



* [Bug target/38825] missed optimization: register renaming in unrolled loop
From: tim at klingt dot org @ 2009-01-13 16:08 UTC
  To: gcc-bugs



------- Comment #5 from tim at klingt dot org  2009-01-13 16:08 -------
(In reply to comment #4)
> -frename-registers does make a difference for me,

I can reproduce it; however, -frename-registers is supposed to be enabled by
-O3:
tim@thinkpad:~/workspace/nova-server.git$ /usr/local/lib/gcc-snapshot/bin/g++
-Q -O3 --help=optimizer  |grep frename
  -frename-registers                    [enabled]


The resolved aliasing issue is not taken into account, though:

.L23:
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm2
        movaps  %xmm2, (%rdi,%rax)
        addps   16(%rsi,%rax), %xmm1
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L23

vs.

.L19:
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm2
        addps   16(%rsi,%rax), %xmm1
        movaps  %xmm2, (%rdi,%rax)
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L19



* [Bug target/38825] missed optimization: register renaming in unrolled loop
From: rguenth at gcc dot gnu dot org @ 2009-01-13 16:37 UTC
  To: gcc-bugs



------- Comment #6 from rguenth at gcc dot gnu dot org  2009-01-13 16:37 -------
Yes, the alias sets are not properly transferred to RTL:

;; MEM[base: out, index: ivtmp.58] = result;

(insn 22 21 0 /usr/lib64/gcc/x86_64-suse-linux/4.4/include/xmmintrin.h:951 (set
(mem:V4SF (plus:DI (reg/v/f:DI 66 [ out ])
                (reg:DI 63 [ ivtmp.58 ])) [2 S16 A128])
        (reg/v:V4SF 64 [ result ])) -1 (nil))

;; result.70 = __builtin_ia32_addps (MEM[base: in, index: ivtmp.58, offset: 16], scalar);

(insn 23 22 24 /usr/lib64/gcc/x86_64-suse-linux/4.4/include/xmmintrin.h:161
(set (reg:V4SF 75)
        (plus:V4SF (reg/v:V4SF 65 [ scalar ])
            (mem:V4SF (plus:DI (plus:DI (reg/v/f:DI 67 [ in ])
                        (reg:DI 63 [ ivtmp.58 ]))
                    (const_int 16 [0x10])) [2 S16 A128]))) -1 (nil))

As you can see, both use alias set 2.  But it should be noted that with
TARGET_MEM_REF (the MEM[...] expr) type-based aliasing is hosed (which is
unfortunately what restrict relies on).

Thus, with -fno-ivopts we can see different alias sets:

;; *(__v4sf *) out = result;

(insn 14 13 0 /usr/lib64/gcc/x86_64-suse-linux/4.4/include/xmmintrin.h:951 (set
(mem:V4SF (reg/v/f:DI 62 [ out ]) [6 S16 A128])
        (reg/v:V4SF 60 [ result ])) -1 (nil))

;; result.58 = __builtin_ia32_addps (*(__v4sf *) (in + 16), scalar);

(insn 15 14 16 /usr/lib64/gcc/x86_64-suse-linux/4.4/include/xmmintrin.h:161
(set (reg:V4SF 67)
        (plus:V4SF (reg/v:V4SF 61 [ scalar ])
            (mem:V4SF (plus:DI (reg/v/f:DI 63 [ in ])
                    (const_int 16 [0x10])) [5 S16 A128]))) -1 (nil))

and re-ordering of mems!

.L2:
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        addps   (%rsi), %xmm2
        addps   16(%rsi), %xmm1
        addq    $32, %rsi
        movaps  %xmm2, (%rdi)
        movaps  %xmm1, 16(%rdi)
        addq    $32, %rdi
        subl    $1, %edx
        jne     .L2


