public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] avoid xmm register in SSE float->int conversions
@ 2004-10-18 14:19 Uros Bizjak
  2004-10-18 16:49 ` Jan Hubicka
  0 siblings, 1 reply; 3+ messages in thread
From: Uros Bizjak @ 2004-10-18 14:19 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]

Hello!

For the DF->SI testcases like this:

int test1(double a) {
        return sin(a);
}

current mainline gcc produces following code for -march=pentium4:
        ...
        fsin
        fstpl   (%esp)
        movsd   (%esp), %xmm0
        cvttsd2si       %xmm0, %eax
        ...
        ret

At least for pentium4 (and perhaps other !TARGET_K8 targets), xmm 
register could be avoided, as cvttsd2si can convert directly from memory 
to integer register. Attached patch introduces a couple of peephole2 
optimizers to get rid of extra movsd.
Unfortunatelly, combine pass can't be used in this case, because all 
necessary moves are generated in greg pass _after_ combine.

With attached patch, gcc generates following code:
test1:
        subl    $12, %esp
        fldl    16(%esp)
        fsin
        fstpl   (%esp)
        cvttsd2si       (%esp), %eax
        addl    $12, %esp
        ret

BTW: A small bug was fixed: *fix_trunchi_1 should have its split 
constraint defined.

Bootstrapped on i686-pc-linux-gnu, regtest in progres, c and c++.

2004-10-18  Uros Bizjak  <uros@kss-loka.si>

    * config/i386/i386.md (fix_truncsfdi_sse, fix_truncdfdi_sse,
    fix_truncsfsi_sse, fix_truncdfsi_sse): Add peephole2
    optimizers to avoid mem->sse_reg->reg in FP->int
    conversions for !TARGET_K8.
    (*fix_trunchi_1): Add "&& 1" as split constraint.

Uros.

[-- Attachment #2: ssecvt.diff --]
[-- Type: text/plain, Size: 2333 bytes --]

Index: i386.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.md,v
retrieving revision 1.562
diff -u -r1.562 i386.md
--- i386.md	18 Oct 2004 13:01:31 -0000	1.562
+++ i386.md	18 Oct 2004 13:36:27 -0000
@@ -4181,6 +4181,15 @@
    (set_attr "mode" "SF")
    (set_attr "athlon_decode" "double,vector")])
 
+(define_peephole2
+  [(set (match_operand:SF 0 "register_operand" "")
+	(match_operand:SF 1 "memory_operand" ""))
+   (set (match_operand:DI 2 "register_operand" "")
+	(fix:DI (match_dup 0)))]
+  "!TARGET_K8"
+  [(set (match_dup 2) (fix:DI (match_dup 1)))]
+  "")
+
 ;; Avoid vector decoded form of the instruction.
 (define_peephole2
   [(match_scratch:SF 2 "x")
@@ -4200,6 +4209,15 @@
    (set_attr "mode" "DF")
    (set_attr "athlon_decode" "double,vector")])
 
+(define_peephole2
+  [(set (match_operand:DF 0 "register_operand" "")
+	(match_operand:DF 1 "memory_operand" ""))
+   (set (match_operand:DI 2 "register_operand" "")
+	(fix:DI (match_dup 0)))]
+  "!TARGET_K8"
+  [(set (match_dup 2) (fix:DI (match_dup 1)))]
+  "")
+
 ;; Avoid vector decoded form of the instruction.
 (define_peephole2
   [(match_scratch:DF 2 "Y")
@@ -4318,6 +4336,15 @@
    (set_attr "mode" "DF")
    (set_attr "athlon_decode" "double,vector")])
 
+(define_peephole2
+  [(set (match_operand:SF 0 "register_operand" "")
+	(match_operand:SF 1 "memory_operand" ""))
+   (set (match_operand:SI 2 "register_operand" "")
+	(fix:SI (match_dup 0)))]
+  "!TARGET_K8"
+  [(set (match_dup 2) (fix:SI (match_dup 1)))]
+  "")
+
 ;; Avoid vector decoded form of the instruction.
 (define_peephole2
   [(match_scratch:SF 2 "x")
@@ -4337,6 +4364,15 @@
    (set_attr "mode" "DF")
    (set_attr "athlon_decode" "double,vector")])
 
+(define_peephole2
+  [(set (match_operand:DF 0 "register_operand" "")
+	(match_operand:DF 1 "memory_operand" ""))
+   (set (match_operand:SI 2 "register_operand" "")
+	(fix:SI (match_dup 0)))]
+  "!TARGET_K8"
+  [(set (match_dup 2) (fix:SI (match_dup 1)))]
+  "")
+
 ;; Avoid vector decoded form of the instruction.
 (define_peephole2
   [(match_scratch:DF 2 "Y")
@@ -4405,7 +4441,7 @@
    && !reload_completed && !reload_in_progress
    && !SSE_FLOAT_MODE_P (GET_MODE (operands[1]))"
   "#"
-  ""
+  "&& 1"
   [(const_int 0)]
 {
   ix86_optimize_mode_switching = 1;

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] avoid xmm register in SSE float->int conversions
  2004-10-18 14:19 [PATCH] avoid xmm register in SSE float->int conversions Uros Bizjak
@ 2004-10-18 16:49 ` Jan Hubicka
  2004-10-29  7:55   ` Uros Bizjak
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Hubicka @ 2004-10-18 16:49 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

> Hello!
> 
> For the DF->SI testcases like this:
> 
> int test1(double a) {
>        return sin(a);
> }
> 
> current mainline gcc produces following code for -march=pentium4:
>        ...
>        fsin
>        fstpl   (%esp)
>        movsd   (%esp), %xmm0
>        cvttsd2si       %xmm0, %eax
>        ...
>        ret
> 
> At least for pentium4 (and perhaps other !TARGET_K8 targets), xmm 
> register could be avoided, as cvttsd2si can convert directly from memory 
> to integer register. Attached patch introduces a couple of peephole2 
> optimizers to get rid of extra movsd.
> Unfortunatelly, combine pass can't be used in this case, because all 
> necessary moves are generated in greg pass _after_ combine.

To make the peepholes safe, you need to verify that xmm0 is dead after
it is stored to memoery again.  However it is definitly better to teach
reload to not do this sillyness, but it might be quite dificult...

Honza

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] avoid xmm register in SSE float->int conversions
  2004-10-18 16:49 ` Jan Hubicka
@ 2004-10-29  7:55   ` Uros Bizjak
  0 siblings, 0 replies; 3+ messages in thread
From: Uros Bizjak @ 2004-10-29  7:55 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

Jan Hubicka wrote:

>>For the DF->SI testcases like this:
>>
>>int test1(double a) {
>>       return sin(a);
>>}
>>
>>current mainline gcc produces following code for -march=pentium4:
>>       ...
>>       fsin
>>       fstpl   (%esp)
>>       movsd   (%esp), %xmm0
>>       cvttsd2si       %xmm0, %eax
>>       ...
>>       ret
>>
>>At least for pentium4 (and perhaps other !TARGET_K8 targets), xmm 
>>register could be avoided, as cvttsd2si can convert directly from memory 
>>to integer register. Attached patch introduces a couple of peephole2 
>>optimizers to get rid of extra movsd.
>>Unfortunatelly, combine pass can't be used in this case, because all 
>>necessary moves are generated in greg pass _after_ combine.
>>    
>>
>
>To make the peepholes safe, you need to verify that xmm0 is dead after
>it is stored to memoery again.  However it is definitly better to teach
>reload to not do this sillyness, but it might be quite dificult...
>
>  
>
Hm, I don't see where xmm0 is stored to memory... Perhaps you thought 
that it is necessary to check if xmm0 is dead after it is converted to 
register? Peephole is transforming sequence:

 >>>
(insn 34 31 35 0 (set (mem:DF (plus:SI (reg/f:SI 6 bp)
                (const_int -8 [0xfffffff8])) [0 S8 A8])
        (reg:DF 8 st [61])) 67 {*movdf_nointeger} (nil)
    (expr_list:REG_DEAD (reg:DF 8 st [61])
        (nil)))

(insn 35 34 13 0 (set (reg:DF 21 xmm0)
        (mem:DF (plus:SI (reg/f:SI 6 bp)
                (const_int -8 [0xfffffff8])) [0 S8 A8])) 67 
{*movdf_nointeger} (nil)
    (nil))

(insn:HI 13 35 17 0 (set (reg:SI 0 ax [60])
        (fix:SI (reg:DF 21 xmm0))) 121 {fix_truncdfsi_sse} 
(insn_list:REG_DEP_TRUE 12 (nil))
    (expr_list:REG_DEAD (reg:DF 21 xmm0)
        (nil)))

 >>>
into
 >>>

(insn 34 31 44 0 (set (mem:DF (plus:SI (reg/f:SI 6 bp)
                (const_int -8 [0xfffffff8])) [0 S8 A8])
        (reg:DF 8 st [61])) 67 {*movdf_nointeger} (nil)
    (expr_list:REG_DEAD (reg:DF 8 st [61])
        (nil)))

(insn 44 34 17 0 (set (reg:SI 0 ax [60])
        (fix:SI (mem:DF (plus:SI (reg/f:SI 6 bp)
                    (const_int -8 [0xfffffff8])) [0 S8 A8]))) -1 (nil)
    (nil))
 >>>

It would be a trivial task to add a check for dead register into 
peephole2 pattern constraint.

However, it looks that this would be only a surface fix for deeper 
problems. fix_truncdfsi_sse pattern is defined as:

(define_insn "fix_truncdfsi_sse"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
    (fix:SI (match_operand:DF 1 "nonimmediate_operand" "Y,Ym")))]

and its (operand 1) constraints show that this insn accepts memory 
inputs.  Changing input constraint from "Y,Ym" to "m,m" shows  the problem:

ccc.c:3: error: unable to generate reloads for:
(insn:HI 13 31 17 0 (set (reg:SI 0 ax [60])
        (fix:SI (reg:DF 8 st [61]))) 115 {fix_truncdfsi_sse} 
(insn_list:REG_DEP_TRUE 12 (nil))
    (expr_list:REG_DEAD (reg:DF 8 st [61])
        (nil)))
ccc.c:3: internal compiler error: in find_reloads, at reload.c:3676

Reload is unable to transfer operands via memory. Ideally, it should 
allocate a local stack slot and move/copy FP register to this slot. This 
would satisfy fix_truncdfsi_sse constranints. Indeed, looking in i386.md 
this comment (near disabled "*ficom_1" pattern) has been found:

;; %%% Play games with accepting gp registers, as otherwise we have to
;; force them to memory during rtl generation, which is no good.  We
;; can get rid of this once we teach reload to do memory input reloads
;; via pushes.

It is unclear to me, if this is a design choice, because memory input 
reloads would be benefical for number of instructions in i386's case. 
Consider for example "*fix_trunc?i_1". Reload coud allocate a slot for 
its output automatically, without using "assign_386_stack_local (DImode, 
0);".

Uros.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2004-10-29  6:32 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-10-18 14:19 [PATCH] avoid xmm register in SSE float->int conversions Uros Bizjak
2004-10-18 16:49 ` Jan Hubicka
2004-10-29  7:55   ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).