* [PATCH] avoid xmm register in SSE float->int conversions
@ 2004-10-18 14:19 Uros Bizjak
2004-10-18 16:49 ` Jan Hubicka
0 siblings, 1 reply; 3+ messages in thread
From: Uros Bizjak @ 2004-10-18 14:19 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]
Hello!
For the DF->SI testcases like this:
int test1(double a) {
return sin(a);
}
current mainline gcc produces following code for -march=pentium4:
...
fsin
fstpl (%esp)
movsd (%esp), %xmm0
cvttsd2si %xmm0, %eax
...
ret
At least for pentium4 (and perhaps other !TARGET_K8 targets), xmm
register could be avoided, as cvttsd2si can convert directly from memory
to integer register. Attached patch introduces a couple of peephole2
optimizers to get rid of extra movsd.
Unfortunatelly, combine pass can't be used in this case, because all
necessary moves are generated in greg pass _after_ combine.
With attached patch, gcc generates following code:
test1:
subl $12, %esp
fldl 16(%esp)
fsin
fstpl (%esp)
cvttsd2si (%esp), %eax
addl $12, %esp
ret
BTW: A small bug was fixed: *fix_trunchi_1 should have its split
constraint defined.
Bootstrapped on i686-pc-linux-gnu, regtest in progres, c and c++.
2004-10-18 Uros Bizjak <uros@kss-loka.si>
* config/i386/i386.md (fix_truncsfdi_sse, fix_truncdfdi_sse,
fix_truncsfsi_sse, fix_truncdfsi_sse): Add peephole2
optimizers to avoid mem->sse_reg->reg in FP->int
conversions for !TARGET_K8.
(*fix_trunchi_1): Add "&& 1" as split constraint.
Uros.
[-- Attachment #2: ssecvt.diff --]
[-- Type: text/plain, Size: 2333 bytes --]
Index: i386.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.md,v
retrieving revision 1.562
diff -u -r1.562 i386.md
--- i386.md 18 Oct 2004 13:01:31 -0000 1.562
+++ i386.md 18 Oct 2004 13:36:27 -0000
@@ -4181,6 +4181,15 @@
(set_attr "mode" "SF")
(set_attr "athlon_decode" "double,vector")])
+(define_peephole2
+ [(set (match_operand:SF 0 "register_operand" "")
+ (match_operand:SF 1 "memory_operand" ""))
+ (set (match_operand:DI 2 "register_operand" "")
+ (fix:DI (match_dup 0)))]
+ "!TARGET_K8"
+ [(set (match_dup 2) (fix:DI (match_dup 1)))]
+ "")
+
;; Avoid vector decoded form of the instruction.
(define_peephole2
[(match_scratch:SF 2 "x")
@@ -4200,6 +4209,15 @@
(set_attr "mode" "DF")
(set_attr "athlon_decode" "double,vector")])
+(define_peephole2
+ [(set (match_operand:DF 0 "register_operand" "")
+ (match_operand:DF 1 "memory_operand" ""))
+ (set (match_operand:DI 2 "register_operand" "")
+ (fix:DI (match_dup 0)))]
+ "!TARGET_K8"
+ [(set (match_dup 2) (fix:DI (match_dup 1)))]
+ "")
+
;; Avoid vector decoded form of the instruction.
(define_peephole2
[(match_scratch:DF 2 "Y")
@@ -4318,6 +4336,15 @@
(set_attr "mode" "DF")
(set_attr "athlon_decode" "double,vector")])
+(define_peephole2
+ [(set (match_operand:SF 0 "register_operand" "")
+ (match_operand:SF 1 "memory_operand" ""))
+ (set (match_operand:SI 2 "register_operand" "")
+ (fix:SI (match_dup 0)))]
+ "!TARGET_K8"
+ [(set (match_dup 2) (fix:SI (match_dup 1)))]
+ "")
+
;; Avoid vector decoded form of the instruction.
(define_peephole2
[(match_scratch:SF 2 "x")
@@ -4337,6 +4364,15 @@
(set_attr "mode" "DF")
(set_attr "athlon_decode" "double,vector")])
+(define_peephole2
+ [(set (match_operand:DF 0 "register_operand" "")
+ (match_operand:DF 1 "memory_operand" ""))
+ (set (match_operand:SI 2 "register_operand" "")
+ (fix:SI (match_dup 0)))]
+ "!TARGET_K8"
+ [(set (match_dup 2) (fix:SI (match_dup 1)))]
+ "")
+
;; Avoid vector decoded form of the instruction.
(define_peephole2
[(match_scratch:DF 2 "Y")
@@ -4405,7 +4441,7 @@
&& !reload_completed && !reload_in_progress
&& !SSE_FLOAT_MODE_P (GET_MODE (operands[1]))"
"#"
- ""
+ "&& 1"
[(const_int 0)]
{
ix86_optimize_mode_switching = 1;
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] avoid xmm register in SSE float->int conversions
2004-10-18 14:19 [PATCH] avoid xmm register in SSE float->int conversions Uros Bizjak
@ 2004-10-18 16:49 ` Jan Hubicka
2004-10-29 7:55 ` Uros Bizjak
0 siblings, 1 reply; 3+ messages in thread
From: Jan Hubicka @ 2004-10-18 16:49 UTC (permalink / raw)
To: Uros Bizjak; +Cc: gcc-patches
> Hello!
>
> For the DF->SI testcases like this:
>
> int test1(double a) {
> return sin(a);
> }
>
> current mainline gcc produces following code for -march=pentium4:
> ...
> fsin
> fstpl (%esp)
> movsd (%esp), %xmm0
> cvttsd2si %xmm0, %eax
> ...
> ret
>
> At least for pentium4 (and perhaps other !TARGET_K8 targets), xmm
> register could be avoided, as cvttsd2si can convert directly from memory
> to integer register. Attached patch introduces a couple of peephole2
> optimizers to get rid of extra movsd.
> Unfortunatelly, combine pass can't be used in this case, because all
> necessary moves are generated in greg pass _after_ combine.
To make the peepholes safe, you need to verify that xmm0 is dead after
it is stored to memoery again. However it is definitly better to teach
reload to not do this sillyness, but it might be quite dificult...
Honza
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] avoid xmm register in SSE float->int conversions
2004-10-18 16:49 ` Jan Hubicka
@ 2004-10-29 7:55 ` Uros Bizjak
0 siblings, 0 replies; 3+ messages in thread
From: Uros Bizjak @ 2004-10-29 7:55 UTC (permalink / raw)
To: Jan Hubicka; +Cc: gcc-patches
Jan Hubicka wrote:
>>For the DF->SI testcases like this:
>>
>>int test1(double a) {
>> return sin(a);
>>}
>>
>>current mainline gcc produces following code for -march=pentium4:
>> ...
>> fsin
>> fstpl (%esp)
>> movsd (%esp), %xmm0
>> cvttsd2si %xmm0, %eax
>> ...
>> ret
>>
>>At least for pentium4 (and perhaps other !TARGET_K8 targets), xmm
>>register could be avoided, as cvttsd2si can convert directly from memory
>>to integer register. Attached patch introduces a couple of peephole2
>>optimizers to get rid of extra movsd.
>>Unfortunatelly, combine pass can't be used in this case, because all
>>necessary moves are generated in greg pass _after_ combine.
>>
>>
>
>To make the peepholes safe, you need to verify that xmm0 is dead after
>it is stored to memoery again. However it is definitly better to teach
>reload to not do this sillyness, but it might be quite dificult...
>
>
>
Hm, I don't see where xmm0 is stored to memory... Perhaps you thought
that it is necessary to check if xmm0 is dead after it is converted to
register? Peephole is transforming sequence:
>>>
(insn 34 31 35 0 (set (mem:DF (plus:SI (reg/f:SI 6 bp)
(const_int -8 [0xfffffff8])) [0 S8 A8])
(reg:DF 8 st [61])) 67 {*movdf_nointeger} (nil)
(expr_list:REG_DEAD (reg:DF 8 st [61])
(nil)))
(insn 35 34 13 0 (set (reg:DF 21 xmm0)
(mem:DF (plus:SI (reg/f:SI 6 bp)
(const_int -8 [0xfffffff8])) [0 S8 A8])) 67
{*movdf_nointeger} (nil)
(nil))
(insn:HI 13 35 17 0 (set (reg:SI 0 ax [60])
(fix:SI (reg:DF 21 xmm0))) 121 {fix_truncdfsi_sse}
(insn_list:REG_DEP_TRUE 12 (nil))
(expr_list:REG_DEAD (reg:DF 21 xmm0)
(nil)))
>>>
into
>>>
(insn 34 31 44 0 (set (mem:DF (plus:SI (reg/f:SI 6 bp)
(const_int -8 [0xfffffff8])) [0 S8 A8])
(reg:DF 8 st [61])) 67 {*movdf_nointeger} (nil)
(expr_list:REG_DEAD (reg:DF 8 st [61])
(nil)))
(insn 44 34 17 0 (set (reg:SI 0 ax [60])
(fix:SI (mem:DF (plus:SI (reg/f:SI 6 bp)
(const_int -8 [0xfffffff8])) [0 S8 A8]))) -1 (nil)
(nil))
>>>
It would be a trivial task to add a check for dead register into
peephole2 pattern constraint.
However, it looks that this would be only a surface fix for deeper
problems. fix_truncdfsi_sse pattern is defined as:
(define_insn "fix_truncdfsi_sse"
[(set (match_operand:SI 0 "register_operand" "=r,r")
(fix:SI (match_operand:DF 1 "nonimmediate_operand" "Y,Ym")))]
and its (operand 1) constraints show that this insn accepts memory
inputs. Changing input constraint from "Y,Ym" to "m,m" shows the problem:
ccc.c:3: error: unable to generate reloads for:
(insn:HI 13 31 17 0 (set (reg:SI 0 ax [60])
(fix:SI (reg:DF 8 st [61]))) 115 {fix_truncdfsi_sse}
(insn_list:REG_DEP_TRUE 12 (nil))
(expr_list:REG_DEAD (reg:DF 8 st [61])
(nil)))
ccc.c:3: internal compiler error: in find_reloads, at reload.c:3676
Reload is unable to transfer operands via memory. Ideally, it should
allocate a local stack slot and move/copy FP register to this slot. This
would satisfy fix_truncdfsi_sse constranints. Indeed, looking in i386.md
this comment (near disabled "*ficom_1" pattern) has been found:
;; %%% Play games with accepting gp registers, as otherwise we have to
;; force them to memory during rtl generation, which is no good. We
;; can get rid of this once we teach reload to do memory input reloads
;; via pushes.
It is unclear to me, if this is a design choice, because memory input
reloads would be benefical for number of instructions in i386's case.
Consider for example "*fix_trunc?i_1". Reload coud allocate a slot for
its output automatically, without using "assign_386_stack_local (DImode,
0);".
Uros.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2004-10-29 6:32 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-10-18 14:19 [PATCH] avoid xmm register in SSE float->int conversions Uros Bizjak
2004-10-18 16:49 ` Jan Hubicka
2004-10-29 7:55 ` Uros Bizjak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).