public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Uros Bizjak <uros@kss-loka.si>
To: Jan Hubicka <hubicka@ucw.cz>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] avoid xmm register in SSE float->int conversions
Date: Fri, 29 Oct 2004 07:55:00 -0000	[thread overview]
Message-ID: <4181E2A3.70108@kss-loka.si> (raw)
In-Reply-To: <20041018163026.GA5128@atrey.karlin.mff.cuni.cz>

Jan Hubicka wrote:

>>For the DF->SI testcases like this:
>>
>>int test1(double a) {
>>       return sin(a);
>>}
>>
>>current mainline gcc produces following code for -march=pentium4:
>>       ...
>>       fsin
>>       fstpl   (%esp)
>>       movsd   (%esp), %xmm0
>>       cvttsd2si       %xmm0, %eax
>>       ...
>>       ret
>>
>>At least for pentium4 (and perhaps other !TARGET_K8 targets), xmm 
>>register could be avoided, as cvttsd2si can convert directly from memory 
>>to integer register. Attached patch introduces a couple of peephole2 
>>optimizers to get rid of extra movsd.
>>Unfortunatelly, combine pass can't be used in this case, because all 
>>necessary moves are generated in greg pass _after_ combine.
>>    
>>
>
>To make the peepholes safe, you need to verify that xmm0 is dead after
>it is stored to memoery again.  However it is definitly better to teach
>reload to not do this sillyness, but it might be quite dificult...
>
>  
>
Hm, I don't see where xmm0 is stored to memory... Perhaps you thought 
that it is necessary to check if xmm0 is dead after it is converted to 
register? Peephole is transforming sequence:

 >>>
(insn 34 31 35 0 (set (mem:DF (plus:SI (reg/f:SI 6 bp)
                (const_int -8 [0xfffffff8])) [0 S8 A8])
        (reg:DF 8 st [61])) 67 {*movdf_nointeger} (nil)
    (expr_list:REG_DEAD (reg:DF 8 st [61])
        (nil)))

(insn 35 34 13 0 (set (reg:DF 21 xmm0)
        (mem:DF (plus:SI (reg/f:SI 6 bp)
                (const_int -8 [0xfffffff8])) [0 S8 A8])) 67 
{*movdf_nointeger} (nil)
    (nil))

(insn:HI 13 35 17 0 (set (reg:SI 0 ax [60])
        (fix:SI (reg:DF 21 xmm0))) 121 {fix_truncdfsi_sse} 
(insn_list:REG_DEP_TRUE 12 (nil))
    (expr_list:REG_DEAD (reg:DF 21 xmm0)
        (nil)))

 >>>
into
 >>>

(insn 34 31 44 0 (set (mem:DF (plus:SI (reg/f:SI 6 bp)
                (const_int -8 [0xfffffff8])) [0 S8 A8])
        (reg:DF 8 st [61])) 67 {*movdf_nointeger} (nil)
    (expr_list:REG_DEAD (reg:DF 8 st [61])
        (nil)))

(insn 44 34 17 0 (set (reg:SI 0 ax [60])
        (fix:SI (mem:DF (plus:SI (reg/f:SI 6 bp)
                    (const_int -8 [0xfffffff8])) [0 S8 A8]))) -1 (nil)
    (nil))
 >>>

It would be a trivial task to add a check for dead register into 
peephole2 pattern constraint.

However, it looks that this would be only a surface fix for deeper 
problems. fix_truncdfsi_sse pattern is defined as:

(define_insn "fix_truncdfsi_sse"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
    (fix:SI (match_operand:DF 1 "nonimmediate_operand" "Y,Ym")))]

and its (operand 1) constraints show that this insn accepts memory 
inputs.  Changing input constraint from "Y,Ym" to "m,m" shows  the problem:

ccc.c:3: error: unable to generate reloads for:
(insn:HI 13 31 17 0 (set (reg:SI 0 ax [60])
        (fix:SI (reg:DF 8 st [61]))) 115 {fix_truncdfsi_sse} 
(insn_list:REG_DEP_TRUE 12 (nil))
    (expr_list:REG_DEAD (reg:DF 8 st [61])
        (nil)))
ccc.c:3: internal compiler error: in find_reloads, at reload.c:3676

Reload is unable to transfer operands via memory. Ideally, it should 
allocate a local stack slot and move/copy FP register to this slot. This 
would satisfy fix_truncdfsi_sse constranints. Indeed, looking in i386.md 
this comment (near disabled "*ficom_1" pattern) has been found:

;; %%% Play games with accepting gp registers, as otherwise we have to
;; force them to memory during rtl generation, which is no good.  We
;; can get rid of this once we teach reload to do memory input reloads
;; via pushes.

It is unclear to me, if this is a design choice, because memory input 
reloads would be benefical for number of instructions in i386's case. 
Consider for example "*fix_trunc?i_1". Reload coud allocate a slot for 
its output automatically, without using "assign_386_stack_local (DImode, 
0);".

Uros.

      reply	other threads:[~2004-10-29  6:32 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-10-18 14:19 Uros Bizjak
2004-10-18 16:49 ` Jan Hubicka
2004-10-29  7:55   ` Uros Bizjak [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4181E2A3.70108@kss-loka.si \
    --to=uros@kss-loka.si \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hubicka@ucw.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).