public inbox for gcc@gcc.gnu.org
* Re: i387 control word register definition is missing
@ 2005-05-25 12:48 Uros Bizjak
  2005-05-25 22:03 ` Jan Hubicka
  0 siblings, 1 reply; 5+ messages in thread
From: Uros Bizjak @ 2005-05-25 12:48 UTC (permalink / raw)
  To: Michael Meissner; +Cc: gcc

Hello!

> Well you really want both the fpcr and the mxcsr registers, since the fpcr
> only controls the x87 and the mxcsr controls the xmm registers.  Note, in
> adding these registers, you are going to have to go through all of the floating
> point patterns to add (use:HI FPCR_REG) and (use:SI MXCSR_REG) to each and
> every pattern so that the optimizer can be told not to move a floating point
> operation past the setting of the control word.

  I think that (use:...) clauses are needed only for the (float)->(int) patterns
(fix_trunc.. & co.). For i386, we could calculate the control word for the new
mode in advance (this calculation is inserted by LCM) and insert the fldcw insn
just before fist/frndint.

(define_insn_and_split "fix_trunc<mode>_i387_2"
  [(set (match_operand:X87MODEI12 0 "memory_operand" "=m")
	(fix:X87MODEI12 (match_operand 1 "register_operand" "f")))
   (use (match_operand:HI 2 "memory_operand" "m"))
   (use (match_operand:HI 3 "memory_operand" "m"))]
  "TARGET_80387 && !TARGET_FISTTP
   && FLOAT_MODE_P (GET_MODE (operands[1]))
   && !SSE_FLOAT_MODE_P (GET_MODE (operands[1]))"
  "#"
  "reload_completed"
  [(set (reg:HI FPCR_REG)
	(unspec:HI [(match_dup 3)] UNSPEC_FLDCW))
   (parallel [(set (match_dup 0) (fix:X87MODEI12 (match_dup 1)))
	      (use (reg:HI FPCR_REG))])]
  ""
  [(set_attr "type" "fistp")
   (set_attr "i387_cw" "trunc")
   (set_attr "mode" "<MODE>")])


(define_insn "*fix_trunc<mode>_i387"
  [(set (match_operand:X87MODEI12 0 "memory_operand" "=m")
	(fix:X87MODEI12 (match_operand 1 "register_operand" "f")))
   (use (reg:HI FPCR_REG))]
  "TARGET_80387 && !TARGET_FISTTP
   && FLOAT_MODE_P (GET_MODE (operands[1]))
   && !SSE_FLOAT_MODE_P (GET_MODE (operands[1]))"
  "* return output_fix_trunc (insn, operands, 0);"
  [(set_attr "type" "fistp")
   (set_attr "i387_cw" "trunc")
   (set_attr "mode" "<MODE>")])
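
For reference, a minimal C sketch (not GCC code) of why these patterns carry
i387_cw "trunc": fist/fistp round according to the rounding-control (RC) field
in bits 10-11 of the control word, while a C cast must truncate. The names
toy_fist and RC_* are made up for illustration, and the model assumes a finite
input well inside the range of long.

```c
/* x87 rounding-control (RC) field values, control word bits 10-11. */
enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TRUNC = 3 };

/* Toy model of fist/fistp: round x to an integer under the given RC value.
   Assumes x is finite and well inside the range of long.  */
long toy_fist(double x, unsigned rc)
{
    long t = (long)x;                 /* a C cast truncates toward zero */
    long lo = t - (x < (double)t);    /* floor(x) */
    long hi = t + (x > (double)t);    /* ceil(x) */
    double frac = x - (double)lo;     /* distance above floor, in [0, 1) */

    switch (rc & 3u) {
    case RC_DOWN:  return lo;
    case RC_UP:    return hi;
    case RC_TRUNC: return t;
    default:                          /* RC_NEAREST: ties go to even */
        if (frac < 0.5) return lo;
        if (frac > 0.5) return hi;
        return (lo % 2 == 0) ? lo : hi;
    }
}
```

With the default RC (nearest), toy_fist(3.5, RC_NEAREST) gives 4, whereas
(int)3.5 must give 3, so the conversion only matches C semantics under RC_TRUNC.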

I'm trying to use the MODE_ENTRY and MODE_EXIT macros to insert mode calculations
in the proper places. Currently, I have a somewhat working prototype that switches
between 2 modes: MODE_UNINITIALIZED and MODE_TRUNC (plus MODE_ANY). The trick here
is that MODE_ENTRY and MODE_EXIT are defined to MODE_UNINITIALIZED. Secondly,
every asm statement and call insn switches to MODE_UNINITIALIZED, and when the mode
is switched _from_ MODE_TRUNC _to_ MODE_UNINITIALIZED before these two kinds of
insn (or in exit BBs), an UNSPEC_VOLATILE type fldcw is emitted (again via
LCM) that switches the fpu back to the saved mode. [UNSPEC_VOLATILE is needed to
prevent the optimizers from removing this pattern.] So, 2 fldcw patterns are defined:

(define_insn "x86_fldcw_1"
  [(set (reg:HI FPCR_REG)
	(unspec:HI [(match_operand:HI 0 "memory_operand" "m")]
		     UNSPEC_FLDCW))]
  "TARGET_80387"
  "fldcw\t%0"
  [(set_attr "length" "2")
   (set_attr "mode" "HI")
   (set_attr "unit" "i387")
   (set_attr "athlon_decode" "vector")])

(define_insn "x86_fldcw_2"
  [(set (reg:HI FPCR_REG)
	(unspec_volatile:HI [(match_operand:HI 0 "memory_operand" "m")]
			      UNSPECV_FLDCW))]
  "TARGET_80387"
  "fldcw\t%0"
  [(set_attr "length" "2")
   (set_attr "mode" "HI")
   (set_attr "unit" "i387")
   (set_attr "athlon_decode" "vector")])
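
The switching rules above can be sketched in C as a walk over a straight-line
insn sequence. This is only a toy under loud assumptions: the real prototype
places the switches with LCM over the whole CFG, and every name here (toy_mode,
toy_insn, toy_switch_modes) is hypothetical rather than a GCC interface. It just
counts how many plain and volatile fldcw insns the rules would ask for.

```c
#include <stddef.h>

enum toy_mode { MODE_TRUNC, MODE_UNINITIALIZED, MODE_ANY };
enum toy_insn { INSN_OTHER, INSN_FIX_TRUNC, INSN_CALL, INSN_ASM };

struct toy_counts { int set_trunc; int restore; };

/* Mode each insn needs: conversions need truncation, calls and asm
   statements need the saved control word, anything else does not care.  */
static enum toy_mode mode_needed(enum toy_insn insn)
{
    switch (insn) {
    case INSN_FIX_TRUNC: return MODE_TRUNC;
    case INSN_CALL:
    case INSN_ASM:       return MODE_UNINITIALIZED;
    default:             return MODE_ANY;
    }
}

struct toy_counts toy_switch_modes(const enum toy_insn *insns, size_t n)
{
    struct toy_counts c = { 0, 0 };
    enum toy_mode cur = MODE_UNINITIALIZED;       /* MODE_ENTRY */

    for (size_t i = 0; i <= n; i++) {
        /* i == n stands for the function exit (MODE_EXIT).  */
        enum toy_mode want =
            (i < n) ? mode_needed(insns[i]) : MODE_UNINITIALIZED;
        if (want == MODE_ANY || want == cur)
            continue;
        if (want == MODE_TRUNC)
            c.set_trunc++;        /* plain fldcw (x86_fldcw_1) */
        else
            c.restore++;          /* volatile fldcw (x86_fldcw_2) */
        cur = want;
    }
    return c;
}
```

For the sequence conversion, call, conversion, the sketch asks for two plain and
two volatile fldcw insns, and two back-to-back conversions share a single pair.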

By using this approach, testcase:

int test (int *a, double *x) {
        int i;

        for (i = 10; i; i--) {
             a[i] = x[i];
        }

        return 0;
}

is compiled (with -O2 -fomit-frame-pointer -fgcse-after-reload) into:

test:
        pushl  %ebx
        xorl %edx, %edx
        subl $4, %esp
        fnstcw 2(%esp)         <- store current cw
        movl 12(%esp), %ebx
        movl 16(%esp), %ecx
        movzwl 2(%esp), %eax
        orw  $3072, %ax
        movw %ax, (%esp)       <- store new cw
        .p2align 4,,15
.L2:
        fldcw  (%esp)          <- hello? gcse-after-reload?
        fldl 80(%ecx,%edx,8)
        fistpl 40(%ebx,%edx,4)
        decl %edx
        cmpl $-10, %edx
        jne  .L2
        fldcw  2(%esp)         <- volatile fldcw in exit block (load stored cw)
        xorl %eax, %eax
        popl %edx
        popl %ebx
        ret
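
As a reading aid for the listing, the new control word is just the stored one
with both rounding-control bits set (3072 == 0x0C00, RC = 11b, truncate). A
minimal C equivalent, with the hypothetical names X87_RC_MASK and x87_trunc_cw:

```c
#include <stdint.h>

#define X87_RC_MASK 0x0C00u   /* rounding control, control word bits 10-11 */

/* What `orw $3072, %ax` computes: set both RC bits (truncate) and leave
   every other control word field unchanged.  */
uint16_t x87_trunc_cw(uint16_t saved_cw)
{
    return (uint16_t)(saved_cw | X87_RC_MASK);
}
```

For the power-on default control word 0x037F this yields 0x0F7F. The value is
loop-invariant, which is why the fldcw at .L2 is a candidate for hoisting.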

Another testcase, involving call:

extern double xxxx(int a);

int test (double a) {
        return xxxx (a);
}

is compiled into:

test:
        subl $12, %esp
        fnstcw 10(%esp)        <- store current control word
        fldl 16(%esp)
        movzwl 10(%esp), %eax
        orw  $3072, %ax
        movw %ax, 8(%esp)
        fldcw  8(%esp)         <- switch fpu to new mode
        fistpl (%esp)          <- make conversion
        fldcw  10(%esp)        <- volatile fldcw before call (load stored cw)
        call xxxx
        fnstcw 10(%esp)        <- rewrite stored control word after call
        movzwl 10(%esp), %eax
        orw  $3072, %ax
        movw %ax, 8(%esp)
        fldcw  8(%esp)         <- load new
        fistpl 4(%esp)         <- make conversion
        movl 4(%esp), %eax
        fldcw  10(%esp)        <- volatile fldcw in exit block (load stored cw)
        addl $12, %esp
        ret

Because the ABI specifies that the control word should be restored to the saved
mode, we restore the saved cw before the call. After the call, the control word is
saved again, because xxxx could be a cw-setting function and its new cw shouldn't
be overwritten by the cw saved at the beginning of the function.
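
A toy C model of this control word traffic, under stated assumptions: struct
toy_fpu stands in for the real FPU state, toy_xxxx is a hypothetical callee that
changes the control word, and 0x027F is an arbitrary example value. It is a
sketch of the reasoning, not of the generated code.

```c
#include <stdint.h>

struct toy_fpu { uint16_t cw; };

/* Hypothetical callee that deliberately changes the control word.  */
static void toy_xxxx(struct toy_fpu *fpu) { fpu->cw = 0x027F; }

/* Mirrors the fnstcw/fldcw sequence of the second testcase and returns
   the control word that the function leaves behind for its caller.  */
uint16_t toy_test(struct toy_fpu *fpu, int resave_after_call)
{
    uint16_t saved = fpu->cw;                /* fnstcw: store current cw   */
    fpu->cw = (uint16_t)(saved | 0x0C00);    /* fldcw: switch to truncation */
    /* ... first conversion ... */
    fpu->cw = saved;                         /* volatile fldcw before call */
    toy_xxxx(fpu);                           /* callee may set a new cw    */
    if (resave_after_call)
        saved = fpu->cw;                     /* fnstcw again after call    */
    fpu->cw = (uint16_t)(saved | 0x0C00);    /* fldcw: truncation again    */
    /* ... second conversion ... */
    fpu->cw = saved;                         /* volatile fldcw in exit BB  */
    return fpu->cw;
}
```

With resave_after_call set, the callee's 0x027F survives to the function exit;
without it, the exit restore would silently put back the entry value.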

Unfortunately, in the first testcase, fldcw is not moved out of the loop, because
fix_trunc<mode>_i387_2 is split after the gcse-after-reload pass. (Is this
intentional for the gcse-after-reload pass?)

Uros.

* Re: i387 control word register definition is missing
  2005-05-25 12:48 i387 control word register definition is missing Uros Bizjak
@ 2005-05-25 22:03 ` Jan Hubicka
  2005-05-26 13:14   ` Uros Bizjak
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Hubicka @ 2005-05-25 22:03 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Michael Meissner, gcc

> Hello!
> 
> > Well you really want both the fpcr and the mxcsr registers, since the fpcr
> > only controls the x87 and the mxcsr controls the xmm registers.  Note, in
> > adding these registers, you are going to have to go through all of the floating
> > point patterns to add (use:HI FPCR_REG) and (use:SI MXCSR_REG) to each and
> > every pattern so that the optimizer can be told not to move a floating point
> > operation past the setting of the control word.
> 
>   I think that (use:...) clauses are needed only for (float)->(int) patterns

If you make FPCR/MXCSR real registers, you will need to add a use to all
the arithmetic and move patterns, which would consume quite some memory and
confuse the optimizers.  I think you can get around this better by simply using
volatile unspecs inserted by the LCM pass (this would limit scheduling, but I
don't think it is that big a deal).

> (fix_trunc.. & co.). For i386, we could calculate new mode word in advance (this
> calculation is inserted by LCM), and fldcw insn is inserted just before
> fist/frndint.
> 
> (define_insn_and_split "fix_trunc<mode>_i387_2"
>   [(set (match_operand:X87MODEI12 0 "memory_operand" "=m")
> 	(fix:X87MODEI12 (match_operand 1 "register_operand" "f")))
>    (use (match_operand:HI 2 "memory_operand" "m"))
>    (use (match_operand:HI 3 "memory_operand" "m"))]
>   "TARGET_80387 && !TARGET_FISTTP
>    && FLOAT_MODE_P (GET_MODE (operands[1]))
>    && !SSE_FLOAT_MODE_P (GET_MODE (operands[1]))"
>   "#"
>   "reload_completed"
>   [(set (reg:HI FPCR_REG)
> 	(unspec:HI [(match_dup 3)] UNSPEC_FLDCW))
>    (parallel [(set (match_dup 0) (fix:X87MODEI12 (match_dup 1)))
> 	      (use (reg:HI FPCR_REG))])]
>   ""
>   [(set_attr "type" "fistp")
>    (set_attr "i387_cw" "trunc")
>    (set_attr "mode" "<MODE>")])
> 
> 
> (define_insn "*fix_trunc<mode>_i387"
>   [(set (match_operand:X87MODEI12 0 "memory_operand" "=m")
> 	(fix:X87MODEI12 (match_operand 1 "register_operand" "f")))
>    (use (reg:HI FPCR_REG))]
>   "TARGET_80387 && !TARGET_FISTTP
>    && FLOAT_MODE_P (GET_MODE (operands[1]))
>    && !SSE_FLOAT_MODE_P (GET_MODE (operands[1]))"
>   "* return output_fix_trunc (insn, operands, 0);"
>   [(set_attr "type" "fistp")
>    (set_attr "i387_cw" "trunc")
>    (set_attr "mode" "<MODE>")])
> 
> I'm trying to use MODE_ENTRY and MODE_EXIT macros to insert mode calculations in

My main motivation for stopping at this point was that reload might
insert new fld/fst instructions in places where the control word is
changed, resulting in wrong rounding.  It seems to me that we would have
to make the second LCM pass happen after reload; that is definitely
doable, I just never got around to doing it.

> proper places. Currently, I have a somewhat working prototype that switches
> between 2 modes: MODE_UNINITIALIZED and MODE_TRUNC (plus MODE_ANY). The trick here
> is that MODE_ENTRY and MODE_EXIT are defined to MODE_UNINITIALIZED. Secondly,
> every asm statement and call insn switches to MODE_UNINITIALIZED, and when the mode
> is switched _from_ MODE_TRUNC _to_ MODE_UNINITIALIZED before these two kinds of
> insn (or in exit BBs), an UNSPEC_VOLATILE type fldcw is emitted (again via
> LCM) that switches the fpu back to the saved mode. [UNSPEC_VOLATILE is needed to
> prevent the optimizers from removing this pattern.] So, 2 fldcw patterns are defined:

If we use the second LCM pass and make it insert code as late as
possible, it seems safe to me to just have MODE_<possible values
of CW> and MODE_UNINITIALIZED and insert loads accordingly, believing that
the first LCM pass already inserted the computations at the correct points.

> 
> (define_insn "x86_fldcw_1"
>   [(set (reg:HI FPCR_REG)
> 	(unspec:HI [(match_operand:HI 0 "memory_operand" "m")]
> 		     UNSPEC_FLDCW))]
>   "TARGET_80387"
>   "fldcw\t%0"
>   [(set_attr "length" "2")
>    (set_attr "mode" "HI")
>    (set_attr "unit" "i387")
>    (set_attr "athlon_decode" "vector")])
> 
> (define_insn "x86_fldcw_2"
>   [(set (reg:HI FPCR_REG)
> 	(unspec_volatile:HI [(match_operand:HI 0 "memory_operand" "m")]
> 			      UNSPECV_FLDCW))]
>   "TARGET_80387"
>   "fldcw\t%0"
>   [(set_attr "length" "2")
>    (set_attr "mode" "HI")
>    (set_attr "unit" "i387")
>    (set_attr "athlon_decode" "vector")])
> 
> By using this approach, testcase:
> 
> int test (int *a, double *x) {
>         int i;
> 
>         for (i = 10; i; i--) {
>              a[i] = x[i];
>         }
> 
>         return 0;
> }
> 
> is compiled (with -O2 -fomit-frame-pointer -fgcse-after-reload) into:
> 
> test:
>         pushl  %ebx
>         xorl %edx, %edx
>         subl $4, %esp
>         fnstcw 2(%esp)         <- store current cw
>         movl 12(%esp), %ebx
>         movl 16(%esp), %ecx
>         movzwl 2(%esp), %eax
>         orw  $3072, %ax
>         movw %ax, (%esp)       <- store new cw
>         .p2align 4,,15
> .L2:
>         fldcw  (%esp)          <- hello? gcse-after-reload?
>         fldl 80(%ecx,%edx,8)
>         fistpl 40(%ebx,%edx,4)
>         decl %edx
>         cmpl $-10, %edx
>         jne  .L2
>         fldcw  2(%esp)         <- volatile fldcw in exit block (load stored cw)
>         xorl %eax, %eax
>         popl %edx
>         popl %ebx
>         ret
> 
> Another testcase, involving call:
> 
> extern double xxxx(int a);
> 
> int test (double a) {
>         return xxxx (a);
> }
> 
> is compiled into:
> 
> test:
>         subl $12, %esp
>         fnstcw 10(%esp)        <- store current control word
>         fldl 16(%esp)
>         movzwl 10(%esp), %eax
>         orw  $3072, %ax
>         movw %ax, 8(%esp)
>         fldcw  8(%esp)         <- switch fpu to new mode
>         fistpl (%esp)          <- make conversion
>         fldcw  10(%esp)        <- volatile fldcw before call (load stored cw)
>         call xxxx
>         fnstcw 10(%esp)        <- rewrite stored control word after call
>         movzwl 10(%esp), %eax
>         orw  $3072, %ax
>         movw %ax, 8(%esp)
>         fldcw  8(%esp)         <- load new
>         fistpl 4(%esp)         <- make conversion
>         movl 4(%esp), %eax
>         fldcw  10(%esp)        <- volatile fldcw in exit block (load stored cw)
>         addl $12, %esp
>         ret
> 
> Because the ABI specifies that the control word should be restored to the saved
> mode, we restore the saved cw before the call. After the call, the control word is
> saved again, because xxxx could be a cw-setting function and its new cw shouldn't
> be overwritten by the cw saved at the beginning of the function.
> 
> Unfortunately, in the first testcase, fldcw is not moved out of the loop, because
> fix_trunc<mode>_i387_2 is split after the gcse-after-reload pass. (Is this
> intentional for the gcse-after-reload pass?)

It is intentional for the reload pass.  I guess gcse might be run after
splitting, but I am not sure what the interferences are.

Honza
> 
> Uros.

* Re: i387 control word register definition is missing
  2005-05-25 22:03 ` Jan Hubicka
@ 2005-05-26 13:14   ` Uros Bizjak
  0 siblings, 0 replies; 5+ messages in thread
From: Uros Bizjak @ 2005-05-26 13:14 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

Quoting Jan Hubicka <hubicka@ucw.cz>:

> If you make FPCR/MXCSR real registers, you will need to add a use to all
> the arithmetic and move patterns, which would consume quite some memory and
> confuse the optimizers.  I think you can get around this better by simply using
> volatile unspecs inserted by the LCM pass (this would limit scheduling, but I
> don't think it is that big a deal).

  Ouch... I wrongly assumed that the rounding bits affect only (float)->(int)
patterns - thanks for clearing this up for me! (Perhaps a "nearest" i387_cw
attribute added to the arithmetic/move patterns could be used to switch back to
default rounding?)

> > Unfortunately, in the first testcase, fldcw is not moved out of the loop,
> > because
> > fix_trunc<mode>_i387_2 is split after the gcse-after-reload pass. (Is this
> > intentional for the gcse-after-reload pass?)
> 
> It is intentional for reload pass.  I guess gcse might be run after
> splitting, but not sure what the interferences are.

  I have added a split_all_insns call before gcse_after_reload_main in passes.c.
To my surprise, it didn't break anything, but it also didn't get fldcw out of the
loop.

Uros.

* Re: i387 control word register definition is missing
  2005-05-23 15:20 Uros Bizjak
@ 2005-05-24 16:18 ` Michael Meissner
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Meissner @ 2005-05-24 16:18 UTC (permalink / raw)
  To: gcc

On Mon, May 23, 2005 at 10:25:26AM +0200, Uros Bizjak wrote:
> Hello!
> 
> It looks like the i387 control word register definition is missing from the
> register definitions for the i386 processor. Inside i386.h, we have:
> 
> #define FIXED_REGISTERS						\
> /*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7*/	\
> {  0, 0, 0, 0, 0, 0, 0, 1, 0,  0,  0,  0,  0,  0,  0,  0,	\
> /*arg,flags,fpsr,dir,frame*/					\
>     1,    1,   1,  1,    1,					\
> /*xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7*/			\
>      0,   0,   0,   0,   0,   0,   0,   0,			\
> /*mmx0,mmx1,mmx2,mmx3,mmx4,mmx5,mmx6,mmx7*/			\
>      0,   0,   0,   0,   0,   0,   0,   0,			\
> /*  r8,  r9, r10, r11, r12, r13, r14, r15*/			\
>      2,   2,   2,   2,   2,   2,   2,   2,			\
> /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/		\
>      2,   2,    2,    2,    2,    2,    2,    2}
> 
> However, there should be another register defined, i387 control word register,
> 'fpcr' (Please look at chapter 11.2.1.2 and 11.2.1.3 in
> http://webster.cs.ucr.edu/AoA/Windows/HTML/RealArithmetic.html). There are two
> instructions in i386.md that actually use fpcr:

Well you really want both the fpcr and the mxcsr registers, since the fpcr
only controls the x87 and the mxcsr controls the xmm registers.  Note, in
adding these registers, you are going to have to go through all of the floating
point patterns to add (use:HI FPCR_REG) and (use:SI MXCSR_REG) to each and
every pattern so that the optimizer can be told not to move a floating point
operation past the setting of the control word.  Then you need to make sure
nothing got broken by adding these USE instructions.  It is certainly doable,
but it is not a simple fix.  I suspect there is then more information needed to
be able to move redundant mode switching operations out of a loop.  If you are
going to tackle it, be sure to have your paperwork in place so that your code
changes can be used.

-- 
Michael Meissner
email: gnu@the-meissners.org
http://www.the-meissners.org

* i387 control word register definition is missing
@ 2005-05-23 15:20 Uros Bizjak
  2005-05-24 16:18 ` Michael Meissner
  0 siblings, 1 reply; 5+ messages in thread
From: Uros Bizjak @ 2005-05-23 15:20 UTC (permalink / raw)
  To: gcc

Hello!

It looks like the i387 control word register definition is missing from the
register definitions for the i386 processor. Inside i386.h, we have:

#define FIXED_REGISTERS						\
/*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7*/	\
{  0, 0, 0, 0, 0, 0, 0, 1, 0,  0,  0,  0,  0,  0,  0,  0,	\
/*arg,flags,fpsr,dir,frame*/					\
    1,    1,   1,  1,    1,					\
/*xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7*/			\
     0,   0,   0,   0,   0,   0,   0,   0,			\
/*mmx0,mmx1,mmx2,mmx3,mmx4,mmx5,mmx6,mmx7*/			\
     0,   0,   0,   0,   0,   0,   0,   0,			\
/*  r8,  r9, r10, r11, r12, r13, r14, r15*/			\
     2,   2,   2,   2,   2,   2,   2,   2,			\
/*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/		\
     2,   2,    2,    2,    2,    2,    2,    2}

However, there should be another register defined, i387 control word register,
'fpcr' (Please look at chapter 11.2.1.2 and 11.2.1.3 in
http://webster.cs.ucr.edu/AoA/Windows/HTML/RealArithmetic.html). There are two
instructions in i386.md that actually use fpcr:

(define_insn "x86_fnstcw_1"
  [(set (match_operand:HI 0 "memory_operand" "=m")
	(unspec:HI [(reg:HI FPSR_REG)] UNSPEC_FSTCW))]
  "TARGET_80387"
  "fnstcw\t%0"
  [(set_attr "length" "2")
   (set_attr "mode" "HI")
   (set_attr "unit" "i387")])

(define_insn "x86_fldcw_1"
  [(set (reg:HI FPSR_REG)
	(unspec:HI [(match_operand:HI 0 "memory_operand" "m")] UNSPEC_FLDCW))]
  "TARGET_80387"
  "fldcw\t%0"
  [(set_attr "length" "2")
   (set_attr "mode" "HI")
   (set_attr "unit" "i387")
   (set_attr "athlon_decode" "vector")])

However, the RTL templates for these two instructions state that they use the i387
STATUS register, but they should use the i387 CONTROL register. To be correct, a new
fixed register should be introduced:

#define FIXED_REGISTERS						\
/*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7*/	\
{  0, 0, 0, 0, 0, 0, 0, 1, 0,  0,  0,  0,  0,  0,  0,  0,	\
/*arg,flags,fpsr,fpcr,dir,frame*/				\
    1,    1,   1,   1,  1,    1,				\
...

and the above two insn definitions should be changed to use FPCR_REG. Unfortunately,
some changes would be needed throughout the code (mainly to various register masks
and definitions) to fix this issue, so I would like to ask for opinions on this
change before proceeding.
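
To illustrate the kind of bookkeeping involved, here is a toy C sketch. It does
NOT use the real i386.h register numbering; the TOY_* names and the shortened
register list are assumptions made purely for illustration. The point is that
a new hard register shifts every later register number, so each per-register
table has to grow in step with it.

```c
/* Toy register file: NOT the real i386.h numbering.  */
enum toy_regno {
    TOY_AX, TOY_SP, TOY_ST0, TOY_FLAGS_REG, TOY_FPSR_REG,
    TOY_FPCR_REG,                 /* the newly added register */
    TOY_DIRFLAG_REG,              /* shifted up by one as a result */
    TOY_FIRST_PSEUDO_REGISTER
};

#define TOY_FIXED_REGISTERS                     \
/* ax, sp, st, flags, fpsr, fpcr, dir */        \
{   0,  1,  0,     1,    1,    1,   1 }

static const char toy_fixed_regs[] = TOY_FIXED_REGISTERS;

/* Compile-time check that the table grew together with the enum:
   a mismatch produces a negative array size and fails to compile.  */
typedef char toy_table_size_ok[
    sizeof toy_fixed_regs == TOY_FIRST_PSEUDO_REGISTER ? 1 : -1];

int toy_is_fixed(enum toy_regno r)
{
    return toy_fixed_regs[r];
}
```

A size check of this sort is one way to catch a table that was not extended
along with the register list.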

This change would be needed to get i387 control word switching instructions out
of loops, i.e.:

for ...

Thanks,
Uros.
