* Re: rfa (x86): 387<=>sse moves
@ 2005-07-31 16:51 Uros Bizjak
2005-08-01 18:52 ` Dale Johannesen
0 siblings, 1 reply; 10+ messages in thread
From: Uros Bizjak @ 2005-07-31 16:51 UTC (permalink / raw)
To: paolo.bonzini; +Cc: dalej, gcc
Hello!
> With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code like
>
> double d = atof(foo);
> int i = d;
>
>
> call atof
> fstpl -8(%ebp)
> movsd -8(%ebp), %xmm0
> cvttsd2si %xmm0, %eax
>
>
> (This is Linux, Darwin is similar.) I think the difficulty is that for
This problem is similar to the one described in PR target/19398.
The PR contains another testcase and a short analysis that might help
with this problem.
Uros.
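For anyone who wants to look at the generated assembly, the quoted fragment can be wrapped into a compilable function. This is only a sketch: the function name to_int and its parameter are assumptions made for illustration and do not come from the original report. Compiling it with the quoted flags plus -S (and presumably -m32 on a 64-bit host) should show whether a given compiler still emits the fstpl/movsd pair.

```c
/* Sketch of the reported test case.  The name `to_int` and its
   parameter are hypothetical; only the two statements in the body
   come from the report.  */
#include <stdlib.h>

int
to_int (const char *foo)
{
  double d = atof (foo);   /* 32-bit x86 ABI returns the double in st(0).  */
  int i = d;               /* Truncating conversion toward zero.  */
  return i;
}
```

The conversion truncates, so to_int ("3.9") yields 3 regardless of which register bank the compiler picks; only the instruction sequence differs.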
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: rfa (x86): 387<=>sse moves
2005-07-31 16:51 rfa (x86): 387<=>sse moves Uros Bizjak
@ 2005-08-01 18:52 ` Dale Johannesen
0 siblings, 0 replies; 10+ messages in thread
From: Dale Johannesen @ 2005-08-01 18:52 UTC (permalink / raw)
To: ubizjak; +Cc: paolo.bonzini, gcc
On Jul 31, 2005, at 9:51 AM, Uros Bizjak wrote:
> Hello!
>
>> With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code
>> like
>>
>> double d = atof(foo);
>> int i = d;
>>
>>
>> call atof
>> fstpl -8(%ebp)
>> movsd -8(%ebp), %xmm0
>> cvttsd2si %xmm0, %eax
>>
>>
>> (This is Linux, Darwin is similar.) I think the difficulty is that for
>
> This problem is similar to the problem, described in PR target/19398.
> There is another testcase and a small analysis in the PR that might
> help with this problem.
Thanks, that does seem relevant. The patches so far don't fix this
case; I've commented on the PR explaining why.
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: rfa (x86): 387<=>sse moves
@ 2005-08-02 16:31 Linthicum, Tony
0 siblings, 0 replies; 10+ messages in thread
From: Linthicum, Tony @ 2005-08-02 16:31 UTC (permalink / raw)
To: Dale Johannesen, ubizjak; +Cc: paolo.bonzini, gcc
Hello All,
I applied the recent patches to the 7/23 snapshot, and am still seeing
some 387 to sse moves. In particular, in SpecFP's 177.mesa (matrix.c),
I'm seeing fld1's feeding moves to sse registers.
Compiled via: gcc -O3 -march=k8 -mfpmath=sse matrix.c
Thanks.
Tony
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: rfa (x86): 387<=>sse moves
2005-07-27 21:19 ` Richard Henderson
2005-07-28 0:07 ` Dale Johannesen
@ 2005-07-29 23:04 ` Dale Johannesen
1 sibling, 0 replies; 10+ messages in thread
From: Dale Johannesen @ 2005-07-29 23:04 UTC (permalink / raw)
To: Richard Henderson; +Cc: Paolo Bonzini, GCC Development
On Jul 27, 2005, at 2:18 PM, Richard Henderson wrote:
> On Tue, Jul 26, 2005 at 11:10:56PM -0700, Dale Johannesen wrote:
>> Yes, it is. The following fixes my problem, and causes a couple of
>> 3DNow-specific regressions
>> in the testsuite which I need to look at, but nothing serious; I think
>> it's gotten far enough to post
>> for opinions. This is intended to go on top of Paolo's patch
>> http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01044.html
>> It may, of course, run afoul of inaccuracies in the patterns on
>> various
>> targets, haven't tried any performance testing yet.
>
> Looks plausible. Let us know what you wind up with wrt those
> regressions and testing.
OK, I've tested this on darwin x86 (both patches together). No regressions.
I don't think I ought to publish absolute Spec numbers for this machine, but
I get +1% on FP and +1/2% on Int. Wins: applu +3%, lucas +10%, eon +3%.
Losses: apsi -9%. All other changes under 2%. This looks OK to me, though
I'll be investigating apsi.
(Paolo and Richard Guenther are doing this for Linux.)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: rfa (x86): 387<=>sse moves
2005-07-27 21:19 ` Richard Henderson
@ 2005-07-28 0:07 ` Dale Johannesen
2005-07-29 23:04 ` Dale Johannesen
1 sibling, 0 replies; 10+ messages in thread
From: Dale Johannesen @ 2005-07-28 0:07 UTC (permalink / raw)
To: Richard Henderson; +Cc: Paolo Bonzini, GCC Development, Dale Johannesen
On Jul 27, 2005, at 2:18 PM, Richard Henderson wrote:
> On Tue, Jul 26, 2005 at 11:10:56PM -0700, Dale Johannesen wrote:
>> Yes, it is. The following fixes my problem, and causes a couple of
>> 3DNow-specific regressions
>> in the testsuite which I need to look at, but nothing serious; I think
>> it's gotten far enough to post
>> for opinions. This is intended to go on top of Paolo's patch
>> http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01044.html
>> It may, of course, run afoul of inaccuracies in the patterns on
>> various
>> targets, haven't tried any performance testing yet.
>
> Looks plausible. Let us know what you wind up with wrt those
> regressions and testing.
With the latest version of Paolo's patch (in PR 19653) the regressions
are gone. Spec is going to take a bit longer: I haven't gotten GMP to
build yet on x86 Darwin. Since the FP benchmarks are the interesting
ones for this, I should work through it.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: rfa (x86): 387<=>sse moves
2005-07-27 6:11 ` Dale Johannesen
@ 2005-07-27 21:19 ` Richard Henderson
2005-07-28 0:07 ` Dale Johannesen
2005-07-29 23:04 ` Dale Johannesen
0 siblings, 2 replies; 10+ messages in thread
From: Richard Henderson @ 2005-07-27 21:19 UTC (permalink / raw)
To: Dale Johannesen; +Cc: Paolo Bonzini, GCC Development
On Tue, Jul 26, 2005 at 11:10:56PM -0700, Dale Johannesen wrote:
> Yes, it is. The following fixes my problem, and causes a couple of
> 3DNow-specific regressions
> in the testsuite which I need to look at, but nothing serious; I think
> it's gotten far enough to post
> for opinions. This is intended to go on top of Paolo's patch
> http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01044.html
> It may, of course, run afoul of inaccuracies in the patterns on various
> targets, haven't tried any performance testing yet.
Looks plausible. Let us know what you wind up with wrt those
regressions and testing.
r~
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: rfa (x86): 387<=>sse moves
2005-07-26 22:34 ` Dale Johannesen
@ 2005-07-27 6:11 ` Dale Johannesen
2005-07-27 21:19 ` Richard Henderson
0 siblings, 1 reply; 10+ messages in thread
From: Dale Johannesen @ 2005-07-27 6:11 UTC (permalink / raw)
To: Dale Johannesen; +Cc: Paolo Bonzini, GCC Development
[-- Attachment #1: Type: text/plain, Size: 624 bytes --]
On Jul 26, 2005, at 3:34 PM, Dale Johannesen wrote:
>
> I think the RA may be missing the concept that memory might be faster
> than any possible register....
> will dig further.
Yes, it is. The following fixes my problem and causes a couple of
3DNow-specific regressions in the testsuite, which I need to look at,
but nothing serious; I think it's gotten far enough to post for opinions.
This is intended to go on top of Paolo's patch
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01044.html
It may, of course, run afoul of inaccuracies in the patterns on various
targets; I haven't tried any performance testing yet.
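The idea can be sketched outside of GCC roughly as follows. This is a toy model, not the regclass.c code: the toy_ names, the small class enum and the cost struct are all made up for illustration, and the real code also merges equal-cost classes through reg_class_subunion, which is left out here.

```c
#include <assert.h>
#include <limits.h>

/* Toy stand-ins for GCC's register classes (hypothetical names).  */
enum toy_class { TOY_NO_REGS, TOY_FLOAT_REGS, TOY_SSE_REGS, TOY_N_CLASSES };

/* Per-pseudo costs, loosely modeled on what regclass records.  */
struct toy_costs
{
  int cost[TOY_N_CLASSES];   /* cost[TOY_NO_REGS] is unused */
  int mem_cost;              /* cost of leaving the pseudo in memory */
};

/* Pick the cheapest register class, then fall back to TOY_NO_REGS
   when memory is strictly cheaper than every class.  */
enum toy_class
toy_pick_prefclass (const struct toy_costs *p)
{
  enum toy_class best = TOY_NO_REGS;
  int best_cost = INT_MAX;
  int c;

  for (c = TOY_NO_REGS + 1; c < TOY_N_CLASSES; c++)
    if (p->cost[c] < best_cost)
      {
        best_cost = p->cost[c];
        best = (enum toy_class) c;
      }

  /* If no register class is better than memory, use memory.  */
  if (p->mem_cost < best_cost)
    best = TOY_NO_REGS;

  return best;
}

/* Consumers skip the preference adjustment when there is no preferred
   class, in the spirit of the added prefclass != NO_REGS checks.  */
int
toy_pref_move_cost (enum toy_class pref, const int move_cost[TOY_N_CLASSES])
{
  if (pref == TOY_NO_REGS)
    return 0;                /* no preferred class, nothing to add */
  assert (pref < TOY_N_CLASSES);
  return move_cost[pref];
}
```

With class costs of 50000 and a memory cost of 40000, toy_pick_prefclass returns TOY_NO_REGS, which is the situation the patch is meant to let the allocator express.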
[-- Attachment #2: diffs5.txt --]
[-- Type: text/plain, Size: 1959 bytes --]
Index: regclass.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/regclass.c,v
retrieving revision 1.206
diff -u -b -r1.206 regclass.c
--- regclass.c 25 Jun 2005 02:00:52 -0000 1.206
+++ regclass.c 27 Jul 2005 06:04:40 -0000
@@ -838,7 +838,8 @@
/* Structure used to record preferences of given pseudo. */
struct reg_pref
{
- /* (enum reg_class) prefclass is the preferred class. */
+ /* (enum reg_class) prefclass is the preferred class. May be
+ NO_REGS if no class is better than memory. */
char prefclass;
/* altclass is a register class that we should use for allocating
@@ -1321,6 +1322,10 @@
best = reg_class_subunion[(int) best][class];
}
+ /* If no register class is better than memory, use memory. */
+ if (p->mem_cost < best_cost)
+ best = NO_REGS;
+
/* Record the alternate register class; i.e., a class for which
every register in it is better than using memory. If adding a
class would make a smaller class (i.e., no union of just those
@@ -1528,7 +1533,7 @@
to what we would add if this register were not in the
appropriate class. */
- if (reg_pref)
+ if (reg_pref && reg_pref[REGNO (op)].prefclass != NO_REGS)
alt_cost
+= (may_move_in_cost[mode]
[(unsigned char) reg_pref[REGNO (op)].prefclass]
@@ -1754,7 +1759,7 @@
to what we would add if this register were not in the
appropriate class. */
- if (reg_pref)
+ if (reg_pref && reg_pref[REGNO (op)].prefclass != NO_REGS)
alt_cost
+= (may_move_in_cost[mode]
[(unsigned char) reg_pref[REGNO (op)].prefclass]
@@ -1840,7 +1845,8 @@
int class;
unsigned int nr;
- if (regno >= FIRST_PSEUDO_REGISTER && reg_pref != 0)
+ if (regno >= FIRST_PSEUDO_REGISTER && reg_pref != 0
+ && reg_pref[regno].prefclass != NO_REGS)
{
enum reg_class pref = reg_pref[regno].prefclass;
[-- Attachment #3: Type: text/plain, Size: 1 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: rfa (x86): 387<=>sse moves
2005-07-26 7:51 ` Paolo Bonzini
@ 2005-07-26 22:34 ` Dale Johannesen
2005-07-27 6:11 ` Dale Johannesen
0 siblings, 1 reply; 10+ messages in thread
From: Dale Johannesen @ 2005-07-26 22:34 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: GCC Development, Dale Johannesen
On Jul 26, 2005, at 12:51 AM, Paolo Bonzini wrote:
> Dale Johannesen wrote:
>> With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code
>> like
>> double d = atof(foo);
>> int i = d;
>> call atof
>> fstpl -8(%ebp)
>> movsd -8(%ebp), %xmm0
>> cvttsd2si %xmm0, %eax
>> (This is Linux, Darwin is similar.) I think the difficulty is that
>> for
>
>> (set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger}
> Try the attached patch. It gave a 3% speedup on -mfpmath=sse for
> tramp3d. Richard Henderson asked for SPEC testing, then it may go in.
Thanks. That's progress; the cost computation in regclass now figures
out that memory is the fastest place to put R58:
Register 58 costs: AD_REGS:87000 Q_REGS:87000 NON_Q_REGS:87000
INDEX_REGS:87000 LEGACY_REGS:87000 GENERAL_REGS:87000 FP_TOP_REG:49000
FP_SECOND_REG:50000 FLOAT_REGS:50000 SSE_REGS:50000 FP_TOP_SSE_REGS:75000
FP_SECOND_SSE_REGS:75000 FLOAT_SSE_REGS:75000 FLOAT_INT_REGS:87000
INT_SSE_REGS:91000 FLOAT_INT_SSE_REGS:91000 ALL_REGS:91000 MEM:40000
Unfortunately local-alloc insists on putting it in a register anyway
(ST(0) instead of an XMM, but the end codegen is unchanged):
;; Register 58 in 8.
I think the RA may be missing the concept that memory might be faster
than any possible register. Will dig further.
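To make the numbers concrete, here is a small sketch that tabulates the dump above and compares the cheapest register class against MEM. The struct and function names are made up for illustration; only the class names and costs are taken from the dump.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of the cost dump: a class name and its cost.  */
struct class_cost
{
  const char *name;
  int cost;
};

/* Costs for register 58, copied from the dump.  */
static const struct class_cost r58_costs[] = {
  {"AD_REGS", 87000}, {"Q_REGS", 87000}, {"NON_Q_REGS", 87000},
  {"INDEX_REGS", 87000}, {"LEGACY_REGS", 87000}, {"GENERAL_REGS", 87000},
  {"FP_TOP_REG", 49000}, {"FP_SECOND_REG", 50000}, {"FLOAT_REGS", 50000},
  {"SSE_REGS", 50000}, {"FP_TOP_SSE_REGS", 75000},
  {"FP_SECOND_SSE_REGS", 75000}, {"FLOAT_SSE_REGS", 75000},
  {"FLOAT_INT_REGS", 87000}, {"INT_SSE_REGS", 91000},
  {"FLOAT_INT_SSE_REGS", 91000}, {"ALL_REGS", 91000},
};
enum { R58_N = sizeof r58_costs / sizeof r58_costs[0] };
static const int r58_mem_cost = 40000;

/* Return the entry with the lowest cost; the first one wins on ties.  */
const struct class_cost *
cheapest_class (const struct class_cost *t, size_t n)
{
  size_t i, best = 0;

  assert (n > 0);
  for (i = 1; i < n; i++)
    if (t[i].cost < t[best].cost)
      best = i;
  return &t[best];
}

/* Nonzero if memory is strictly cheaper than every register class.  */
int
memory_is_cheapest (const struct class_cost *t, size_t n, int mem_cost)
{
  return mem_cost < cheapest_class (t, n)->cost;
}
```

On this table the cheapest register class is FP_TOP_REG at 49000, which would be consistent with local-alloc settling on hard register 8 (st), while MEM at 40000 is cheaper than any class.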
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: rfa (x86): 387<=>sse moves
2005-07-26 1:10 Dale Johannesen
@ 2005-07-26 7:51 ` Paolo Bonzini
2005-07-26 22:34 ` Dale Johannesen
0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2005-07-26 7:51 UTC (permalink / raw)
To: GCC Development, Dale Johannesen
[-- Attachment #1: Type: text/plain, Size: 495 bytes --]
Dale Johannesen wrote:
> With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code like
>
> double d = atof(foo);
> int i = d;
>
> call atof
> fstpl -8(%ebp)
> movsd -8(%ebp), %xmm0
> cvttsd2si %xmm0, %eax
>
> (This is Linux, Darwin is similar.) I think the difficulty is that for
Try the attached patch. It gave a 3% speedup on -mfpmath=sse for
tramp3d. Richard Henderson asked for SPEC testing, then it may go in.
Paolo
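As a rough illustration of what the new output-reload hook in the attached patch is trying to do, here is a standalone toy model. It is not the i386.c code: the enum, struct and function names are hypothetical, the separate SSE_FLOAT_MODE_P and SCALAR_FLOAT_MODE_P tests are collapsed into one is_fp_mode flag, and the narrowing of the mixed FP_TOP_SSE_REGS / FP_SECOND_SSE_REGS classes is left out.

```c
#include <stdbool.h>

/* Toy register banks (hypothetical); BANK_NONE plays the role of NO_REGS.  */
enum bank { BANK_NONE, BANK_GENERAL, BANK_387, BANK_SSE };

/* Toy target configuration.  */
struct fp_config
{
  bool has_387;        /* stands in for TARGET_80387 */
  bool sse_math;       /* stands in for TARGET_SSE_MATH */
  bool mix_sse_i387;   /* stands in for TARGET_MIX_SSE_I387 */
};

/* Restrict an output reload to the bank that math is done in.  Returning
   BANK_NONE means "reject this alternative if possible".  */
enum bank
preferred_output_bank (const struct fp_config *cfg, bool is_hard_reg,
                       bool is_fp_mode, enum bank cls)
{
  /* Do not be picky about hard registers.  */
  if (is_hard_reg)
    return cls;

  /* With mixed math either bank is acceptable.  */
  if (cfg->mix_sse_i387)
    return cls;

  /* SSE math: floating-point results should land in SSE registers.  */
  if (cfg->sse_math && is_fp_mode)
    return cls == BANK_SSE ? cls : BANK_NONE;

  /* 387 math: floating-point results should land in 387 registers.  */
  if (cfg->has_387 && is_fp_mode)
    return cls == BANK_387 ? cls : BANK_NONE;

  return cls;
}
```

In this model an SSE-math target asked to reload a floating-point pseudo into the 387 bank gets BANK_NONE back, which corresponds to the patch discouraging 387<=>SSE moves through the reload preference hooks once the # constraint hints are dropped.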
[-- Attachment #2: pr19653.patch --]
[-- Type: text/plain, Size: 36141 bytes --]
2005-07-14 Paolo Bonzini <bonzini@gnu.org>
* reload.c (find_reloads): Take PREFERRED_OUTPUT_RELOAD_CLASS
into account.
(push_reload): Allow PREFERRED_*_RELOAD_CLASS to liberally
return NO_REGS.
* doc/tm.texi (Register Classes): Document what it means
if PREFERRED_*_RELOAD_CLASS return NO_REGS.
* config/i386/i386.c (ix86_preferred_reload_class): Force
using SSE registers (and return NO_REGS for floating-point
constants) if math is done with SSE.
(ix86_preferred_output_reload_class): New.
* config/i386/i386-protos.h (ix86_preferred_output_reload_class): New.
* config/i386/i386.h (PREFERRED_OUTPUT_RELOAD_CLASS): New.
* config/i386/i386.md: Remove # register preferences.
Index: reload.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/reload.c,v
retrieving revision 1.273
diff -c -r1.273 reload.c
*** reload.c 25 Jun 2005 02:00:53 -0000 1.273
--- reload.c 14 Jul 2005 08:07:56 -0000
***************
*** 1231,1245 ****
/* Narrow down the class of register wanted if that is
desirable on this machine for efficiency. */
! if (in != 0)
! class = PREFERRED_RELOAD_CLASS (in, class);
/* Output reloads may need analogous treatment, different in detail. */
#ifdef PREFERRED_OUTPUT_RELOAD_CLASS
! if (out != 0)
! class = PREFERRED_OUTPUT_RELOAD_CLASS (out, class);
#endif
/* Make sure we use a class that can handle the actual pseudo
inside any subreg. For example, on the 386, QImode regs
can appear within SImode subregs. Although GENERAL_REGS
--- 1231,1254 ----
/* Narrow down the class of register wanted if that is
desirable on this machine for efficiency. */
! {
! enum reg_class preferred_class = class;
!
! if (in != 0)
! preferred_class = PREFERRED_RELOAD_CLASS (in, class);
/* Output reloads may need analogous treatment, different in detail. */
#ifdef PREFERRED_OUTPUT_RELOAD_CLASS
! if (out != 0)
! preferred_class = PREFERRED_OUTPUT_RELOAD_CLASS (out, preferred_class);
#endif
+ /* Discard what the target said if we cannot do it. */
+ if (preferred_class != NO_REGS
+ || (optional && type == RELOAD_FOR_OUTPUT))
+ class = preferred_class;
+ }
+
/* Make sure we use a class that can handle the actual pseudo
inside any subreg. For example, on the 386, QImode regs
can appear within SImode subregs. Although GENERAL_REGS
***************
*** 3443,3457 ****
/* If we can't reload this value at all, reject this
alternative. Note that we could also lose due to
! LIMIT_RELOAD_RELOAD_CLASS, but we don't check that
here. */
if (! CONSTANT_P (operand)
! && (enum reg_class) this_alternative[i] != NO_REGS
! && (PREFERRED_RELOAD_CLASS (operand,
! (enum reg_class) this_alternative[i])
! == NO_REGS))
! bad = 1;
/* Alternative loses if it requires a type of reload not
permitted for this insn. We can always reload SCRATCH
--- 3452,3477 ----
/* If we can't reload this value at all, reject this
alternative. Note that we could also lose due to
! LIMIT_RELOAD_CLASS, but we don't check that
here. */
if (! CONSTANT_P (operand)
! && (enum reg_class) this_alternative[i] != NO_REGS)
! {
! if (PREFERRED_RELOAD_CLASS
! (operand, (enum reg_class) this_alternative[i])
! == NO_REGS)
! bad = 1;
!
! #ifdef PREFERRED_OUTPUT_RELOAD_CLASS
! if (operand_type[i] == RELOAD_FOR_OUTPUT
! && PREFERRED_OUTPUT_RELOAD_CLASS
! (operand, (enum reg_class) this_alternative[i])
! == NO_REGS)
! bad = 1;
! #endif
! }
!
/* Alternative loses if it requires a type of reload not
permitted for this insn. We can always reload SCRATCH
Index: doc/tm.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/tm.texi,v
retrieving revision 1.441
diff -p -u -r1.441 tm.texi
*** doc/tm.texi 13 Jul 2005 16:28:25 -0000 1.441
--- doc/tm.texi 15 Jul 2005 14:13:49 -0000
***************
*** 2385,2396 ****
--- 2385,2408 ----
into any kind of register, code generation will be better if
@code{LEGITIMATE_CONSTANT_P} makes the constant illegitimate instead
of using @code{PREFERRED_RELOAD_CLASS}.
+
+ If an insn has pseudos in it after register allocation, reload will go
+ through the alternatives and repeatedly call @code{PREFERRED_RELOAD_CLASS}
+ to find the best one. Returning @code{NO_REGS}, in this case, makes
+ reload add a @code{?} in front of the constraint: the x86 back-end uses
+ this feature to discourage usage of 387 registers when math is done in
+ the SSE registers (and vice versa). Be careful not to return @code{NO_REGS}
+ when @code{x} is a hard register. Otherwise, it will be impossible to
+ successfully reload the insn.
@end defmac
@defmac PREFERRED_OUTPUT_RELOAD_CLASS (@var{x}, @var{class})
Like @code{PREFERRED_RELOAD_CLASS}, but for output reloads instead of
input reloads. If you don't define this macro, the default is to use
@var{class}, unchanged.
+
+ You can also use @code{PREFERRED_OUTPUT_RELOAD_CLASS} to discourage
+ reload from using some of the insns, like @code{PREFERRED_RELOAD_CLASS}.
@end defmac
@defmac LIMIT_RELOAD_CLASS (@var{mode}, @var{class})
Index: config/i386/i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.843
diff -u -p -r1.843 i386.c
*** config/i386/i386.c 18 Jul 2005 06:39:18 -0000 1.843
--- config/i386/i386.c 25 Jul 2005 15:19:03 -0000
***************
*** 15411,15425 ****
enum reg_class
ix86_preferred_reload_class (rtx x, enum reg_class class)
{
/* We're only allowed to return a subclass of CLASS. Many of the
following checks fail for NO_REGS, so eliminate that early. */
if (class == NO_REGS)
return NO_REGS;
/* All classes can load zeros. */
! if (x == CONST0_RTX (GET_MODE (x)))
return class;
/* Floating-point constants need more complex checks. */
if (GET_CODE (x) == CONST_DOUBLE && GET_MODE (x) != VOIDmode)
{
--- 15411,15453 ----
enum reg_class
ix86_preferred_reload_class (rtx x, enum reg_class class)
{
+ enum machine_mode mode = GET_MODE (x);
+ bool is_sse_math_mode;
+
/* We're only allowed to return a subclass of CLASS. Many of the
following checks fail for NO_REGS, so eliminate that early. */
if (class == NO_REGS)
return NO_REGS;
/* All classes can load zeros. */
! if (x == CONST0_RTX (mode))
return class;
+ /* Do not be picky when we are reloading a hard register. */
+ if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
+ return class;
+
+ /* Reject this alternative if we are loading: a) a vector constant into
+ an MMX or SSE register b) a floating-point constant into an SSE register
+ that will be used for math. This is because there are no MMX/SSE
+ load-from-constant instructions. */
+
+ is_sse_math_mode =
+ TARGET_SSE_MATH && !TARGET_MIX_SSE_I387 && SSE_FLOAT_MODE_P (mode);
+
+ if (CONSTANT_P (x))
+ {
+ if (MAYBE_MMX_CLASS_P (class))
+ return NO_REGS;
+ if (MAYBE_SSE_CLASS_P (class)
+ && (VECTOR_MODE_P (mode) || mode == TImode || is_sse_math_mode))
+ return NO_REGS;
+ }
+
+ /* Prefer SSE regs only, if we can use them for math. */
+ if (is_sse_math_mode)
+ return SSE_CLASS_P (class) ? class : NO_REGS;
+
/* Floating-point constants need more complex checks. */
if (GET_CODE (x) == CONST_DOUBLE && GET_MODE (x) != VOIDmode)
{
***************
*** 15431,15438 ****
zero above. We only want to wind up preferring 80387 registers if
we plan on doing computation with them. */
if (TARGET_80387
- && (TARGET_MIX_SSE_I387
- || !(TARGET_SSE_MATH && SSE_FLOAT_MODE_P (GET_MODE (x))))
&& standard_80387_constant_p (x))
{
/* Limit class to non-sse. */
--- 15459,15464 ----
***************
*** 15448,15457 ****
return NO_REGS;
}
- if (MAYBE_MMX_CLASS_P (class) && CONSTANT_P (x))
- return NO_REGS;
- if (MAYBE_SSE_CLASS_P (class) && CONSTANT_P (x))
- return NO_REGS;
/* Generally when we see PLUS here, it's the function invariant
(plus soft-fp const_int). Which can only be computed into general
--- 15474,15479 ----
***************
*** 15473,15478 ****
--- 15495,15537 ----
return class;
}
+ /* Discourage putting floating-point values in SSE registers unless
+ SSE math is being used, and likewise for the 387 registers. */
+ enum reg_class
+ ix86_preferred_output_reload_class (rtx x, enum reg_class class)
+ {
+ enum machine_mode mode = GET_MODE (x);
+
+ /* Restrict the output reload class to the register bank that we are doing
+ math on. If we would like not to return a subset of CLASS, reject this
+ alternative: if reload cannot do this, it will still use its choice.
+
+ We only do this if we are reloading a pseudo. Reloads of floating-point
+ hard registers can happen after a VEC_SELECT (whose output can only be
+ in SSE registers) if -mfpmath=387 is active. */
+ if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
+ return class;
+
+ if (TARGET_MIX_SSE_I387)
+ return class;
+
+ mode = GET_MODE (x);
+ if (TARGET_SSE_MATH && SSE_FLOAT_MODE_P (mode))
+ return SSE_CLASS_P (class) ? class : NO_REGS;
+
+ if (TARGET_80387 && SCALAR_FLOAT_MODE_P (mode))
+ {
+ if (class == FP_TOP_SSE_REGS)
+ return FP_TOP_REG;
+ else if (class == FP_SECOND_SSE_REGS)
+ return FP_SECOND_REG;
+ else
+ return FLOAT_CLASS_P (class) ? class : NO_REGS;
+ }
+
+ return class;
+ }
+
/* If we are copying between general and FP registers, we need a memory
location. The same is true for SSE and MMX registers.
Index: config/i386/i386.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.h,v
retrieving revision 1.440
diff -c -r1.440 i386.h
*** config/i386/i386.h 26 Jun 2005 05:18:34 -0000 1.440
--- config/i386/i386.h 14 Jul 2005 08:07:55 -0000
***************
*** 1294,1299 ****
--- 1294,1305 ----
#define PREFERRED_RELOAD_CLASS(X, CLASS) \
ix86_preferred_reload_class ((X), (CLASS))
+ /* Discourage putting floating-point values in SSE registers unless
+ SSE math is being used, and likewise for the 387 registers. */
+
+ #define PREFERRED_OUTPUT_RELOAD_CLASS(X, CLASS) \
+ ix86_preferred_output_reload_class ((X), (CLASS))
+
/* If we are copying between general and FP registers, we need a memory
location. The same is true for SSE and MMX registers. */
#define SECONDARY_MEMORY_NEEDED(CLASS1, CLASS2, MODE) \
Index: config/i386/i386-protos.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386-protos.h,v
retrieving revision 1.143
diff -c -r1.143 i386-protos.h
*** config/i386/i386-protos.h 29 Jun 2005 17:27:16 -0000 1.143
--- config/i386/i386-protos.h 14 Jul 2005 08:07:55 -0000
***************
*** 188,193 ****
--- 188,194 ----
extern bool ix86_cannot_change_mode_class (enum machine_mode,
enum machine_mode, enum reg_class);
extern enum reg_class ix86_preferred_reload_class (rtx, enum reg_class);
+ extern enum reg_class ix86_preferred_output_reload_class (rtx, enum reg_class);
extern int ix86_memory_move_cost (enum machine_mode, enum reg_class, int);
extern int ix86_mode_needed (int, rtx);
extern void emit_i387_cw_initialization (int);
Index: config/i386/i386.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.md,v
retrieving revision 1.645
diff -c -r1.645 i386.md
*** config/i386/i386.md 12 Jul 2005 09:20:12 -0000 1.645
--- config/i386/i386.md 14 Jul 2005 21:22:24 -0000
***************
*** 946,953 ****
(define_insn "*cmpfp_i_mixed"
[(set (reg:CCFP FLAGS_REG)
! (compare:CCFP (match_operand 0 "register_operand" "f#x,x#f")
! (match_operand 1 "nonimmediate_operand" "f#x,xm#f")))]
"TARGET_MIX_SSE_I387
&& SSE_FLOAT_MODE_P (GET_MODE (operands[0]))
&& GET_MODE (operands[0]) == GET_MODE (operands[1])"
--- 946,953 ----
(define_insn "*cmpfp_i_mixed"
[(set (reg:CCFP FLAGS_REG)
! (compare:CCFP (match_operand 0 "register_operand" "f,x")
! (match_operand 1 "nonimmediate_operand" "f,xm")))]
"TARGET_MIX_SSE_I387
&& SSE_FLOAT_MODE_P (GET_MODE (operands[0]))
&& GET_MODE (operands[0]) == GET_MODE (operands[1])"
***************
*** 995,1002 ****
(define_insn "*cmpfp_iu_mixed"
[(set (reg:CCFPU FLAGS_REG)
! (compare:CCFPU (match_operand 0 "register_operand" "f#x,x#f")
! (match_operand 1 "nonimmediate_operand" "f#x,xm#f")))]
"TARGET_MIX_SSE_I387
&& SSE_FLOAT_MODE_P (GET_MODE (operands[0]))
&& GET_MODE (operands[0]) == GET_MODE (operands[1])"
--- 995,1002 ----
(define_insn "*cmpfp_iu_mixed"
[(set (reg:CCFPU FLAGS_REG)
! (compare:CCFPU (match_operand 0 "register_operand" "f,x")
! (match_operand 1 "nonimmediate_operand" "f,xm")))]
"TARGET_MIX_SSE_I387
&& SSE_FLOAT_MODE_P (GET_MODE (operands[0]))
&& GET_MODE (operands[0]) == GET_MODE (operands[1])"
***************
*** 2197,2203 ****
(define_insn "*pushsf"
[(set (match_operand:SF 0 "push_operand" "=<,<,<")
! (match_operand:SF 1 "general_no_elim_operand" "f#rx,rFm#fx,x#rf"))]
"!TARGET_64BIT"
{
/* Anything else should be already split before reg-stack. */
--- 2197,2203 ----
(define_insn "*pushsf"
[(set (match_operand:SF 0 "push_operand" "=<,<,<")
! (match_operand:SF 1 "general_no_elim_operand" "f,rFm,x"))]
"!TARGET_64BIT"
{
/* Anything else should be already split before reg-stack. */
***************
*** 2210,2216 ****
(define_insn "*pushsf_rex64"
[(set (match_operand:SF 0 "push_operand" "=X,X,X")
! (match_operand:SF 1 "nonmemory_no_elim_operand" "f#rx,rF#fx,x#rf"))]
"TARGET_64BIT"
{
/* Anything else should be already split before reg-stack. */
--- 2210,2216 ----
(define_insn "*pushsf_rex64"
[(set (match_operand:SF 0 "push_operand" "=X,X,X")
! (match_operand:SF 1 "nonmemory_no_elim_operand" "f,rF,x"))]
"TARGET_64BIT"
{
/* Anything else should be already split before reg-stack. */
***************
*** 2250,2258 ****
(define_insn "*movsf_1"
[(set (match_operand:SF 0 "nonimmediate_operand"
! "=f#xr,m ,f#xr,r#xf ,m ,x#rf,x#rf,x#rf ,m ,!*y,!rm,!*y")
(match_operand:SF 1 "general_operand"
! "fm#rx,f#rx,G ,rmF#fx,Fr#fx,C ,x ,xm#rf,x#rf,rm ,*y ,*y"))]
"!(MEM_P (operands[0]) && MEM_P (operands[1]))
&& (reload_in_progress || reload_completed
|| (ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_LARGE)
--- 2250,2258 ----
(define_insn "*movsf_1"
[(set (match_operand:SF 0 "nonimmediate_operand"
! "=f,m ,f,r,m ,x,x,x,m ,!*y,!rm,!*y")
(match_operand:SF 1 "general_operand"
! "fm,f,G ,rmF,Fr,C ,x ,xm,x,rm ,*y ,*y"))]
"!(MEM_P (operands[0]) && MEM_P (operands[1]))
&& (reload_in_progress || reload_completed
|| (ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_LARGE)
***************
*** 2365,2371 ****
(define_insn "*pushdf_nointeger"
[(set (match_operand:DF 0 "push_operand" "=<,<,<,<")
! (match_operand:DF 1 "general_no_elim_operand" "f#Y,Fo#fY,*r#fY,Y#f"))]
"!TARGET_64BIT && !TARGET_INTEGER_DFMODE_MOVES"
{
/* This insn should be already split before reg-stack. */
--- 2365,2371 ----
(define_insn "*pushdf_nointeger"
[(set (match_operand:DF 0 "push_operand" "=<,<,<,<")
! (match_operand:DF 1 "general_no_elim_operand" "f,Fo,*r,Y"))]
"!TARGET_64BIT && !TARGET_INTEGER_DFMODE_MOVES"
{
/* This insn should be already split before reg-stack. */
***************
*** 2377,2383 ****
(define_insn "*pushdf_integer"
[(set (match_operand:DF 0 "push_operand" "=<,<,<")
! (match_operand:DF 1 "general_no_elim_operand" "f#rY,rFo#fY,Y#rf"))]
"TARGET_64BIT || TARGET_INTEGER_DFMODE_MOVES"
{
/* This insn should be already split before reg-stack. */
--- 2377,2383 ----
(define_insn "*pushdf_integer"
[(set (match_operand:DF 0 "push_operand" "=<,<,<")
! (match_operand:DF 1 "general_no_elim_operand" "f,rFo,Y"))]
"TARGET_64BIT || TARGET_INTEGER_DFMODE_MOVES"
{
/* This insn should be already split before reg-stack. */
***************
*** 2417,2425 ****
(define_insn "*movdf_nointeger"
[(set (match_operand:DF 0 "nonimmediate_operand"
! "=f#Y,m ,f#Y,*r ,o ,Y*x#f,Y*x#f,Y*x#f ,m ")
(match_operand:DF 1 "general_operand"
! "fm#Y,f#Y,G ,*roF,F*r,C ,Y*x#f,HmY*x#f,Y*x#f"))]
"(GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
&& ((optimize_size || !TARGET_INTEGER_DFMODE_MOVES) && !TARGET_64BIT)
&& (reload_in_progress || reload_completed
--- 2417,2425 ----
(define_insn "*movdf_nointeger"
[(set (match_operand:DF 0 "nonimmediate_operand"
! "=f,m ,f,*r ,o ,Y*x,Y*x,Y*x,m ")
(match_operand:DF 1 "general_operand"
! "fm,f,G ,*roF,F*r,C ,Y*x,HmY*x,Y*x"))]
"(GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
&& ((optimize_size || !TARGET_INTEGER_DFMODE_MOVES) && !TARGET_64BIT)
&& (reload_in_progress || reload_completed
***************
*** 2537,2545 ****
(define_insn "*movdf_integer"
[(set (match_operand:DF 0 "nonimmediate_operand"
! "=f#Yr,m ,f#Yr,r#Yf ,o ,Y*x#rf,Y*x#rf,Y*x#rf,m")
(match_operand:DF 1 "general_operand"
! "fm#Yr,f#Yr,G ,roF#Yf,Fr#Yf,C ,Y*x#rf,m ,Y*x#rf"))]
"(GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
&& ((!optimize_size && TARGET_INTEGER_DFMODE_MOVES) || TARGET_64BIT)
&& (reload_in_progress || reload_completed
--- 2537,2545 ----
(define_insn "*movdf_integer"
[(set (match_operand:DF 0 "nonimmediate_operand"
! "=f,m ,f,r,o ,Y*x,Y*x,Y*x,m")
(match_operand:DF 1 "general_operand"
! "fm,f,G ,roF,Fr,C ,Y*x,m ,Y*x"))]
"(GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
&& ((!optimize_size && TARGET_INTEGER_DFMODE_MOVES) || TARGET_64BIT)
&& (reload_in_progress || reload_completed
***************
*** 2712,2718 ****
(define_insn "*pushxf_integer"
[(set (match_operand:XF 0 "push_operand" "=<,<")
! (match_operand:XF 1 "general_no_elim_operand" "f#r,ro#f"))]
"!optimize_size"
{
/* This insn should be already split before reg-stack. */
--- 2712,2718 ----
(define_insn "*pushxf_integer"
[(set (match_operand:XF 0 "push_operand" "=<,<")
! (match_operand:XF 1 "general_no_elim_operand" "f,ro"))]
"!optimize_size"
{
/* This insn should be already split before reg-stack. */
***************
*** 2784,2791 ****
(set_attr "mode" "XF,XF,XF,SI,SI")])
(define_insn "*movxf_integer"
! [(set (match_operand:XF 0 "nonimmediate_operand" "=f#r,m,f#r,r#f,o")
! (match_operand:XF 1 "general_operand" "fm#r,f#r,G,roF#f,Fr#f"))]
"!optimize_size
&& (GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
&& (reload_in_progress || reload_completed
--- 2784,2791 ----
(set_attr "mode" "XF,XF,XF,SI,SI")])
(define_insn "*movxf_integer"
! [(set (match_operand:XF 0 "nonimmediate_operand" "=f,m,f,r,o")
! (match_operand:XF 1 "general_operand" "fm,f,G,roF,Fr"))]
"!optimize_size
&& (GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
&& (reload_in_progress || reload_completed
***************
*** 3508,3515 ****
})
(define_insn "*extendsfdf2_mixed"
! [(set (match_operand:DF 0 "nonimmediate_operand" "=f#Y,m#fY,Y#f")
! (float_extend:DF (match_operand:SF 1 "nonimmediate_operand" "fm#Y,f#Y,mY#f")))]
"TARGET_SSE2 && TARGET_MIX_SSE_I387
&& (GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)"
{
--- 3508,3515 ----
})
(define_insn "*extendsfdf2_mixed"
! [(set (match_operand:DF 0 "nonimmediate_operand" "=f,m,Y")
! (float_extend:DF (match_operand:SF 1 "nonimmediate_operand" "fm,f,mY")))]
"TARGET_SSE2 && TARGET_MIX_SSE_I387
&& (GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)"
{
***************
*** 3824,3830 ****
})
(define_insn "*truncxfsf2_mixed"
! [(set (match_operand:SF 0 "nonimmediate_operand" "=m,?f#rx,?r#fx,?x#rf")
(float_truncate:SF
(match_operand:XF 1 "register_operand" "f,f,f,f")))
(clobber (match_operand:SF 2 "memory_operand" "=X,m,m,m"))]
--- 3824,3830 ----
})
(define_insn "*truncxfsf2_mixed"
! [(set (match_operand:SF 0 "nonimmediate_operand" "=m,?f,?r,?x")
(float_truncate:SF
(match_operand:XF 1 "register_operand" "f,f,f,f")))
(clobber (match_operand:SF 2 "memory_operand" "=X,m,m,m"))]
***************
*** 3851,3857 ****
(set_attr "mode" "SF")])
(define_insn "*truncxfsf2_i387"
! [(set (match_operand:SF 0 "nonimmediate_operand" "=m,?f#r,?r#f")
(float_truncate:SF
(match_operand:XF 1 "register_operand" "f,f,f")))
(clobber (match_operand:SF 2 "memory_operand" "=X,m,m"))]
--- 3851,3857 ----
(set_attr "mode" "SF")])
(define_insn "*truncxfsf2_i387"
! [(set (match_operand:SF 0 "nonimmediate_operand" "=m,?f,?r")
(float_truncate:SF
(match_operand:XF 1 "register_operand" "f,f,f")))
(clobber (match_operand:SF 2 "memory_operand" "=X,m,m"))]
***************
*** 3922,3928 ****
})
(define_insn "*truncxfdf2_mixed"
! [(set (match_operand:DF 0 "nonimmediate_operand" "=m,?f#rY,?r#fY,?Y#rf")
(float_truncate:DF
(match_operand:XF 1 "register_operand" "f,f,f,f")))
(clobber (match_operand:DF 2 "memory_operand" "=X,m,m,m"))]
--- 3922,3928 ----
})
(define_insn "*truncxfdf2_mixed"
! [(set (match_operand:DF 0 "nonimmediate_operand" "=m,?f,?r,?Y")
(float_truncate:DF
(match_operand:XF 1 "register_operand" "f,f,f,f")))
(clobber (match_operand:DF 2 "memory_operand" "=X,m,m,m"))]
***************
*** 3949,3955 ****
(set_attr "mode" "DF")])
(define_insn "*truncxfdf2_i387"
! [(set (match_operand:DF 0 "nonimmediate_operand" "=m,?f#r,?r#f")
(float_truncate:DF
(match_operand:XF 1 "register_operand" "f,f,f")))
(clobber (match_operand:DF 2 "memory_operand" "=X,m,m"))]
--- 3949,3955 ----
(set_attr "mode" "DF")])
(define_insn "*truncxfdf2_i387"
! [(set (match_operand:DF 0 "nonimmediate_operand" "=m,?f,?r")
(float_truncate:DF
(match_operand:XF 1 "register_operand" "f,f,f")))
(clobber (match_operand:DF 2 "memory_operand" "=X,m,m"))]
***************
*** 4423,4429 ****
"")
(define_insn "*floatsisf2_mixed"
! [(set (match_operand:SF 0 "register_operand" "=f#x,?f#x,x#f,x#f")
(float:SF (match_operand:SI 1 "nonimmediate_operand" "m,r,r,mr")))]
"TARGET_MIX_SSE_I387"
"@
--- 4423,4429 ----
"")
(define_insn "*floatsisf2_mixed"
! [(set (match_operand:SF 0 "register_operand" "=f,?f,x,x")
(float:SF (match_operand:SI 1 "nonimmediate_operand" "m,r,r,mr")))]
"TARGET_MIX_SSE_I387"
"@
***************
*** 4466,4472 ****
"")
(define_insn "*floatdisf2_mixed"
! [(set (match_operand:SF 0 "register_operand" "=f#x,?f#x,x#f,x#f")
(float:SF (match_operand:DI 1 "nonimmediate_operand" "m,r,r,mr")))]
"TARGET_64BIT && TARGET_MIX_SSE_I387"
"@
--- 4466,4472 ----
"")
(define_insn "*floatdisf2_mixed"
! [(set (match_operand:SF 0 "register_operand" "=f,?f,x,x")
(float:SF (match_operand:DI 1 "nonimmediate_operand" "m,r,r,mr")))]
"TARGET_64BIT && TARGET_MIX_SSE_I387"
"@
***************
*** 4534,4540 ****
"")
(define_insn "*floatsidf2_mixed"
! [(set (match_operand:DF 0 "register_operand" "=f#Y,?f#Y,Y#f,Y#f")
(float:DF (match_operand:SI 1 "nonimmediate_operand" "m,r,r,mr")))]
"TARGET_SSE2 && TARGET_MIX_SSE_I387"
"@
--- 4534,4540 ----
"")
(define_insn "*floatsidf2_mixed"
! [(set (match_operand:DF 0 "register_operand" "=f,?f,Y,Y")
(float:DF (match_operand:SI 1 "nonimmediate_operand" "m,r,r,mr")))]
"TARGET_SSE2 && TARGET_MIX_SSE_I387"
"@
***************
*** 4577,4583 ****
"")
(define_insn "*floatdidf2_mixed"
! [(set (match_operand:DF 0 "register_operand" "=f#Y,?f#Y,Y#f,Y#f")
(float:DF (match_operand:DI 1 "nonimmediate_operand" "m,r,r,mr")))]
"TARGET_64BIT && TARGET_SSE2 && TARGET_MIX_SSE_I387"
"@
--- 4577,4583 ----
"")
(define_insn "*floatdidf2_mixed"
! [(set (match_operand:DF 0 "register_operand" "=f,?f,Y,Y")
(float:DF (match_operand:DI 1 "nonimmediate_operand" "m,r,r,mr")))]
"TARGET_64BIT && TARGET_SSE2 && TARGET_MIX_SSE_I387"
"@
***************
*** 9383,9391 ****
"ix86_expand_fp_absneg_operator (ABS, SFmode, operands); DONE;")
(define_insn "*absnegsf2_mixed"
! [(set (match_operand:SF 0 "nonimmediate_operand" "=x#f,x#f,f#x,rm")
(match_operator:SF 3 "absneg_operator"
! [(match_operand:SF 1 "nonimmediate_operand" "0 ,x#f,0 ,0")]))
(use (match_operand:V4SF 2 "nonimmediate_operand" "xm ,0 ,X ,X"))
(clobber (reg:CC FLAGS_REG))]
"TARGET_SSE_MATH && TARGET_MIX_SSE_I387
--- 9383,9391 ----
"ix86_expand_fp_absneg_operator (ABS, SFmode, operands); DONE;")
(define_insn "*absnegsf2_mixed"
! [(set (match_operand:SF 0 "nonimmediate_operand" "=x,x,f,rm")
(match_operator:SF 3 "absneg_operator"
! [(match_operand:SF 1 "nonimmediate_operand" "0 ,x,0 ,0")]))
(use (match_operand:V4SF 2 "nonimmediate_operand" "xm ,0 ,X ,X"))
(clobber (reg:CC FLAGS_REG))]
"TARGET_SSE_MATH && TARGET_MIX_SSE_I387
***************
*** 9479,9487 ****
"ix86_expand_fp_absneg_operator (ABS, DFmode, operands); DONE;")
(define_insn "*absnegdf2_mixed"
! [(set (match_operand:DF 0 "nonimmediate_operand" "=Y#f,Y#f,f#Y,rm")
(match_operator:DF 3 "absneg_operator"
! [(match_operand:DF 1 "nonimmediate_operand" "0 ,Y#f,0 ,0")]))
(use (match_operand:V2DF 2 "nonimmediate_operand" "Ym ,0 ,X ,X"))
(clobber (reg:CC FLAGS_REG))]
"TARGET_SSE2 && TARGET_SSE_MATH && TARGET_MIX_SSE_I387
--- 9479,9487 ----
"ix86_expand_fp_absneg_operator (ABS, DFmode, operands); DONE;")
(define_insn "*absnegdf2_mixed"
! [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,Y,f,rm")
(match_operator:DF 3 "absneg_operator"
! [(match_operand:DF 1 "nonimmediate_operand" "0 ,Y,0 ,0")]))
(use (match_operand:V2DF 2 "nonimmediate_operand" "Ym ,0 ,X ,X"))
(clobber (reg:CC FLAGS_REG))]
"TARGET_SSE2 && TARGET_SSE_MATH && TARGET_MIX_SSE_I387
***************
*** 12723,12730 ****
(define_insn "*fp_jcc_1_mixed"
[(set (pc)
(if_then_else (match_operator 0 "comparison_operator"
! [(match_operand 1 "register_operand" "f#x,x#f")
! (match_operand 2 "nonimmediate_operand" "f#x,xm#f")])
(label_ref (match_operand 3 "" ""))
(pc)))
(clobber (reg:CCFP FPSR_REG))
--- 12723,12730 ----
(define_insn "*fp_jcc_1_mixed"
[(set (pc)
(if_then_else (match_operator 0 "comparison_operator"
! [(match_operand 1 "register_operand" "f,x")
! (match_operand 2 "nonimmediate_operand" "f,xm")])
(label_ref (match_operand 3 "" ""))
(pc)))
(clobber (reg:CCFP FPSR_REG))
***************
*** 12768,12775 ****
(define_insn "*fp_jcc_2_mixed"
[(set (pc)
(if_then_else (match_operator 0 "comparison_operator"
! [(match_operand 1 "register_operand" "f#x,x#f")
! (match_operand 2 "nonimmediate_operand" "f#x,xm#f")])
(pc)
(label_ref (match_operand 3 "" ""))))
(clobber (reg:CCFP FPSR_REG))
--- 12768,12775 ----
(define_insn "*fp_jcc_2_mixed"
[(set (pc)
(if_then_else (match_operator 0 "comparison_operator"
! [(match_operand 1 "register_operand" "f,x")
! (match_operand 2 "nonimmediate_operand" "f,xm")])
(pc)
(label_ref (match_operand 3 "" ""))))
(clobber (reg:CCFP FPSR_REG))
***************
*** 13906,13915 ****
;; so use special patterns for add and mull.
(define_insn "*fop_sf_comm_mixed"
! [(set (match_operand:SF 0 "register_operand" "=f#x,x#f")
(match_operator:SF 3 "binary_fp_operator"
[(match_operand:SF 1 "nonimmediate_operand" "%0,0")
! (match_operand:SF 2 "nonimmediate_operand" "fm#x,xm#f")]))]
"TARGET_MIX_SSE_I387
&& COMMUTATIVE_ARITH_P (operands[3])
&& (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"
--- 13906,13915 ----
;; so use special patterns for add and mull.
(define_insn "*fop_sf_comm_mixed"
! [(set (match_operand:SF 0 "register_operand" "=f,x")
(match_operator:SF 3 "binary_fp_operator"
[(match_operand:SF 1 "nonimmediate_operand" "%0,0")
! (match_operand:SF 2 "nonimmediate_operand" "fm,xm")]))]
"TARGET_MIX_SSE_I387
&& COMMUTATIVE_ARITH_P (operands[3])
&& (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"
***************
*** 13958,13964 ****
[(set (match_operand:SF 0 "register_operand" "=f,f,x")
(match_operator:SF 3 "binary_fp_operator"
[(match_operand:SF 1 "nonimmediate_operand" "0,fm,0")
! (match_operand:SF 2 "nonimmediate_operand" "fm,0,xm#f")]))]
"TARGET_MIX_SSE_I387
&& !COMMUTATIVE_ARITH_P (operands[3])
&& (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"
--- 13958,13964 ----
[(set (match_operand:SF 0 "register_operand" "=f,f,x")
(match_operator:SF 3 "binary_fp_operator"
[(match_operand:SF 1 "nonimmediate_operand" "0,fm,0")
! (match_operand:SF 2 "nonimmediate_operand" "fm,0,xm")]))]
"TARGET_MIX_SSE_I387
&& !COMMUTATIVE_ARITH_P (operands[3])
&& (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"
***************
*** 14052,14061 ****
(set_attr "mode" "<MODE>")])
(define_insn "*fop_df_comm_mixed"
! [(set (match_operand:DF 0 "register_operand" "=f#Y,Y#f")
(match_operator:DF 3 "binary_fp_operator"
[(match_operand:DF 1 "nonimmediate_operand" "%0,0")
! (match_operand:DF 2 "nonimmediate_operand" "fm#Y,Ym#f")]))]
"TARGET_SSE2 && TARGET_MIX_SSE_I387
&& COMMUTATIVE_ARITH_P (operands[3])
&& (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"
--- 14052,14061 ----
(set_attr "mode" "<MODE>")])
(define_insn "*fop_df_comm_mixed"
! [(set (match_operand:DF 0 "register_operand" "=f,Y")
(match_operator:DF 3 "binary_fp_operator"
[(match_operand:DF 1 "nonimmediate_operand" "%0,0")
! (match_operand:DF 2 "nonimmediate_operand" "fm,Ym")]))]
"TARGET_SSE2 && TARGET_MIX_SSE_I387
&& COMMUTATIVE_ARITH_P (operands[3])
&& (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"
***************
*** 14101,14110 ****
(set_attr "mode" "DF")])
(define_insn "*fop_df_1_mixed"
! [(set (match_operand:DF 0 "register_operand" "=f#Y,f#Y,Y#f")
(match_operator:DF 3 "binary_fp_operator"
[(match_operand:DF 1 "nonimmediate_operand" "0,fm,0")
! (match_operand:DF 2 "nonimmediate_operand" "fm,0,Ym#f")]))]
"TARGET_SSE2 && TARGET_SSE_MATH && TARGET_MIX_SSE_I387
&& !COMMUTATIVE_ARITH_P (operands[3])
&& (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"
--- 14101,14110 ----
(set_attr "mode" "DF")])
(define_insn "*fop_df_1_mixed"
! [(set (match_operand:DF 0 "register_operand" "=f,f,Y")
(match_operator:DF 3 "binary_fp_operator"
[(match_operand:DF 1 "nonimmediate_operand" "0,fm,0")
! (match_operand:DF 2 "nonimmediate_operand" "fm,0,Ym")]))]
"TARGET_SSE2 && TARGET_SSE_MATH && TARGET_MIX_SSE_I387
&& !COMMUTATIVE_ARITH_P (operands[3])
&& (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"
***************
*** 14419,14426 ****
})
(define_insn "*sqrtsf2_mixed"
! [(set (match_operand:SF 0 "register_operand" "=f#x,x#f")
! (sqrt:SF (match_operand:SF 1 "nonimmediate_operand" "0#x,xm#f")))]
"TARGET_USE_FANCY_MATH_387 && TARGET_MIX_SSE_I387"
"@
fsqrt
--- 14419,14426 ----
})
(define_insn "*sqrtsf2_mixed"
! [(set (match_operand:SF 0 "register_operand" "=f,x")
! (sqrt:SF (match_operand:SF 1 "nonimmediate_operand" "0,xm")))]
"TARGET_USE_FANCY_MATH_387 && TARGET_MIX_SSE_I387"
"@
fsqrt
***************
*** 14457,14464 ****
})
(define_insn "*sqrtdf2_mixed"
! [(set (match_operand:DF 0 "register_operand" "=f#Y,Y#f")
! (sqrt:DF (match_operand:DF 1 "nonimmediate_operand" "0#Y,Ym#f")))]
"TARGET_USE_FANCY_MATH_387 && TARGET_SSE2 && TARGET_MIX_SSE_I387"
"@
fsqrt
--- 14457,14464 ----
})
(define_insn "*sqrtdf2_mixed"
! [(set (match_operand:DF 0 "register_operand" "=f,Y")
! (sqrt:DF (match_operand:DF 1 "nonimmediate_operand" "0,Ym")))]
"TARGET_USE_FANCY_MATH_387 && TARGET_SSE2 && TARGET_MIX_SSE_I387"
"@
fsqrt
***************
*** 17921,17931 ****
"if (! ix86_expand_fp_movcc (operands)) FAIL; DONE;")
(define_insn "*movsfcc_1_387"
! [(set (match_operand:SF 0 "register_operand" "=f#r,f#r,r#f,r#f")
(if_then_else:SF (match_operator 1 "fcmov_comparison_operator"
[(reg FLAGS_REG) (const_int 0)])
! (match_operand:SF 2 "nonimmediate_operand" "f#r,0,rm#f,0")
! (match_operand:SF 3 "nonimmediate_operand" "0,f#r,0,rm#f")))]
"TARGET_80387 && TARGET_CMOVE
&& (GET_CODE (operands[2]) != MEM || GET_CODE (operands[3]) != MEM)"
"@
--- 17921,17931 ----
"if (! ix86_expand_fp_movcc (operands)) FAIL; DONE;")
(define_insn "*movsfcc_1_387"
! [(set (match_operand:SF 0 "register_operand" "=f,f,r,r")
(if_then_else:SF (match_operator 1 "fcmov_comparison_operator"
[(reg FLAGS_REG) (const_int 0)])
! (match_operand:SF 2 "nonimmediate_operand" "f,0,rm,0")
! (match_operand:SF 3 "nonimmediate_operand" "0,f,0,rm")))]
"TARGET_80387 && TARGET_CMOVE
&& (GET_CODE (operands[2]) != MEM || GET_CODE (operands[3]) != MEM)"
"@
***************
*** 17945,17955 ****
"if (! ix86_expand_fp_movcc (operands)) FAIL; DONE;")
(define_insn "*movdfcc_1"
! [(set (match_operand:DF 0 "register_operand" "=f#r,f#r,&r#f,&r#f")
(if_then_else:DF (match_operator 1 "fcmov_comparison_operator"
[(reg FLAGS_REG) (const_int 0)])
! (match_operand:DF 2 "nonimmediate_operand" "f#r,0,rm#f,0")
! (match_operand:DF 3 "nonimmediate_operand" "0,f#r,0,rm#f")))]
"!TARGET_64BIT && TARGET_80387 && TARGET_CMOVE
&& (GET_CODE (operands[2]) != MEM || GET_CODE (operands[3]) != MEM)"
"@
--- 17945,17955 ----
"if (! ix86_expand_fp_movcc (operands)) FAIL; DONE;")
(define_insn "*movdfcc_1"
! [(set (match_operand:DF 0 "register_operand" "=f,f,&r,&r")
(if_then_else:DF (match_operator 1 "fcmov_comparison_operator"
[(reg FLAGS_REG) (const_int 0)])
! (match_operand:DF 2 "nonimmediate_operand" "f,0,rm,0")
! (match_operand:DF 3 "nonimmediate_operand" "0,f,0,rm")))]
"!TARGET_64BIT && TARGET_80387 && TARGET_CMOVE
&& (GET_CODE (operands[2]) != MEM || GET_CODE (operands[3]) != MEM)"
"@
***************
*** 17961,17971 ****
(set_attr "mode" "DF")])
(define_insn "*movdfcc_1_rex64"
! [(set (match_operand:DF 0 "register_operand" "=f#r,f#r,r#f,r#f")
(if_then_else:DF (match_operator 1 "fcmov_comparison_operator"
[(reg FLAGS_REG) (const_int 0)])
! (match_operand:DF 2 "nonimmediate_operand" "f#r,0#r,rm#f,0#f")
! (match_operand:DF 3 "nonimmediate_operand" "0#r,f#r,0#f,rm#f")))]
"TARGET_64BIT && TARGET_80387 && TARGET_CMOVE
&& (GET_CODE (operands[2]) != MEM || GET_CODE (operands[3]) != MEM)"
"@
--- 17961,17971 ----
(set_attr "mode" "DF")])
(define_insn "*movdfcc_1_rex64"
! [(set (match_operand:DF 0 "register_operand" "=f,f,r,r")
(if_then_else:DF (match_operator 1 "fcmov_comparison_operator"
[(reg FLAGS_REG) (const_int 0)])
! (match_operand:DF 2 "nonimmediate_operand" "f,0,rm,0")
! (match_operand:DF 3 "nonimmediate_operand" "0,f,0,rm")))]
"TARGET_64BIT && TARGET_80387 && TARGET_CMOVE
&& (GET_CODE (operands[2]) != MEM || GET_CODE (operands[3]) != MEM)"
"@
* rfa (x86): 387<=>sse moves
@ 2005-07-26 1:10 Dale Johannesen
2005-07-26 7:51 ` Paolo Bonzini
0 siblings, 1 reply; 10+ messages in thread
From: Dale Johannesen @ 2005-07-26 1:10 UTC (permalink / raw)
To: GCC Development; +Cc: Dale Johannesen
With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code like
double d = atof(foo);
int i = d;
call atof
fstpl -8(%ebp)
movsd -8(%ebp), %xmm0
cvttsd2si %xmm0, %eax
(This is Linux, Darwin is similar.) I think the difficulty is that for
(set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger}
regclass decides SSE_REGS is a zero-cost choice for 58, which looks
wrong, as that requires a store and load from memory. In fact, memory is
the cheapest overall choice for 58 (taking its use into account also), and
gcc will figure that out correctly if a more reasonable assessment is given
to SSE_REGS. The immediate cause is the #Y's in the constraint:
"=f#Y,m ,f#Y,*r ,o ,Y*x#f,Y*x#f,Y*x#f ,m"
and there's probably a simple fix, but it eludes me. Advice? Thanks.
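Note on the '#' modifier, for readers following the constraint strings: in the GCC constraint syntax of this period, '#' means that the characters after it, up to the next comma, are ignored as a constraint and are significant only for choosing register preferences. That is why "f#Y" can make an SSE class look attractive to regclass even though the alternative itself only accepts an x87 register. The patch in this thread deletes those hints textually. The sketch below reproduces only that string transformation; strip_hash_hints is a hypothetical helper name used for illustration, not a function in GCC, and it says nothing about how regclass computes costs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustration only, not GCC code.  Copy the constraint string SRC into
   DST, dropping every '#' together with the characters that follow it,
   up to (but not including) the next ',' or the end of the string.
   DST must have room for strlen (SRC) + 1 bytes.  Returns DST.  */
char *
strip_hash_hints (char *dst, const char *src)
{
  char *out = dst;
  int skipping = 0;

  assert (dst != NULL && src != NULL);
  for (; *src != '\0'; src++)
    {
      if (*src == ',')
        skipping = 0;   /* a comma starts the next alternative */
      else if (*src == '#')
        skipping = 1;   /* drop the rest of this alternative */
      if (!skipping)
        *out++ = *src;
    }
  *out = '\0';
  return dst;
}
```

Applied to the *extendsfdf2_mixed destination constraint "=f#Y,m#fY,Y#f" this yields "=f,m,Y", which matches the replacement line in the patch.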
end of thread, other threads:[~2005-08-02 16:31 UTC | newest]
Thread overview: 10+ messages
2005-07-31 16:51 rfa (x86): 387<=>sse moves Uros Bizjak
2005-08-01 18:52 ` Dale Johannesen
-- strict thread matches above, loose matches on Subject: below --
2005-08-02 16:31 Linthicum, Tony
2005-07-26 1:10 Dale Johannesen
2005-07-26 7:51 ` Paolo Bonzini
2005-07-26 22:34 ` Dale Johannesen
2005-07-27 6:11 ` Dale Johannesen
2005-07-27 21:19 ` Richard Henderson
2005-07-28 0:07 ` Dale Johannesen
2005-07-29 23:04 ` Dale Johannesen