* [PATCH, middle-end, i386]: Use signbit insn in copysing insn and implement signbit using fxam x87 insn
@ 2007-07-21 10:40 Uros Bizjak
2007-07-21 20:08 ` Richard Guenther
2007-07-26 16:17 ` Wolfgang Gellerich
0 siblings, 2 replies; 3+ messages in thread
From: Uros Bizjak @ 2007-07-21 10:40 UTC (permalink / raw)
To: GCC Patches; +Cc: Wolfgang Gellerich
[-- Attachment #1: Type: text/plain, Size: 3119 bytes --]
Hello!
This patch implements signbit built-in using x87 fxam instruction for
-mfpmath=387. Using this approach, partial memory stalls when using bit
operations on FP value are avoided (see chapters 14.7 and 15.8 of
"Pentium optimization guide" for further information on this topic).
I also took this opportunity and optimize copysign expander when using
abs/neg approach. Sign bit of the input operand can be determined using
signbit insn if available, further reducing number of partial memory
stalls. The patch also changes signbit handling a bit (to be more
consistent with other builtins). s390 target (which is currently the
only user of signbit optab) _could_ also be affected by or benefit from
this patch.
Currently, following example:
--cut here--
int test2 (double x)
{
return signbit (x+2.0);
}
--cut here
compiles into:
test2:
subl $12, %esp
flds .LC0
faddl 16(%esp)
fstpl (%esp)
movl 4(%esp), %eax
addl $12, %esp
andl $-2147483648, %eax
ret
with attached patch, gcc generates:
test2:
flds .LC0
faddl 4(%esp)
fxam
fnstsw %ax
fstp %st(0)
andl $512, %eax
ret
avoiding partial memory stall completely.
In the copysign case:
--cut here--
double test (double a, double b)
{
return copysign (a, b+2.0);
}
--cut here--
unpatched gcc generates:
test:
subl $12, %esp
fldl 16(%esp)
flds .LC0
faddl 24(%esp)
fstpl (%esp)
movl 4(%esp), %eax
fabs
testl %eax, %eax
jns .L4
fchs
.L4:
addl $12, %esp
ret
and patched gcc generates:
test:
fldl 4(%esp)
flds .LC0
faddl 12(%esp)
fxam
fnstsw %ax
fstp %st(0)
testb $2, %ah
fabs
je .L4
fchs
.L4:
rep ; ret
SSE math is not affected by this patch.
The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
with and without -m32.
OK for mainline? (The patch needs approval for its middle-end part.)
2007-07-21 Uros Bizjak <ubizjak@gmail.com>
* optabs.h (enum optab_index): Add new OTI_signbit.
(signbit_optab): Define corresponding macro.
(enum insn_code signbit_optab[]): Remove array.
* genopinit.c (optabs): Implement scalb_optab using scalb?f3
patterns.
* optabs.c (init_optabs): Initialize signbit_optab using init_optab.
(expand_copysign_absneg): If back end provides signbit insn, use it
instead of bit operations on floating point argument.
* builtins.c (enum insn_code signbit_optab[]): Remove array.
(expand_builtin_signbit): Check
signbit_optab->handlers[].insn_code for
availability of signbit insn.
* config/i386/i386.md (signbit<mode>2): New insn pattern to
implement
signbitf, signbit and signbitl built-ins as inline x87
intrinsics when
SSE mode is not active.
(isinf<mode>2): Disable for mfpmath=sse,387.
Uros.
[-- Attachment #2: signbit.diff.txt --]
[-- Type: text/plain, Size: 8584 bytes --]
Index: optabs.c
===================================================================
--- optabs.c (revision 126808)
+++ optabs.c (working copy)
@@ -3112,63 +3112,79 @@ expand_copysign_absneg (enum machine_mod
int bitpos, bool op0_is_abs)
{
enum machine_mode imode;
- HOST_WIDE_INT hi, lo;
- int word;
- rtx label;
+ int icode;
+ rtx sign, label;
if (target == op1)
target = NULL_RTX;
- if (!op0_is_abs)
+ /* Check if the back end provides an insn that handles signbit for the
+ argument's mode. */
+ icode = (int) signbit_optab->handlers [(int) mode].insn_code;
+ if (icode != CODE_FOR_nothing)
{
- op0 = expand_unop (mode, abs_optab, op0, target, 0);
- if (op0 == NULL)
- return NULL_RTX;
- target = op0;
+ imode = insn_data[icode].operand[0].mode;
+ sign = gen_reg_rtx (imode);
+ emit_unop_insn (icode, sign, op1, UNKNOWN);
}
else
{
- if (target == NULL_RTX)
- target = copy_to_reg (op0);
+ HOST_WIDE_INT hi, lo;
+
+ if (GET_MODE_SIZE (mode) <= UNITS_PER_WORD)
+ {
+ imode = int_mode_for_mode (mode);
+ if (imode == BLKmode)
+ return NULL_RTX;
+ op1 = gen_lowpart (imode, op1);
+ }
else
- emit_move_insn (target, op0);
- }
+ {
+ int word;
- if (GET_MODE_SIZE (mode) <= UNITS_PER_WORD)
- {
- imode = int_mode_for_mode (mode);
- if (imode == BLKmode)
- return NULL_RTX;
- op1 = gen_lowpart (imode, op1);
- }
- else
- {
- imode = word_mode;
- if (FLOAT_WORDS_BIG_ENDIAN)
- word = (GET_MODE_BITSIZE (mode) - bitpos) / BITS_PER_WORD;
+ imode = word_mode;
+ if (FLOAT_WORDS_BIG_ENDIAN)
+ word = (GET_MODE_BITSIZE (mode) - bitpos) / BITS_PER_WORD;
+ else
+ word = bitpos / BITS_PER_WORD;
+ bitpos = bitpos % BITS_PER_WORD;
+ op1 = operand_subword_force (op1, word, mode);
+ }
+
+ if (bitpos < HOST_BITS_PER_WIDE_INT)
+ {
+ hi = 0;
+ lo = (HOST_WIDE_INT) 1 << bitpos;
+ }
else
- word = bitpos / BITS_PER_WORD;
- bitpos = bitpos % BITS_PER_WORD;
- op1 = operand_subword_force (op1, word, mode);
+ {
+ hi = (HOST_WIDE_INT) 1 << (bitpos - HOST_BITS_PER_WIDE_INT);
+ lo = 0;
+ }
+
+ sign = gen_reg_rtx (imode);
+ sign = expand_binop (imode, and_optab, op1,
+ immed_double_const (lo, hi, imode),
+ NULL_RTX, 1, OPTAB_LIB_WIDEN);
}
- if (bitpos < HOST_BITS_PER_WIDE_INT)
+ if (!op0_is_abs)
{
- hi = 0;
- lo = (HOST_WIDE_INT) 1 << bitpos;
+ op0 = expand_unop (mode, abs_optab, op0, target, 0);
+ if (op0 == NULL)
+ return NULL_RTX;
+ target = op0;
}
else
{
- hi = (HOST_WIDE_INT) 1 << (bitpos - HOST_BITS_PER_WIDE_INT);
- lo = 0;
+ if (target == NULL_RTX)
+ target = copy_to_reg (op0);
+ else
+ emit_move_insn (target, op0);
}
- op1 = expand_binop (imode, and_optab, op1,
- immed_double_const (lo, hi, imode),
- NULL_RTX, 1, OPTAB_LIB_WIDEN);
-
label = gen_label_rtx ();
- emit_cmp_and_jump_insns (op1, const0_rtx, EQ, NULL_RTX, imode, 1, label);
+ emit_cmp_and_jump_insns (sign, const0_rtx, EQ, NULL_RTX, imode, 1, label);
if (GET_CODE (op0) == CONST_DOUBLE)
op0 = simplify_unary_operation (NEG, mode, op0, mode);
@@ -5585,6 +5601,7 @@ init_optabs (void)
tan_optab = init_optab (UNKNOWN);
atan_optab = init_optab (UNKNOWN);
copysign_optab = init_optab (UNKNOWN);
+ signbit_optab = init_optab (UNKNOWN);
isinf_optab = init_optab (UNKNOWN);
@@ -5655,7 +5672,6 @@ init_optabs (void)
for (i = 0; i < NUM_MACHINE_MODES; i++)
{
movmem_optab[i] = CODE_FOR_nothing;
- signbit_optab[i] = CODE_FOR_nothing;
cmpstr_optab[i] = CODE_FOR_nothing;
cmpstrn_optab[i] = CODE_FOR_nothing;
cmpmem_optab[i] = CODE_FOR_nothing;
Index: optabs.h
===================================================================
--- optabs.h (revision 126808)
+++ optabs.h (working copy)
@@ -219,7 +219,8 @@ enum optab_index
OTI_atan,
/* Copy sign */
OTI_copysign,
-
+ /* Signbit */
+ OTI_signbit,
/* Test for infinite value */
OTI_isinf,
@@ -409,7 +410,7 @@ extern GTY(()) optab optab_table[OTI_MAX
#define tan_optab (optab_table[OTI_tan])
#define atan_optab (optab_table[OTI_atan])
#define copysign_optab (optab_table[OTI_copysign])
-
+#define signbit_optab (optab_table[OTI_signbit])
#define isinf_optab (optab_table[OTI_isinf])
#define cmp_optab (optab_table[OTI_cmp])
@@ -553,9 +554,6 @@ extern enum insn_code vcondu_gen_code[NU
/* This array records the insn_code of insns to perform block moves. */
extern enum insn_code movmem_optab[NUM_MACHINE_MODES];
-/* This array records the insn_code of insns to implement the signbit function. */
-extern enum insn_code signbit_optab[NUM_MACHINE_MODES];
-
/* This array records the insn_code of insns to perform block sets. */
extern enum insn_code setmem_optab[NUM_MACHINE_MODES];
Index: genopinit.c
===================================================================
--- genopinit.c (revision 126808)
+++ genopinit.c (working copy)
@@ -122,6 +122,7 @@ static const char * const optabs[] =
abs_optab->handlers[$A].insn_code = CODE_FOR_$(abs$F$a2$)",
"absv_optab->handlers[$A].insn_code = CODE_FOR_$(absv$I$a2$)",
"copysign_optab->handlers[$A].insn_code = CODE_FOR_$(copysign$F$a3$)",
+ "signbit_optab->handlers[$A].insn_code = CODE_FOR_$(signbit$F$a2$)",
"isinf_optab->handlers[$A].insn_code = CODE_FOR_$(isinf$a2$)",
"sqrt_optab->handlers[$A].insn_code = CODE_FOR_$(sqrt$a2$)",
"floor_optab->handlers[$A].insn_code = CODE_FOR_$(floor$a2$)",
@@ -177,7 +178,6 @@ static const char * const optabs[] =
"push_optab->handlers[$A].insn_code = CODE_FOR_$(push$a1$)",
"reload_in_optab[$A] = CODE_FOR_$(reload_in$a$)",
"reload_out_optab[$A] = CODE_FOR_$(reload_out$a$)",
- "signbit_optab[$A] = CODE_FOR_$(signbit$F$a2$)",
"movmem_optab[$A] = CODE_FOR_$(movmem$a$)",
"cmpstr_optab[$A] = CODE_FOR_$(cmpstr$a$)",
"cmpstrn_optab[$A] = CODE_FOR_$(cmpstrn$a$)",
Index: builtins.c
===================================================================
--- builtins.c (revision 126808)
+++ builtins.c (working copy)
@@ -240,11 +240,6 @@ static tree do_mpfr_remquo (tree, tree,
static tree do_mpfr_lgamma_r (tree, tree, tree);
#endif
-/* This array records the insn_code of insns to imlement the signbit
- function. */
-enum insn_code signbit_optab[NUM_MACHINE_MODES];
-
-
/* Return true if NODE should be considered for inline expansion regardless
of the optimization level. This means whenever a function is invoked with
its "internal" name, which normally contains the prefix "__builtin". */
@@ -5725,7 +5720,7 @@ expand_builtin_signbit (tree exp, rtx ta
HOST_WIDE_INT hi, lo;
tree arg;
int word, bitpos;
- enum insn_code signbit_insn_code;
+ enum insn_code icode;
rtx temp;
if (!validate_arglist (exp, REAL_TYPE, VOID_TYPE))
@@ -5743,11 +5738,11 @@ expand_builtin_signbit (tree exp, rtx ta
/* Check if the back end provides an insn that handles signbit for the
argument's mode. */
- signbit_insn_code = signbit_optab [(int) fmode];
- if (signbit_insn_code != CODE_FOR_nothing)
+ icode = signbit_optab->handlers [(int) fmode].insn_code;
+ if (icode != CODE_FOR_nothing)
{
target = gen_reg_rtx (TYPE_MODE (TREE_TYPE (exp)));
- emit_unop_insn (signbit_insn_code, target, temp, UNKNOWN);
+ emit_unop_insn (icode, target, temp, UNKNOWN);
return target;
}
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md (revision 126808)
+++ config/i386/i386.md (working copy)
@@ -17990,8 +17990,7 @@
(use (match_operand:X87MODEF 1 "register_operand" ""))]
"TARGET_USE_FANCY_MATH_387
&& TARGET_C99_FUNCTIONS
- && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
- || TARGET_MIX_SSE_I387)"
+ && !(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
{
rtx mask = GEN_INT (0x45);
rtx val = GEN_INT (0x05);
@@ -18012,6 +18011,20 @@
DONE;
})
+(define_expand "signbit<mode>2"
+ [(use (match_operand:SI 0 "register_operand" ""))
+ (use (match_operand:X87MODEF 1 "register_operand" ""))]
+ "TARGET_USE_FANCY_MATH_387
+ && !(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
+{
+ rtx mask = GEN_INT (0x0200);
+
+ rtx scratch = gen_reg_rtx (HImode);
+
+ emit_insn (gen_fxam<mode>2_i387 (scratch, operands[1]));
+ emit_insn (gen_andsi3 (operands[0], gen_lowpart (SImode, scratch), mask));
+ DONE;
+})
\f
;; Block operation instructions
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH, middle-end, i386]: Use signbit insn in copysing insn and implement signbit using fxam x87 insn
2007-07-21 10:40 [PATCH, middle-end, i386]: Use signbit insn in copysing insn and implement signbit using fxam x87 insn Uros Bizjak
@ 2007-07-21 20:08 ` Richard Guenther
2007-07-26 16:17 ` Wolfgang Gellerich
1 sibling, 0 replies; 3+ messages in thread
From: Richard Guenther @ 2007-07-21 20:08 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches, Wolfgang Gellerich
On 7/21/07, Uros Bizjak <ubizjak@gmail.com> wrote:
> Hello!
>
> This patch implements signbit built-in using x87 fxam instruction for
> -mfpmath=387. Using this approach, partial memory stalls when using bit
> operations on FP value are avoided (see chapters 14.7 and 15.8 of
> "Pentium optimization guide" for further information on this topic).
>
> I also took this opportunity and optimize copysign expander when using
> abs/neg approach. Sign bit of the input operand can be determined using
> signbit insn if available, further reducing number of partial memory
> stalls. The patch also changes signbit handling a bit (to be more
> consistent with other builtins). s390 target (which is currently the
> only user of signbit optab) _could_ also be affected by or benefit from
> this patch.
>
> Currently, following example:
>
> --cut here--
> int test2 (double x)
> {
> return signbit (x+2.0);
> }
> --cut here
>
> compiles into:
>
> test2:
> subl $12, %esp
> flds .LC0
> faddl 16(%esp)
> fstpl (%esp)
> movl 4(%esp), %eax
> addl $12, %esp
> andl $-2147483648, %eax
> ret
>
> with attached patch, gcc generates:
>
> test2:
> flds .LC0
> faddl 4(%esp)
> fxam
> fnstsw %ax
> fstp %st(0)
> andl $512, %eax
> ret
>
> avoiding partial memory stall completely.
>
> In the copysign case:
>
> --cut here--
> double test (double a, double b)
> {
> return copysign (a, b+2.0);
> }
> --cut here--
>
> unpatched gcc generates:
>
> test:
> subl $12, %esp
> fldl 16(%esp)
> flds .LC0
> faddl 24(%esp)
> fstpl (%esp)
> movl 4(%esp), %eax
> fabs
> testl %eax, %eax
> jns .L4
> fchs
> .L4:
> addl $12, %esp
> ret
>
> and patched gcc generates:
>
> test:
> fldl 4(%esp)
> flds .LC0
> faddl 12(%esp)
> fxam
> fnstsw %ax
> fstp %st(0)
> testb $2, %ah
> fabs
> je .L4
> fchs
> .L4:
> rep ; ret
>
> SSE math is not affected by this patch.
>
> The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
> with and without -m32.
>
> OK for mainline? (The patch needs approval for its middle-end part.)
>
> 2007-07-21 Uros Bizjak <ubizjak@gmail.com>
>
> * optabs.h (enum optab_index): Add new OTI_signbit.
> (signbit_optab): Define corresponding macro.
> (enum insn_code signbit_optab[]): Remove array.
> * genopinit.c (optabs): Implement scalb_optab using scalb?f3
> patterns.
I suppose this should say signbit_optab using signbit?f2.
This is ok for mainline.
Thanks,
Richard.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH, middle-end, i386]: Use signbit insn in copysing insn and implement signbit using fxam x87 insn
2007-07-21 10:40 [PATCH, middle-end, i386]: Use signbit insn in copysing insn and implement signbit using fxam x87 insn Uros Bizjak
2007-07-21 20:08 ` Richard Guenther
@ 2007-07-26 16:17 ` Wolfgang Gellerich
1 sibling, 0 replies; 3+ messages in thread
From: Wolfgang Gellerich @ 2007-07-26 16:17 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches
Uros Bizjak <ubizjak@gmail.com> wrote on 21.07.2007 12:08:06:
> Hello!
>
> This patch implements signbit built-in using x87 fxam instruction for
> -mfpmath=387. Using this approach, partial memory stalls when using bit
> operations on FP value are avoided (see chapters 14.7 and 15.8 of
> "Pentium optimization guide" for further information on this topic).
>
> I also took this opportunity and optimize copysign expander when using
> abs/neg approach. Sign bit of the input operand can be determined using
> signbit insn if available, further reducing number of partial memory
> stalls. The patch also changes signbit handling a bit (to be more
> consistent with other builtins). s390 target (which is currently the
> only user of signbit optab) _could_ also be affected by or benefit from
> this patch.
This is fine for s390!
Regards, Wolfgang
---
Dr. Wolfgang Gellerich
IBM Deutschland Entwicklung GmbH
Schönaicher Strasse 220
71032 Böblingen, Germany
Tel. +49 / 7031 / 162598
gellerich@de.ibm.com
=======================
IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2007-07-26 15:23 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-21 10:40 [PATCH, middle-end, i386]: Use signbit insn in copysing insn and implement signbit using fxam x87 insn Uros Bizjak
2007-07-21 20:08 ` Richard Guenther
2007-07-26 16:17 ` Wolfgang Gellerich
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).