public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, middle-end, i386]: Use signbit insn in copysing insn and  implement signbit using fxam x87 insn
@ 2007-07-21 10:40 Uros Bizjak
  2007-07-21 20:08 ` Richard Guenther
  2007-07-26 16:17 ` Wolfgang Gellerich
  0 siblings, 2 replies; 3+ messages in thread
From: Uros Bizjak @ 2007-07-21 10:40 UTC (permalink / raw)
  To: GCC Patches; +Cc: Wolfgang Gellerich

[-- Attachment #1: Type: text/plain, Size: 3119 bytes --]

Hello!

This patch implements signbit built-in using x87 fxam instruction for 
-mfpmath=387. Using this approach, partial memory stalls when using bit 
operations on FP value are avoided (see chapters 14.7 and 15.8 of 
"Pentium optimization guide" for further information on this topic).

I also took this opportunity and optimize copysign expander when using 
abs/neg approach. Sign bit of the input operand can be determined using 
signbit insn if available, further reducing number of partial memory 
stalls. The patch also changes signbit handling a bit (to be more 
consistent with other builtins). s390 target (which is currently the 
only user of signbit optab) _could_ also be affected by or benefit from 
this patch.

Currently, following example:

--cut here--
int test2 (double x)
{
  return signbit (x+2.0);
}
--cut here

compiles into:

test2:
        subl    $12, %esp
        flds    .LC0
        faddl   16(%esp)
        fstpl   (%esp)
        movl    4(%esp), %eax
        addl    $12, %esp
        andl    $-2147483648, %eax
        ret

with attached patch, gcc generates:

test2:
        flds    .LC0
        faddl   4(%esp)
        fxam
        fnstsw  %ax
        fstp    %st(0)
        andl    $512, %eax
        ret

avoiding partial memory stall completely.

In the copysign case:

--cut here--
double test (double a, double b)
{
  return copysign (a, b+2.0);
}
--cut here--

unpatched gcc generates:

test:
        subl    $12, %esp
        fldl    16(%esp)
        flds    .LC0
        faddl   24(%esp)
        fstpl   (%esp)
        movl    4(%esp), %eax
        fabs
        testl   %eax, %eax
        jns     .L4
        fchs
.L4:
        addl    $12, %esp
        ret

and patched gcc generates:

test:
        fldl    4(%esp)
        flds    .LC0
        faddl   12(%esp)
        fxam
        fnstsw  %ax
        fstp    %st(0)
        testb   $2, %ah
        fabs
        je      .L4
        fchs
.L4:
        rep ; ret

SSE math is not affected by this patch.

The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu 
with and without -m32.

OK for mainline? (The patch needs approval for its middle-end part.)

2007-07-21  Uros Bizjak  <ubizjak@gmail.com>

        * optabs.h (enum optab_index): Add new OTI_signbit.
        (signbit_optab): Define corresponding macro.
        (enum insn_code signbit_optab[]): Remove array.
        * genopinit.c (optabs): Implement scalb_optab using scalb?f3
        patterns.
        * optabs.c (init_optabs): Initialize signbit_optab using init_optab.
        (expand_copysign_absneg): If back end provides signbit insn, use it
        instead of bit operations on floating point argument.
        * builtins.c (enum insn_code signbit_optab[]): Remove array.
        (expand_builtin_signbit): Check 
signbit_optab->handlers[].insn_code for
        availability of signbit insn.

        * config/i386/i386.md (signbit<mode>2): New insn pattern to 
implement
        signbitf, signbit and signbitl built-ins as inline x87 
intrinsics when
        SSE mode is not active.
        (isinf<mode>2): Disable for mfpmath=sse,387.

Uros.

[-- Attachment #2: signbit.diff.txt --]
[-- Type: text/plain, Size: 8584 bytes --]

Index: optabs.c
===================================================================
--- optabs.c	(revision 126808)
+++ optabs.c	(working copy)
@@ -3112,63 +3112,79 @@ expand_copysign_absneg (enum machine_mod
 		        int bitpos, bool op0_is_abs)
 {
   enum machine_mode imode;
-  HOST_WIDE_INT hi, lo;
-  int word;
-  rtx label;
+  int icode;
+  rtx sign, label;
 
   if (target == op1)
     target = NULL_RTX;
 
-  if (!op0_is_abs)
+  /* Check if the back end provides an insn that handles signbit for the
+     argument's mode. */
+  icode = (int) signbit_optab->handlers [(int) mode].insn_code;
+  if (icode != CODE_FOR_nothing)
     {
-      op0 = expand_unop (mode, abs_optab, op0, target, 0);
-      if (op0 == NULL)
-	return NULL_RTX;
-      target = op0;
+      imode = insn_data[icode].operand[0].mode;
+      sign = gen_reg_rtx (imode);
+      emit_unop_insn (icode, sign, op1, UNKNOWN);
     }
   else
     {
-      if (target == NULL_RTX)
-        target = copy_to_reg (op0);
+      HOST_WIDE_INT hi, lo;
+
+      if (GET_MODE_SIZE (mode) <= UNITS_PER_WORD)
+	{
+	  imode = int_mode_for_mode (mode);
+	  if (imode == BLKmode)
+	    return NULL_RTX;
+	  op1 = gen_lowpart (imode, op1);
+	}
       else
-	emit_move_insn (target, op0);
-    }
+	{
+	  int word;
 
-  if (GET_MODE_SIZE (mode) <= UNITS_PER_WORD)
-    {
-      imode = int_mode_for_mode (mode);
-      if (imode == BLKmode)
-	return NULL_RTX;
-      op1 = gen_lowpart (imode, op1);
-    }
-  else
-    {
-      imode = word_mode;
-      if (FLOAT_WORDS_BIG_ENDIAN)
-	word = (GET_MODE_BITSIZE (mode) - bitpos) / BITS_PER_WORD;
+	  imode = word_mode;
+	  if (FLOAT_WORDS_BIG_ENDIAN)
+	    word = (GET_MODE_BITSIZE (mode) - bitpos) / BITS_PER_WORD;
+	  else
+	    word = bitpos / BITS_PER_WORD;
+	  bitpos = bitpos % BITS_PER_WORD;
+	  op1 = operand_subword_force (op1, word, mode);
+	}
+
+      if (bitpos < HOST_BITS_PER_WIDE_INT)
+	{
+	  hi = 0;
+	  lo = (HOST_WIDE_INT) 1 << bitpos;
+	}
       else
-	word = bitpos / BITS_PER_WORD;
-      bitpos = bitpos % BITS_PER_WORD;
-      op1 = operand_subword_force (op1, word, mode);
+	{
+	  hi = (HOST_WIDE_INT) 1 << (bitpos - HOST_BITS_PER_WIDE_INT);
+	  lo = 0;
+	}
+
+      sign = gen_reg_rtx (imode);
+      sign = expand_binop (imode, and_optab, op1,
+			   immed_double_const (lo, hi, imode),
+			   NULL_RTX, 1, OPTAB_LIB_WIDEN);
     }
 
-  if (bitpos < HOST_BITS_PER_WIDE_INT)
+  if (!op0_is_abs)
     {
-      hi = 0;
-      lo = (HOST_WIDE_INT) 1 << bitpos;
+      op0 = expand_unop (mode, abs_optab, op0, target, 0);
+      if (op0 == NULL)
+	return NULL_RTX;
+      target = op0;
     }
   else
     {
-      hi = (HOST_WIDE_INT) 1 << (bitpos - HOST_BITS_PER_WIDE_INT);
-      lo = 0;
+      if (target == NULL_RTX)
+        target = copy_to_reg (op0);
+      else
+	emit_move_insn (target, op0);
     }
 
-  op1 = expand_binop (imode, and_optab, op1,
-		      immed_double_const (lo, hi, imode),
-		      NULL_RTX, 1, OPTAB_LIB_WIDEN);
-
   label = gen_label_rtx ();
-  emit_cmp_and_jump_insns (op1, const0_rtx, EQ, NULL_RTX, imode, 1, label);
+  emit_cmp_and_jump_insns (sign, const0_rtx, EQ, NULL_RTX, imode, 1, label);
 
   if (GET_CODE (op0) == CONST_DOUBLE)
     op0 = simplify_unary_operation (NEG, mode, op0, mode);
@@ -5585,6 +5601,7 @@ init_optabs (void)
   tan_optab = init_optab (UNKNOWN);
   atan_optab = init_optab (UNKNOWN);
   copysign_optab = init_optab (UNKNOWN);
+  signbit_optab = init_optab (UNKNOWN);
 
   isinf_optab = init_optab (UNKNOWN);
 
@@ -5655,7 +5672,6 @@ init_optabs (void)
   for (i = 0; i < NUM_MACHINE_MODES; i++)
     {
       movmem_optab[i] = CODE_FOR_nothing;
-      signbit_optab[i] = CODE_FOR_nothing;
       cmpstr_optab[i] = CODE_FOR_nothing;
       cmpstrn_optab[i] = CODE_FOR_nothing;
       cmpmem_optab[i] = CODE_FOR_nothing;
Index: optabs.h
===================================================================
--- optabs.h	(revision 126808)
+++ optabs.h	(working copy)
@@ -219,7 +219,8 @@ enum optab_index
   OTI_atan,
   /* Copy sign */
   OTI_copysign,
-
+  /* Signbit */
+  OTI_signbit,
   /* Test for infinite value */
   OTI_isinf,
 
@@ -409,7 +410,7 @@ extern GTY(()) optab optab_table[OTI_MAX
 #define tan_optab (optab_table[OTI_tan])
 #define atan_optab (optab_table[OTI_atan])
 #define copysign_optab (optab_table[OTI_copysign])
-
+#define signbit_optab (optab_table[OTI_signbit])
 #define isinf_optab (optab_table[OTI_isinf])
 
 #define cmp_optab (optab_table[OTI_cmp])
@@ -553,9 +554,6 @@ extern enum insn_code vcondu_gen_code[NU
 /* This array records the insn_code of insns to perform block moves.  */
 extern enum insn_code movmem_optab[NUM_MACHINE_MODES];
 
-/* This array records the insn_code of insns to implement the signbit function.  */
-extern enum insn_code signbit_optab[NUM_MACHINE_MODES];
-
 /* This array records the insn_code of insns to perform block sets.  */
 extern enum insn_code setmem_optab[NUM_MACHINE_MODES];
 
Index: genopinit.c
===================================================================
--- genopinit.c	(revision 126808)
+++ genopinit.c	(working copy)
@@ -122,6 +122,7 @@ static const char * const optabs[] =
     abs_optab->handlers[$A].insn_code = CODE_FOR_$(abs$F$a2$)",
   "absv_optab->handlers[$A].insn_code = CODE_FOR_$(absv$I$a2$)",
   "copysign_optab->handlers[$A].insn_code = CODE_FOR_$(copysign$F$a3$)",
+  "signbit_optab->handlers[$A].insn_code = CODE_FOR_$(signbit$F$a2$)",
   "isinf_optab->handlers[$A].insn_code = CODE_FOR_$(isinf$a2$)",
   "sqrt_optab->handlers[$A].insn_code = CODE_FOR_$(sqrt$a2$)",
   "floor_optab->handlers[$A].insn_code = CODE_FOR_$(floor$a2$)",
@@ -177,7 +178,6 @@ static const char * const optabs[] =
   "push_optab->handlers[$A].insn_code = CODE_FOR_$(push$a1$)",
   "reload_in_optab[$A] = CODE_FOR_$(reload_in$a$)",
   "reload_out_optab[$A] = CODE_FOR_$(reload_out$a$)",
-  "signbit_optab[$A] = CODE_FOR_$(signbit$F$a2$)",
   "movmem_optab[$A] = CODE_FOR_$(movmem$a$)",
   "cmpstr_optab[$A] = CODE_FOR_$(cmpstr$a$)",
   "cmpstrn_optab[$A] = CODE_FOR_$(cmpstrn$a$)",
Index: builtins.c
===================================================================
--- builtins.c	(revision 126808)
+++ builtins.c	(working copy)
@@ -240,11 +240,6 @@ static tree do_mpfr_remquo (tree, tree, 
 static tree do_mpfr_lgamma_r (tree, tree, tree);
 #endif
 
-/* This array records the insn_code of insns to imlement the signbit
-   function.  */
-enum insn_code signbit_optab[NUM_MACHINE_MODES];
-
-
 /* Return true if NODE should be considered for inline expansion regardless
    of the optimization level.  This means whenever a function is invoked with
    its "internal" name, which normally contains the prefix "__builtin".  */
@@ -5725,7 +5720,7 @@ expand_builtin_signbit (tree exp, rtx ta
   HOST_WIDE_INT hi, lo;
   tree arg;
   int word, bitpos;
-  enum insn_code signbit_insn_code;
+  enum insn_code icode;
   rtx temp;
 
   if (!validate_arglist (exp, REAL_TYPE, VOID_TYPE))
@@ -5743,11 +5738,11 @@ expand_builtin_signbit (tree exp, rtx ta
 
   /* Check if the back end provides an insn that handles signbit for the
      argument's mode. */
-  signbit_insn_code = signbit_optab [(int) fmode];
-  if (signbit_insn_code != CODE_FOR_nothing)
+  icode = signbit_optab->handlers [(int) fmode].insn_code;
+  if (icode != CODE_FOR_nothing)
     {
       target = gen_reg_rtx (TYPE_MODE (TREE_TYPE (exp)));
-      emit_unop_insn (signbit_insn_code, target, temp, UNKNOWN);
+      emit_unop_insn (icode, target, temp, UNKNOWN);
       return target;
     }
 
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 126808)
+++ config/i386/i386.md	(working copy)
@@ -17990,8 +17990,7 @@
    (use (match_operand:X87MODEF 1 "register_operand" ""))]
   "TARGET_USE_FANCY_MATH_387
    && TARGET_C99_FUNCTIONS
-   && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
-       || TARGET_MIX_SSE_I387)"
+   && !(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
 {
   rtx mask = GEN_INT (0x45);
   rtx val = GEN_INT (0x05);
@@ -18012,6 +18011,20 @@
   DONE;
 })
 
+(define_expand "signbit<mode>2"
+  [(use (match_operand:SI 0 "register_operand" ""))
+   (use (match_operand:X87MODEF 1 "register_operand" ""))]
+  "TARGET_USE_FANCY_MATH_387
+   && !(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
+{
+  rtx mask = GEN_INT (0x0200);
+
+  rtx scratch = gen_reg_rtx (HImode);
+
+  emit_insn (gen_fxam<mode>2_i387 (scratch, operands[1]));
+  emit_insn (gen_andsi3 (operands[0], gen_lowpart (SImode, scratch), mask));
+  DONE;
+})
 \f
 ;; Block operation instructions
 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH, middle-end, i386]: Use signbit insn in copysing insn and implement signbit using fxam x87 insn
  2007-07-21 10:40 [PATCH, middle-end, i386]: Use signbit insn in copysing insn and implement signbit using fxam x87 insn Uros Bizjak
@ 2007-07-21 20:08 ` Richard Guenther
  2007-07-26 16:17 ` Wolfgang Gellerich
  1 sibling, 0 replies; 3+ messages in thread
From: Richard Guenther @ 2007-07-21 20:08 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, Wolfgang Gellerich

On 7/21/07, Uros Bizjak <ubizjak@gmail.com> wrote:
> Hello!
>
> This patch implements signbit built-in using x87 fxam instruction for
> -mfpmath=387. Using this approach, partial memory stalls when using bit
> operations on FP value are avoided (see chapters 14.7 and 15.8 of
> "Pentium optimization guide" for further information on this topic).
>
> I also took this opportunity and optimize copysign expander when using
> abs/neg approach. Sign bit of the input operand can be determined using
> signbit insn if available, further reducing number of partial memory
> stalls. The patch also changes signbit handling a bit (to be more
> consistent with other builtins). s390 target (which is currently the
> only user of signbit optab) _could_ also be affected by or benefit from
> this patch.
>
> Currently, following example:
>
> --cut here--
> int test2 (double x)
> {
>   return signbit (x+2.0);
> }
> --cut here
>
> compiles into:
>
> test2:
>         subl    $12, %esp
>         flds    .LC0
>         faddl   16(%esp)
>         fstpl   (%esp)
>         movl    4(%esp), %eax
>         addl    $12, %esp
>         andl    $-2147483648, %eax
>         ret
>
> with attached patch, gcc generates:
>
> test2:
>         flds    .LC0
>         faddl   4(%esp)
>         fxam
>         fnstsw  %ax
>         fstp    %st(0)
>         andl    $512, %eax
>         ret
>
> avoiding partial memory stall completely.
>
> In the copysign case:
>
> --cut here--
> double test (double a, double b)
> {
>   return copysign (a, b+2.0);
> }
> --cut here--
>
> unpatched gcc generates:
>
> test:
>         subl    $12, %esp
>         fldl    16(%esp)
>         flds    .LC0
>         faddl   24(%esp)
>         fstpl   (%esp)
>         movl    4(%esp), %eax
>         fabs
>         testl   %eax, %eax
>         jns     .L4
>         fchs
> .L4:
>         addl    $12, %esp
>         ret
>
> and patched gcc generates:
>
> test:
>         fldl    4(%esp)
>         flds    .LC0
>         faddl   12(%esp)
>         fxam
>         fnstsw  %ax
>         fstp    %st(0)
>         testb   $2, %ah
>         fabs
>         je      .L4
>         fchs
> .L4:
>         rep ; ret
>
> SSE math is not affected by this patch.
>
> The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
> with and without -m32.
>
> OK for mainline? (The patch needs approval for its middle-end part.)
>
> 2007-07-21  Uros Bizjak  <ubizjak@gmail.com>
>
>         * optabs.h (enum optab_index): Add new OTI_signbit.
>         (signbit_optab): Define corresponding macro.
>         (enum insn_code signbit_optab[]): Remove array.
>         * genopinit.c (optabs): Implement scalb_optab using scalb?f3
>         patterns.

I suppose this should say signbit_optab using signbit?f2.

This is ok for mainline.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH, middle-end, i386]: Use signbit insn in copysing insn and  implement signbit using fxam x87 insn
  2007-07-21 10:40 [PATCH, middle-end, i386]: Use signbit insn in copysing insn and implement signbit using fxam x87 insn Uros Bizjak
  2007-07-21 20:08 ` Richard Guenther
@ 2007-07-26 16:17 ` Wolfgang Gellerich
  1 sibling, 0 replies; 3+ messages in thread
From: Wolfgang Gellerich @ 2007-07-26 16:17 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches




Uros Bizjak <ubizjak@gmail.com> wrote on 21.07.2007 12:08:06:

> Hello!
>
> This patch implements signbit built-in using x87 fxam instruction for
> -mfpmath=387. Using this approach, partial memory stalls when using bit
> operations on FP value are avoided (see chapters 14.7 and 15.8 of
> "Pentium optimization guide" for further information on this topic).
>
> I also took this opportunity and optimize copysign expander when using
> abs/neg approach. Sign bit of the input operand can be determined using
> signbit insn if available, further reducing number of partial memory
> stalls. The patch also changes signbit handling a bit (to be more
> consistent with other builtins). s390 target (which is currently the
> only user of signbit optab) _could_ also be affected by or benefit from
> this patch.

This is fine for s390!

Regards, Wolfgang


---
Dr. Wolfgang Gellerich
IBM Deutschland Entwicklung GmbH
Schönaicher Strasse 220
71032 Böblingen, Germany
Tel. +49 / 7031 / 162598
gellerich@de.ibm.com

=======================

IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2007-07-26 15:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-21 10:40 [PATCH, middle-end, i386]: Use signbit insn in copysing insn and implement signbit using fxam x87 insn Uros Bizjak
2007-07-21 20:08 ` Richard Guenther
2007-07-26 16:17 ` Wolfgang Gellerich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).