public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
@ 2014-07-16 20:33 David Edelsohn
  2014-08-01  3:28 ` David Edelsohn
  0 siblings, 1 reply; 24+ messages in thread
From: David Edelsohn @ 2014-07-16 20:33 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: GCC Patches

Adhemerval,

This looks very good. Thanks for helping with the FENV implementation.

I will discuss this with some GIMPLE experts during Cauldron.  I want
to check that the GIMPLE is correct before approving this.

Thanks, David

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
@ 2014-09-16 21:05 David Edelsohn
  0 siblings, 0 replies; 24+ messages in thread
From: David Edelsohn @ 2014-09-16 21:05 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: GCC Patches

Hi, Adhemerval

I cornered Honza during his visit to IBM Research to help me
understand my concerns with the function.

The code for *hold does:

+  tree fenv_var = create_tmp_var (double_type_node, NULL);
+
+  tree hold_mffs = build2 (MODIFY_EXPR, void_type_node, fenv_var, call_mffs);
+
+  tree fenv_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, fenv_var);

The code for *clear does:

+  tree fenv_clear = create_tmp_var (double_type_node, NULL);
+
+  tree clear_mffs = build2 (MODIFY_EXPR, void_type_node, fenv_clear,
call_mffs);
+
+  tree fenv_clean_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, fenv_var);

Note that *clear creates fenv_clear, but uses fenv_var that was
created earlier for *hold (probably should have been fenv_hold) or
something to distinguish it.

The code for *update does:

+  tree old_fenv = create_tmp_var (double_type_node, NULL);
+  tree update_mffs = build2 (MODIFY_EXPR, void_type_node, old_fenv, call_mffs);
+
+  tree old_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, update_mffs);

But it never actually uses old_fenv that it obtained from the call to
call_mffs().

The current implementation is trying to follow the punning and
conversions in the C code, but, according to Honza, it does not need
to.  It should not need the temporary variable nor the MODIFY_EXPR.
The implementation of *update shows that the temporary really is not
necessary because it (accidentally) is not referenced and not used.

The code for *hold and *clear should be converted to the style of
*update, without the temporary. The later instruction selection and
register allocation in GCC should handle the conversion between
double_type and uint64_type through the VIEW_CONVERT_EXPR without an
explicit temporary. The call to mffs can be inserted directly into the
rest of the tree being built (creating a large and ugly tree).

Then the final COMPOUND_EXPR is not needed in any of the cases, as it
already is omitted in the *update case. The accidental omission of a
reference to old_fenv is what allowed it to be omitted from the
*update case, which prompted my original question.

Thanks, David

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
@ 2014-09-03 14:08 Uros Bizjak
  2014-09-04 17:39 ` Maciej W. Rozycki
  0 siblings, 1 reply; 24+ messages in thread
From: Uros Bizjak @ 2014-09-03 14:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: Maciej W. Rozycki, Adhemerval Zanella

Hello!

> While at it, may I propose another change on top of this?
>
> I've noticed the test case is rather slow, it certainly takes much more
> time than the average one, I've seen elapsed times of well over a minute
> on reasonably fast hardware and occasionally a timeout midway through even
> though the test case was otherwise progressing just fine.  I think lock
> contention or unrelated system activity such as hardware interrupts (think
> a busy network!) may contribute to it for systems using LL/SC loops for
> atomicity.
>
>  So I think the default timeout that's used for really quick tests should
> be extended a bit.  I propose a factor of 2, just not to make it too
> excessive, at least for the beginning (maybe it'll have to be higher
> eventually).

Or you can just lower the iteration count as I have to do for alpha.

Uros.

^ permalink raw reply	[flat|nested] 24+ messages in thread
* [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
@ 2014-07-03 21:09 Adhemerval Zanella
  2014-07-16 18:41 ` Adhemerval Zanella
  2014-07-31  1:43 ` Joseph S. Myers
  0 siblings, 2 replies; 24+ messages in thread
From: Adhemerval Zanella @ 2014-07-03 21:09 UTC (permalink / raw)
  To: gcc-patches

This patch implements the TARGET_ATOMIC_ASSIGN_EXPAND_FENV for
powerpc-fpu. I have to adjust current c11-atomic-exec-5 testcase
because for IBM long double 0 += LDBL_MAX might generate 
overflow/underflow in internal __gcc_qadd calculations.

The c11-atomic-exec-5 now passes for linux/powerpc, checked on
powerpc32-linux-fpu, powerpc64-linux, and powerpc64le-linux.

--

2014-07-03  Adhemerval Zanella  <azanella@linux.vnet.ibm.com>

gcc:
	* config/rs6000/rs6000.c (rs6000_atomic_assign_expand_fenv): New
	function.

gcc/testsuite:
	* gcc.dg/atomic/c11-atomic-exec-5.c
	(test_main_long_double_add_overflow): Define and run only for
	LDBL_MANT_DIG != 106.
	(test_main_complex_long_double_add_overflow): Likewise.
	(test_main_long_double_sub_overflow): Likewise.
	(test_main_complex_long_double_sub_overflow): Likewise.

---

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index bf67e72..75a2a45 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1621,6 +1621,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
+
+#undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
+#define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
 \f
 
 /* Processor table.  */
@@ -32991,6 +32994,105 @@ emit_fusion_gpr_load (rtx *operands)
   return "";
 }
 
+/* Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook.  */
+
+static void
+rs6000_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
+{
+  if (!TARGET_HARD_FLOAT || !TARGET_FPRS)
+    return;
+
+  tree mffs = rs6000_builtin_decls[RS6000_BUILTIN_MFFS];
+  tree mtfsf = rs6000_builtin_decls[RS6000_BUILTIN_MTFSF];
+  tree call_mffs = build_call_expr (mffs, 0);
+
+  /* Generates the equivalent of feholdexcept (&fenv_var)
+
+     *fenv_var = __builtin_mffs ();
+     double fenv_hold;
+     *(uint64_t*)&fenv_hold = *(uint64_t*)fenv_var & 0xffffffff00000007LL;
+     __builtin_mtfsf (0xff, fenv_hold);  */
+
+  /* Mask to clear everything except for the rounding modes and non-IEEE
+     arithmetic flag.  */
+  const unsigned HOST_WIDE_INT hold_exception_mask =
+    HOST_WIDE_INT_C (0xffffffff00000007);
+
+  tree fenv_var = create_tmp_var (double_type_node, NULL);
+
+  tree hold_mffs = build2 (MODIFY_EXPR, void_type_node, fenv_var, call_mffs);
+
+  tree fenv_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, fenv_var);
+  tree fenv_llu_and = build2 (BIT_AND_EXPR, uint64_type_node, fenv_llu,
+    build_int_cst (uint64_type_node, hold_exception_mask));
+
+  tree fenv_mtfsf = build1 (VIEW_CONVERT_EXPR, double_type_node, fenv_llu_and);
+
+  tree hold_mtfsf = build_call_expr (mtfsf, 2,
+    build_int_cst (unsigned_type_node, 0xff), fenv_mtfsf);
+
+  *hold = build2 (COMPOUND_EXPR, void_type_node, hold_mffs, hold_mtfsf);
+
+  /* Generates the equivalent of feclearexcept (FE_ALL_EXCEPT):
+
+     double fenv_clear = __builtin_mffs ();
+     *(uint64_t)&fenv_clear &= 0xffffffff00000000LL;
+     __builtin_mtfsf (0xff, fenv_clear);  */
+
+  /* Mask to clear everything except for the rounding modes and non-IEEE
+     arithmetic flag.  */
+  const unsigned HOST_WIDE_INT clear_exception_mask =
+    HOST_WIDE_INT_UC (0xffffffff00000000);
+
+  tree fenv_clear = create_tmp_var (double_type_node, NULL);
+
+  tree clear_mffs = build2 (MODIFY_EXPR, void_type_node, fenv_clear, call_mffs);
+
+  tree fenv_clean_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, fenv_var);
+  tree fenv_clear_llu_and = build2 (BIT_AND_EXPR, uint64_type_node,
+    fenv_clean_llu, build_int_cst (uint64_type_node, clear_exception_mask));
+
+  tree fenv_clear_mtfsf = build1 (VIEW_CONVERT_EXPR, double_type_node,
+    fenv_clear_llu_and);
+
+  tree clear_mtfsf = build_call_expr (mtfsf, 2,
+    build_int_cst (unsigned_type_node, 0xff), fenv_clear_mtfsf);
+
+  *clear = build2 (COMPOUND_EXPR, void_type_node, clear_mffs, clear_mtfsf);
+
+  /* Generates the equivalent of feupdateenv (&fenv_var)
+
+     double old_fenv = __builtin_mffs ();
+     double fenv_update;
+     *(uint64_t*)&fenv_update = (*(uint64_t*)&old & 0xffffffff1fffff00LL) |
+                                (*(uint64_t*)fenv_var 0x1ff80fff);
+     __builtin_mtfsf (0xff, fenv_update);  */
+
+  const unsigned HOST_WIDE_INT update_exception_mask =
+    HOST_WIDE_INT_UC (0xffffffff1fffff00);
+  const unsigned HOST_WIDE_INT new_exception_mask =
+    HOST_WIDE_INT_UC (0x1ff80fff);
+
+  tree old_fenv = create_tmp_var (double_type_node, NULL);
+  tree update_mffs = build2 (MODIFY_EXPR, void_type_node, old_fenv, call_mffs);
+
+  tree old_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, update_mffs);
+  tree old_llu_and = build2 (BIT_AND_EXPR, uint64_type_node,
+    old_llu, build_int_cst (uint64_type_node, update_exception_mask));
+
+  tree new_llu_and = build2 (BIT_AND_EXPR, uint64_type_node, fenv_llu,
+    build_int_cst (uint64_type_node, new_exception_mask));
+
+  tree new_llu_mask = build2 (BIT_IOR_EXPR, uint64_type_node,
+    old_llu_and, new_llu_and);
+
+  tree fenv_mtfsf_update = build1 (VIEW_CONVERT_EXPR, double_type_node,
+    new_llu_mask);
+  
+  *update = build_call_expr (mtfsf, 2,
+    build_int_cst (unsigned_type_node, 0xff), fenv_mtfsf_update);
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c b/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
index bc87de4..0a2c9c4 100644
--- a/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
+++ b/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
@@ -325,11 +325,11 @@ TEST_FUNCS (complex_double_div_overflow, _Complex double, , /= DBL_MIN, 0,
 TEST_FUNCS (long_double_add_invalid, long double, , += __builtin_infl (), 0,
 	    0, __builtin_isinf, 0,
 	    -__builtin_infl (), FE_INVALID)
+#if LDBL_MANT_DIG != 106
 TEST_FUNCS (long_double_add_overflow, long double, , += LDBL_MAX, 0,
 	    LDBL_MAX, __builtin_isinf, FE_OVERFLOW | FE_INEXACT,
 	    0, 0)
 #define NOT_LDBL_EPSILON_2(X) ((X) != LDBL_EPSILON / 2)
-#if LDBL_MANT_DIG != 106
 TEST_FUNCS (long_double_add_inexact, long double, , += LDBL_EPSILON / 2, 0,
 	    1.0L, NOT_LDBL_EPSILON_2, FE_INEXACT,
 	    0, 0)
@@ -342,18 +342,18 @@ TEST_FUNCS (long_double_preinc_inexact, long double, ++, , 0,
 TEST_FUNCS (long_double_postinc_inexact, long double, , ++, 0,
 	    LDBL_EPSILON / 2, NOT_MINUS_1, FE_INEXACT,
 	    -1, 0)
-#endif
 TEST_FUNCS (complex_long_double_add_overflow, _Complex long double, , += LDBL_MAX, 0,
 	    LDBL_MAX, REAL_ISINF, FE_OVERFLOW | FE_INEXACT,
 	    0, 0)
+#endif
 TEST_FUNCS (long_double_sub_invalid, long double, , -= __builtin_infl (), 0,
 	    0, __builtin_isinf, 0,
 	    __builtin_infl (), FE_INVALID)
+#if LDBL_MANT_DIG != 106
 TEST_FUNCS (long_double_sub_overflow, long double, , -= LDBL_MAX, 0,
 	    -LDBL_MAX, __builtin_isinf, FE_OVERFLOW | FE_INEXACT,
 	    0, 0)
 #define NOT_MINUS_LDBL_EPSILON_2(X) ((X) != -LDBL_EPSILON / 2)
-#if LDBL_MANT_DIG != 106
 TEST_FUNCS (long_double_sub_inexact, long double, , -= LDBL_EPSILON / 2, 0,
 	    -1.0L, NOT_MINUS_LDBL_EPSILON_2, FE_INEXACT,
 	    0, 0)
@@ -366,10 +366,10 @@ TEST_FUNCS (long_double_predec_inexact, long double, --, , 0,
 TEST_FUNCS (long_double_postdec_inexact, long double, , --, 0,
 	    -LDBL_EPSILON / 2, NOT_1, FE_INEXACT,
 	    1, 0)
-#endif
 TEST_FUNCS (complex_long_double_sub_overflow, _Complex long double, , -= LDBL_MAX, 0,
 	    -LDBL_MAX, REAL_ISINF, FE_OVERFLOW | FE_INEXACT,
 	    0, 0)
+#endif
 TEST_FUNCS (long_double_mul_invalid, long double, , *= __builtin_infl (), 0,
 	    __builtin_infl (), __builtin_isinf, 0,
 	    0, FE_INVALID)
@@ -501,23 +501,23 @@ main (void)
   ret |= test_main_int_div_double_inexact ();
   ret |= test_main_complex_double_div_overflow ();
   ret |= test_main_long_double_add_invalid ();
-  ret |= test_main_long_double_add_overflow ();
 #if LDBL_MANT_DIG != 106
+  ret |= test_main_long_double_add_overflow ();
   ret |= test_main_long_double_add_inexact ();
   ret |= test_main_long_double_add_inexact_int ();
   ret |= test_main_long_double_preinc_inexact ();
   ret |= test_main_long_double_postinc_inexact ();
-#endif
   ret |= test_main_complex_long_double_add_overflow ();
+#endif
   ret |= test_main_long_double_sub_invalid ();
-  ret |= test_main_long_double_sub_overflow ();
 #if LDBL_MANT_DIG != 106
+  ret |= test_main_long_double_sub_overflow ();
   ret |= test_main_long_double_sub_inexact ();
   ret |= test_main_long_double_sub_inexact_int ();
   ret |= test_main_long_double_predec_inexact ();
   ret |= test_main_long_double_postdec_inexact ();
-#endif
   ret |= test_main_complex_long_double_sub_overflow ();
+#endif
   ret |= test_main_long_double_mul_invalid ();
   ret |= test_main_long_double_mul_overflow ();
   ret |= test_main_long_double_mul_overflow_float ();

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2014-11-18 16:34 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-16 20:33 [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV David Edelsohn
2014-08-01  3:28 ` David Edelsohn
2014-08-01 15:31   ` Joseph S. Myers
2014-08-06 20:21     ` Adhemerval Zanella
2014-08-19 16:54       ` Adhemerval Zanella
2014-09-02 22:23         ` Adhemerval Zanella
2014-09-03 14:01           ` Maciej W. Rozycki
2014-09-03 15:49             ` Joseph S. Myers
2014-09-04 18:40             ` Adhemerval Zanella
2014-09-15 14:38               ` Maciej W. Rozycki
2014-10-20 17:18                 ` [PING][PATCH] GCC/test: Set timeout factor for c11-atomic-exec-5.c Maciej W. Rozycki
2014-10-21  0:26                   ` David Edelsohn
2014-10-21  1:49                     ` Joseph S. Myers
2014-10-21  2:15                       ` David Edelsohn
2014-10-21 23:03                         ` Maciej W. Rozycki
2014-11-14 21:02                   ` [PING^2][PATCH] " Maciej W. Rozycki
2014-11-17 10:06                     ` Mike Stump
2014-11-18 16:48                       ` Maciej W. Rozycki
  -- strict thread matches above, loose matches on Subject: below --
2014-09-16 21:05 [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV David Edelsohn
2014-09-03 14:08 Uros Bizjak
2014-09-04 17:39 ` Maciej W. Rozycki
2014-07-03 21:09 Adhemerval Zanella
2014-07-16 18:41 ` Adhemerval Zanella
2014-07-31  1:43 ` Joseph S. Myers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).