public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md
@ 2017-12-15 11:57 Sudakshina Das
  2018-01-05 11:47 ` [PATCH PR81647][AARCH64] PING " Sudakshina Das
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Sudakshina Das @ 2017-12-15 11:57 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, James Greenhalgh, Richard Earnshaw, Marcus Shawcroft

[-- Attachment #1: Type: text/plain, Size: 1550 bytes --]

Hi

This patch fixes the inconsistent behavior observed at -O3 for the 
unordered comparisons. According to the online docs 
(https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), 
all of the following should not raise an FP exception:
- UNGE_EXPR
- UNGT_EXPR
- UNLE_EXPR
- UNLT_EXPR
- UNEQ_EXPR
Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.

The aarch64-simd.md handling of these were generating exception raising 
instructions such as fcmgt. This patch changes the instructions that are 
emitted to in order to not give out the exceptions. We first check each 
operand for NaNs and force any elements containing NaN to zero before 
using them in the compare.

Example: UN<cc> (a, b) -> UNORDERED (a, b) | (cm<cc> (isnan (a) ? 0.0 : 
a, isnan (b) ? 0.0 : b))


The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and 
UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).

Testing done: Checked for regressions on bootstrapped 
aarch64-none-linux-gnu and added a new test case.

Is this ok for trunk? This will probably need a back-port to 
gcc-7-branch as well.

Thanks
Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2017-12-15  Sudakshina Das  <sudi.das@arm.com>

	PR target/81647
	* config/aarch64/aarch64-simd.md (vec_cmp<mode><v_int_equiv>): Modify 
instructions for
	UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.

*** gcc/testsuite/ChangeLog ***

2017-12-15  Sudakshina Das  <sudi.das@arm.com>

	PR target/81647
	* gcc.target/aarch64/pr81647.c: New.

[-- Attachment #2: rb8660.patch --]
[-- Type: text/x-patch, Size: 7037 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2731,10 +2731,10 @@
 	  break;
 	}
       /* Fall through.  */
-    case UNGE:
+    case UNLT:
       std::swap (operands[2], operands[3]);
       /* Fall through.  */
-    case UNLE:
+    case UNGT:
     case GT:
       comparison = gen_aarch64_cmgt<mode>;
       break;
@@ -2745,10 +2745,10 @@
 	  break;
 	}
       /* Fall through.  */
-    case UNGT:
+    case UNLE:
       std::swap (operands[2], operands[3]);
       /* Fall through.  */
-    case UNLT:
+    case UNGE:
     case GE:
       comparison = gen_aarch64_cmge<mode>;
       break;
@@ -2771,21 +2771,35 @@
     case UNGT:
     case UNLE:
     case UNLT:
-    case NE:
-      /* FCM returns false for lanes which are unordered, so if we use
-	 the inverse of the comparison we actually want to emit, then
-	 invert the result, we will end up with the correct result.
-	 Note that a NE NaN and NaN NE b are true for all a, b.
-
-	 Our transformations are:
-	 a UNGE b -> !(b GT a)
-	 a UNGT b -> !(b GE a)
-	 a UNLE b -> !(a GT b)
-	 a UNLT b -> !(a GE b)
-	 a   NE b -> !(a EQ b)  */
-      gcc_assert (comparison != NULL);
-      emit_insn (comparison (operands[0], operands[2], operands[3]));
-      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
+      {
+	/* All of the above must not raise any FP exceptions.  Thus we first
+	   check each operand for NaNs and force any elements containing NaN to
+	   zero before using them in the compare.
+	   Example: UN<cc> (a, b) -> UNORDERED (a, b) |
+				     (cm<cc> (isnan (a) ? 0.0 : a,
+					      isnan (b) ? 0.0 : b))
+	   We use the following transformations for doing the comparisions:
+	   a UNGE b -> a GE b
+	   a UNGT b -> a GT b
+	   a UNLE b -> b GE a
+	   a UNLT b -> b GT a.  */
+
+	rtx tmp0 = gen_reg_rtx (<V_INT_EQUIV>mode);
+	rtx tmp1 = gen_reg_rtx (<V_INT_EQUIV>mode);
+	rtx tmp2 = gen_reg_rtx (<V_INT_EQUIV>mode);
+	emit_insn (gen_aarch64_cmeq<mode> (tmp0, operands[2], operands[2]));
+	emit_insn (gen_aarch64_cmeq<mode> (tmp1, operands[3], operands[3]));
+	emit_insn (gen_and<v_int_equiv>3 (tmp2, tmp0, tmp1));
+	emit_insn (gen_and<v_int_equiv>3 (tmp0, tmp0,
+			lowpart_subreg (<V_INT_EQUIV>mode, operands[2], <MODE>mode)));
+	emit_insn (gen_and<v_int_equiv>3 (tmp1, tmp1,
+			lowpart_subreg (<V_INT_EQUIV>mode, operands[3], <MODE>mode)));
+	gcc_assert (comparison != NULL);
+	emit_insn (comparison (operands[0],
+			       lowpart_subreg (<MODE>mode, tmp0, <V_INT_EQUIV>mode),
+			       lowpart_subreg (<MODE>mode, tmp1, <V_INT_EQUIV>mode)));
+	emit_insn (gen_orn<v_int_equiv>3 (operands[0], tmp2, operands[0]));
+      }
       break;
 
     case LT:
@@ -2793,25 +2807,19 @@
     case GT:
     case GE:
     case EQ:
+    case NE:
       /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
 	 As a LT b <=> b GE a && a LE b <=> b GT a.  Our transformations are:
 	 a GE b -> a GE b
 	 a GT b -> a GT b
 	 a LE b -> b GE a
 	 a LT b -> b GT a
-	 a EQ b -> a EQ b  */
+	 a EQ b -> a EQ b
+	 a NE b -> ~(a EQ b)  */
       gcc_assert (comparison != NULL);
       emit_insn (comparison (operands[0], operands[2], operands[3]));
-      break;
-
-    case UNEQ:
-      /* We first check (a > b ||  b > a) which is !UNEQ, inverting
-	 this result will then give us (a == b || a UNORDERED b).  */
-      emit_insn (gen_aarch64_cmgt<mode> (operands[0],
-					 operands[2], operands[3]));
-      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
-      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
-      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
+      if (code == NE)
+	emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
       break;
 
     case LTGT:
@@ -2823,21 +2831,22 @@
       emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
       break;
 
-    case UNORDERED:
-      /* Operands are ORDERED iff (a > b || b >= a), so we can compute
-	 UNORDERED as !ORDERED.  */
-      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2], operands[3]));
-      emit_insn (gen_aarch64_cmge<mode> (operands[0],
-					 operands[3], operands[2]));
-      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
-      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
-      break;
-
     case ORDERED:
-      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2], operands[3]));
-      emit_insn (gen_aarch64_cmge<mode> (operands[0],
-					 operands[3], operands[2]));
-      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
+    case UNORDERED:
+    case UNEQ:
+      /* cmeq (a, a) & cmeq (b, b).  */
+      emit_insn (gen_aarch64_cmeq<mode> (operands[0],
+					 operands[2], operands[2]));
+      emit_insn (gen_aarch64_cmeq<mode> (tmp, operands[3], operands[3]));
+      emit_insn (gen_and<v_int_equiv>3 (operands[0], operands[0], tmp));
+
+      if (code == UNORDERED)
+	emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
+      else if (code == UNEQ)
+	{
+	  emit_insn (gen_aarch64_cmeq<mode> (tmp, operands[2], operands[3]));
+	  emit_insn (gen_orn<v_int_equiv>3 (operands[0], operands[0], tmp));
+	}
       break;
 
     default:
diff --git a/gcc/testsuite/gcc.target/aarch64/pr81647.c b/gcc/testsuite/gcc.target/aarch64/pr81647.c
new file mode 100644
index 0000000000000000000000000000000000000000..f60dfba49d538e3b2164b11392ab8dbfdba6546e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr81647.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fdump-tree-ssa" } */
+
+#include <fenv.h>
+
+double x[28], y[28];
+int res[28];
+
+int
+main (void)
+{
+  int i;
+  for (i = 0; i < 28; ++i)
+    {
+      x[i] = __builtin_nan ("");
+      y[i] = i;
+    }
+  __asm__ volatile ("" ::: "memory");
+  feclearexcept (FE_ALL_EXCEPT);
+  for (i = 0; i < 4; ++i)
+    res[i] = __builtin_isgreater (x[i], y[i]);
+  for (i = 4; i < 8; ++i)
+    res[i] = __builtin_isgreaterequal (x[i], y[i]);
+  for (i = 8; i < 12; ++i)
+    res[i] = __builtin_isless (x[i], y[i]);
+  for (i = 12; i < 16; ++i)
+    res[i] = __builtin_islessequal (x[i], y[i]);
+  for (i = 16; i < 20; ++i)
+    res[i] = __builtin_islessgreater (x[i], y[i]);
+  for (i = 20; i < 24; ++i)
+    res[i] = __builtin_isunordered (x[i], y[i]);
+  for (i = 24; i < 28; ++i)
+    res[i] = !(__builtin_isunordered (x[i], y[i]));
+  __asm__ volatile ("" ::: "memory");
+  return fetestexcept (FE_ALL_EXCEPT) != 0;
+}
+
+/* { dg-final { scan-tree-dump " u> " "ssa" } } */
+/* { dg-final { scan-tree-dump " u>= " "ssa" } } */
+/* { dg-final { scan-tree-dump " u< " "ssa" } } */
+/* { dg-final { scan-tree-dump " u<= " "ssa" } } */
+/* { dg-final { scan-tree-dump " u== " "ssa" } } */
+/* { dg-final { scan-tree-dump " unord " "ssa" } } */
+/* { dg-final { scan-tree-dump " ord " "ssa" } } */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR81647][AARCH64] PING Fix handling of Unordered Comparisons in aarch64-simd.md
  2017-12-15 11:57 [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md Sudakshina Das
@ 2018-01-05 11:47 ` Sudakshina Das
  2018-02-16 10:54 ` [PATCH PR81647][AARCH64][PING] " Sudakshina Das
  2018-03-19 14:32 ` [PATCH PR81647][AARCH64] " James Greenhalgh
  2 siblings, 0 replies; 8+ messages in thread
From: Sudakshina Das @ 2018-01-05 11:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, James Greenhalgh, Richard Earnshaw, Marcus Shawcroft

PING

On 15/12/17 11:57, Sudakshina Das wrote:
> Hi
> 
> This patch fixes the inconsistent behavior observed at -O3 for the 
> unordered comparisons. According to the online docs 
> (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), 
> all of the following should not raise an FP exception:
> - UNGE_EXPR
> - UNGT_EXPR
> - UNLE_EXPR
> - UNLT_EXPR
> - UNEQ_EXPR
> Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.
> 
> The aarch64-simd.md handling of these were generating exception raising 
> instructions such as fcmgt. This patch changes the instructions that are 
> emitted to in order to not give out the exceptions. We first check each 
> operand for NaNs and force any elements containing NaN to zero before 
> using them in the compare.
> 
> Example: UN<cc> (a, b) -> UNORDERED (a, b) | (cm<cc> (isnan (a) ? 0.0 : 
> a, isnan (b) ? 0.0 : b))
> 
> 
> The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and 
> UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).
> 
> Testing done: Checked for regressions on bootstrapped 
> aarch64-none-linux-gnu and added a new test case.
> 
> Is this ok for trunk? This will probably need a back-port to 
> gcc-7-branch as well.
> 
> Thanks
> Sudi
> 
> ChangeLog Entries:
> 
> *** gcc/ChangeLog ***
> 
> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
> 
>      PR target/81647
>      * config/aarch64/aarch64-simd.md (vec_cmp<mode><v_int_equiv>): 
> Modify instructions for
>      UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
> 
>      PR target/81647
>      * gcc.target/aarch64/pr81647.c: New.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR81647][AARCH64][PING] Fix handling of Unordered Comparisons in aarch64-simd.md
  2017-12-15 11:57 [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md Sudakshina Das
  2018-01-05 11:47 ` [PATCH PR81647][AARCH64] PING " Sudakshina Das
@ 2018-02-16 10:54 ` Sudakshina Das
  2018-03-19 14:32 ` [PATCH PR81647][AARCH64] " James Greenhalgh
  2 siblings, 0 replies; 8+ messages in thread
From: Sudakshina Das @ 2018-02-16 10:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, James Greenhalgh, Richard Earnshaw, Marcus Shawcroft

PING

On 15/12/17 11:57, Sudakshina Das wrote:
> Hi
> 
> This patch fixes the inconsistent behavior observed at -O3 for the 
> unordered comparisons. According to the online docs 
> (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), 
> all of the following should not raise an FP exception:
> - UNGE_EXPR
> - UNGT_EXPR
> - UNLE_EXPR
> - UNLT_EXPR
> - UNEQ_EXPR
> Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.
> 
> The aarch64-simd.md handling of these were generating exception raising 
> instructions such as fcmgt. This patch changes the instructions that are 
> emitted to in order to not give out the exceptions. We first check each 
> operand for NaNs and force any elements containing NaN to zero before 
> using them in the compare.
> 
> Example: UN<cc> (a, b) -> UNORDERED (a, b) | (cm<cc> (isnan (a) ? 0.0 : 
> a, isnan (b) ? 0.0 : b))
> 
> 
> The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and 
> UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).
> 
> Testing done: Checked for regressions on bootstrapped 
> aarch64-none-linux-gnu and added a new test case.
> 
> Is this ok for trunk? This will probably need a back-port to 
> gcc-7-branch as well.
> 
> Thanks
> Sudi
> 
> ChangeLog Entries:
> 
> *** gcc/ChangeLog ***
> 
> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
> 
>      PR target/81647
>      * config/aarch64/aarch64-simd.md (vec_cmp<mode><v_int_equiv>): 
> Modify instructions for
>      UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
> 
>      PR target/81647
>      * gcc.target/aarch64/pr81647.c: New.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md
  2017-12-15 11:57 [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md Sudakshina Das
  2018-01-05 11:47 ` [PATCH PR81647][AARCH64] PING " Sudakshina Das
  2018-02-16 10:54 ` [PATCH PR81647][AARCH64][PING] " Sudakshina Das
@ 2018-03-19 14:32 ` James Greenhalgh
  2018-03-19 19:31   ` Sudakshina Das
  2 siblings, 1 reply; 8+ messages in thread
From: James Greenhalgh @ 2018-03-19 14:32 UTC (permalink / raw)
  To: Sudi Das; +Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft

On Fri, Dec 15, 2017 at 11:57:46AM +0000, Sudi Das wrote:
> Hi
> 
> This patch fixes the inconsistent behavior observed at -O3 for the 
> unordered comparisons. According to the online docs 
> (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), 
> all of the following should not raise an FP exception:
> - UNGE_EXPR
> - UNGT_EXPR
> - UNLE_EXPR
> - UNLT_EXPR
> - UNEQ_EXPR
> Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.
> 
> The aarch64-simd.md handling of these were generating exception raising 
> instructions such as fcmgt. This patch changes the instructions that are 
> emitted to in order to not give out the exceptions. We first check each 
> operand for NaNs and force any elements containing NaN to zero before 
> using them in the compare.
> 
> Example: UN<cc> (a, b) -> UNORDERED (a, b) | (cm<cc> (isnan (a) ? 0.0 : 
> a, isnan (b) ? 0.0 : b))
> 
> 
> The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and 
> UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).
> 
> Testing done: Checked for regressions on bootstrapped 
> aarch64-none-linux-gnu and added a new test case.
> 
> Is this ok for trunk? This will probably need a back-port to 
> gcc-7-branch as well.

OK.

Let it soak on trunk for a while before the backport.

Thanks,
James

> ChangeLog Entries:
> 
> *** gcc/ChangeLog ***
> 
> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
> 
> 	PR target/81647
> 	* config/aarch64/aarch64-simd.md (vec_cmp<mode><v_int_equiv>): Modify 
> instructions for
> 	UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
> 
> 	PR target/81647
> 	* gcc.target/aarch64/pr81647.c: New.

> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2731,10 +2731,10 @@
>  	  break;
>  	}
>        /* Fall through.  */
> -    case UNGE:
> +    case UNLT:
>        std::swap (operands[2], operands[3]);
>        /* Fall through.  */
> -    case UNLE:
> +    case UNGT:
>      case GT:
>        comparison = gen_aarch64_cmgt<mode>;
>        break;
> @@ -2745,10 +2745,10 @@
>  	  break;
>  	}
>        /* Fall through.  */
> -    case UNGT:
> +    case UNLE:
>        std::swap (operands[2], operands[3]);
>        /* Fall through.  */
> -    case UNLT:
> +    case UNGE:
>      case GE:
>        comparison = gen_aarch64_cmge<mode>;
>        break;
> @@ -2771,21 +2771,35 @@
>      case UNGT:
>      case UNLE:
>      case UNLT:
> -    case NE:
> -      /* FCM returns false for lanes which are unordered, so if we use
> -	 the inverse of the comparison we actually want to emit, then
> -	 invert the result, we will end up with the correct result.
> -	 Note that a NE NaN and NaN NE b are true for all a, b.
> -
> -	 Our transformations are:
> -	 a UNGE b -> !(b GT a)
> -	 a UNGT b -> !(b GE a)
> -	 a UNLE b -> !(a GT b)
> -	 a UNLT b -> !(a GE b)
> -	 a   NE b -> !(a EQ b)  */
> -      gcc_assert (comparison != NULL);
> -      emit_insn (comparison (operands[0], operands[2], operands[3]));
> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
> +      {
> +	/* All of the above must not raise any FP exceptions.  Thus we first
> +	   check each operand for NaNs and force any elements containing NaN to
> +	   zero before using them in the compare.
> +	   Example: UN<cc> (a, b) -> UNORDERED (a, b) |
> +				     (cm<cc> (isnan (a) ? 0.0 : a,
> +					      isnan (b) ? 0.0 : b))
> +	   We use the following transformations for doing the comparisions:
> +	   a UNGE b -> a GE b
> +	   a UNGT b -> a GT b
> +	   a UNLE b -> b GE a
> +	   a UNLT b -> b GT a.  */
> +
> +	rtx tmp0 = gen_reg_rtx (<V_INT_EQUIV>mode);
> +	rtx tmp1 = gen_reg_rtx (<V_INT_EQUIV>mode);
> +	rtx tmp2 = gen_reg_rtx (<V_INT_EQUIV>mode);
> +	emit_insn (gen_aarch64_cmeq<mode> (tmp0, operands[2], operands[2]));
> +	emit_insn (gen_aarch64_cmeq<mode> (tmp1, operands[3], operands[3]));
> +	emit_insn (gen_and<v_int_equiv>3 (tmp2, tmp0, tmp1));
> +	emit_insn (gen_and<v_int_equiv>3 (tmp0, tmp0,
> +			lowpart_subreg (<V_INT_EQUIV>mode, operands[2], <MODE>mode)));
> +	emit_insn (gen_and<v_int_equiv>3 (tmp1, tmp1,
> +			lowpart_subreg (<V_INT_EQUIV>mode, operands[3], <MODE>mode)));
> +	gcc_assert (comparison != NULL);
> +	emit_insn (comparison (operands[0],
> +			       lowpart_subreg (<MODE>mode, tmp0, <V_INT_EQUIV>mode),
> +			       lowpart_subreg (<MODE>mode, tmp1, <V_INT_EQUIV>mode)));
> +	emit_insn (gen_orn<v_int_equiv>3 (operands[0], tmp2, operands[0]));
> +      }
>        break;
>  
>      case LT:
> @@ -2793,25 +2807,19 @@
>      case GT:
>      case GE:
>      case EQ:
> +    case NE:
>        /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
>  	 As a LT b <=> b GE a && a LE b <=> b GT a.  Our transformations are:
>  	 a GE b -> a GE b
>  	 a GT b -> a GT b
>  	 a LE b -> b GE a
>  	 a LT b -> b GT a
> -	 a EQ b -> a EQ b  */
> +	 a EQ b -> a EQ b
> +	 a NE b -> ~(a EQ b)  */
>        gcc_assert (comparison != NULL);
>        emit_insn (comparison (operands[0], operands[2], operands[3]));
> -      break;
> -
> -    case UNEQ:
> -      /* We first check (a > b ||  b > a) which is !UNEQ, inverting
> -	 this result will then give us (a == b || a UNORDERED b).  */
> -      emit_insn (gen_aarch64_cmgt<mode> (operands[0],
> -					 operands[2], operands[3]));
> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
> +      if (code == NE)
> +	emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
>        break;
>  
>      case LTGT:
> @@ -2823,21 +2831,22 @@
>        emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
>        break;
>  
> -    case UNORDERED:
> -      /* Operands are ORDERED iff (a > b || b >= a), so we can compute
> -	 UNORDERED as !ORDERED.  */
> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2], operands[3]));
> -      emit_insn (gen_aarch64_cmge<mode> (operands[0],
> -					 operands[3], operands[2]));
> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
> -      break;
> -
>      case ORDERED:
> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2], operands[3]));
> -      emit_insn (gen_aarch64_cmge<mode> (operands[0],
> -					 operands[3], operands[2]));
> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
> +    case UNORDERED:
> +    case UNEQ:
> +      /* cmeq (a, a) & cmeq (b, b).  */
> +      emit_insn (gen_aarch64_cmeq<mode> (operands[0],
> +					 operands[2], operands[2]));
> +      emit_insn (gen_aarch64_cmeq<mode> (tmp, operands[3], operands[3]));
> +      emit_insn (gen_and<v_int_equiv>3 (operands[0], operands[0], tmp));
> +
> +      if (code == UNORDERED)
> +	emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
> +      else if (code == UNEQ)
> +	{
> +	  emit_insn (gen_aarch64_cmeq<mode> (tmp, operands[2], operands[3]));
> +	  emit_insn (gen_orn<v_int_equiv>3 (operands[0], operands[0], tmp));
> +	}
>        break;
>  
>      default:
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr81647.c b/gcc/testsuite/gcc.target/aarch64/pr81647.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f60dfba49d538e3b2164b11392ab8dbfdba6546e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr81647.c
> @@ -0,0 +1,44 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -fdump-tree-ssa" } */
> +
> +#include <fenv.h>
> +
> +double x[28], y[28];
> +int res[28];
> +
> +int
> +main (void)
> +{
> +  int i;
> +  for (i = 0; i < 28; ++i)
> +    {
> +      x[i] = __builtin_nan ("");
> +      y[i] = i;
> +    }
> +  __asm__ volatile ("" ::: "memory");
> +  feclearexcept (FE_ALL_EXCEPT);
> +  for (i = 0; i < 4; ++i)
> +    res[i] = __builtin_isgreater (x[i], y[i]);
> +  for (i = 4; i < 8; ++i)
> +    res[i] = __builtin_isgreaterequal (x[i], y[i]);
> +  for (i = 8; i < 12; ++i)
> +    res[i] = __builtin_isless (x[i], y[i]);
> +  for (i = 12; i < 16; ++i)
> +    res[i] = __builtin_islessequal (x[i], y[i]);
> +  for (i = 16; i < 20; ++i)
> +    res[i] = __builtin_islessgreater (x[i], y[i]);
> +  for (i = 20; i < 24; ++i)
> +    res[i] = __builtin_isunordered (x[i], y[i]);
> +  for (i = 24; i < 28; ++i)
> +    res[i] = !(__builtin_isunordered (x[i], y[i]));
> +  __asm__ volatile ("" ::: "memory");
> +  return fetestexcept (FE_ALL_EXCEPT) != 0;
> +}
> +
> +/* { dg-final { scan-tree-dump " u> " "ssa" } } */
> +/* { dg-final { scan-tree-dump " u>= " "ssa" } } */
> +/* { dg-final { scan-tree-dump " u< " "ssa" } } */
> +/* { dg-final { scan-tree-dump " u<= " "ssa" } } */
> +/* { dg-final { scan-tree-dump " u== " "ssa" } } */
> +/* { dg-final { scan-tree-dump " unord " "ssa" } } */
> +/* { dg-final { scan-tree-dump " ord " "ssa" } } */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md
  2018-03-19 14:32 ` [PATCH PR81647][AARCH64] " James Greenhalgh
@ 2018-03-19 19:31   ` Sudakshina Das
  2018-03-20  8:19     ` Christophe Lyon
  0 siblings, 1 reply; 8+ messages in thread
From: Sudakshina Das @ 2018-03-19 19:31 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft

Hi

On 19/03/18 14:29, James Greenhalgh wrote:
> On Fri, Dec 15, 2017 at 11:57:46AM +0000, Sudi Das wrote:
>> Hi
>>
>> This patch fixes the inconsistent behavior observed at -O3 for the
>> unordered comparisons. According to the online docs
>> (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html),
>> all of the following should not raise an FP exception:
>> - UNGE_EXPR
>> - UNGT_EXPR
>> - UNLE_EXPR
>> - UNLT_EXPR
>> - UNEQ_EXPR
>> Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.
>>
>> The aarch64-simd.md handling of these were generating exception raising
>> instructions such as fcmgt. This patch changes the instructions that are
>> emitted to in order to not give out the exceptions. We first check each
>> operand for NaNs and force any elements containing NaN to zero before
>> using them in the compare.
>>
>> Example: UN<cc> (a, b) -> UNORDERED (a, b) | (cm<cc> (isnan (a) ? 0.0 :
>> a, isnan (b) ? 0.0 : b))
>>
>>
>> The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and
>> UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).
>>
>> Testing done: Checked for regressions on bootstrapped
>> aarch64-none-linux-gnu and added a new test case.
>>
>> Is this ok for trunk? This will probably need a back-port to
>> gcc-7-branch as well.
> 
> OK.
> 
> Let it soak on trunk for a while before the backport.

Thanks. Committed to trunk as r258653. Will wait a week before backport.

Sudi

> 
> Thanks,
> James
> 
>> ChangeLog Entries:
>>
>> *** gcc/ChangeLog ***
>>
>> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
>>
>> 	PR target/81647
>> 	* config/aarch64/aarch64-simd.md (vec_cmp<mode><v_int_equiv>): Modify
>> instructions for
>> 	UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
>>
>> 	PR target/81647
>> 	* gcc.target/aarch64/pr81647.c: New.
> 
>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>> index f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -2731,10 +2731,10 @@
>>   	  break;
>>   	}
>>         /* Fall through.  */
>> -    case UNGE:
>> +    case UNLT:
>>         std::swap (operands[2], operands[3]);
>>         /* Fall through.  */
>> -    case UNLE:
>> +    case UNGT:
>>       case GT:
>>         comparison = gen_aarch64_cmgt<mode>;
>>         break;
>> @@ -2745,10 +2745,10 @@
>>   	  break;
>>   	}
>>         /* Fall through.  */
>> -    case UNGT:
>> +    case UNLE:
>>         std::swap (operands[2], operands[3]);
>>         /* Fall through.  */
>> -    case UNLT:
>> +    case UNGE:
>>       case GE:
>>         comparison = gen_aarch64_cmge<mode>;
>>         break;
>> @@ -2771,21 +2771,35 @@
>>       case UNGT:
>>       case UNLE:
>>       case UNLT:
>> -    case NE:
>> -      /* FCM returns false for lanes which are unordered, so if we use
>> -	 the inverse of the comparison we actually want to emit, then
>> -	 invert the result, we will end up with the correct result.
>> -	 Note that a NE NaN and NaN NE b are true for all a, b.
>> -
>> -	 Our transformations are:
>> -	 a UNGE b -> !(b GT a)
>> -	 a UNGT b -> !(b GE a)
>> -	 a UNLE b -> !(a GT b)
>> -	 a UNLT b -> !(a GE b)
>> -	 a   NE b -> !(a EQ b)  */
>> -      gcc_assert (comparison != NULL);
>> -      emit_insn (comparison (operands[0], operands[2], operands[3]));
>> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
>> +      {
>> +	/* All of the above must not raise any FP exceptions.  Thus we first
>> +	   check each operand for NaNs and force any elements containing NaN to
>> +	   zero before using them in the compare.
>> +	   Example: UN<cc> (a, b) -> UNORDERED (a, b) |
>> +				     (cm<cc> (isnan (a) ? 0.0 : a,
>> +					      isnan (b) ? 0.0 : b))
>> +	   We use the following transformations for doing the comparisions:
>> +	   a UNGE b -> a GE b
>> +	   a UNGT b -> a GT b
>> +	   a UNLE b -> b GE a
>> +	   a UNLT b -> b GT a.  */
>> +
>> +	rtx tmp0 = gen_reg_rtx (<V_INT_EQUIV>mode);
>> +	rtx tmp1 = gen_reg_rtx (<V_INT_EQUIV>mode);
>> +	rtx tmp2 = gen_reg_rtx (<V_INT_EQUIV>mode);
>> +	emit_insn (gen_aarch64_cmeq<mode> (tmp0, operands[2], operands[2]));
>> +	emit_insn (gen_aarch64_cmeq<mode> (tmp1, operands[3], operands[3]));
>> +	emit_insn (gen_and<v_int_equiv>3 (tmp2, tmp0, tmp1));
>> +	emit_insn (gen_and<v_int_equiv>3 (tmp0, tmp0,
>> +			lowpart_subreg (<V_INT_EQUIV>mode, operands[2], <MODE>mode)));
>> +	emit_insn (gen_and<v_int_equiv>3 (tmp1, tmp1,
>> +			lowpart_subreg (<V_INT_EQUIV>mode, operands[3], <MODE>mode)));
>> +	gcc_assert (comparison != NULL);
>> +	emit_insn (comparison (operands[0],
>> +			       lowpart_subreg (<MODE>mode, tmp0, <V_INT_EQUIV>mode),
>> +			       lowpart_subreg (<MODE>mode, tmp1, <V_INT_EQUIV>mode)));
>> +	emit_insn (gen_orn<v_int_equiv>3 (operands[0], tmp2, operands[0]));
>> +      }
>>         break;
>>   
>>       case LT:
>> @@ -2793,25 +2807,19 @@
>>       case GT:
>>       case GE:
>>       case EQ:
>> +    case NE:
>>         /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
>>   	 As a LT b <=> b GE a && a LE b <=> b GT a.  Our transformations are:
>>   	 a GE b -> a GE b
>>   	 a GT b -> a GT b
>>   	 a LE b -> b GE a
>>   	 a LT b -> b GT a
>> -	 a EQ b -> a EQ b  */
>> +	 a EQ b -> a EQ b
>> +	 a NE b -> ~(a EQ b)  */
>>         gcc_assert (comparison != NULL);
>>         emit_insn (comparison (operands[0], operands[2], operands[3]));
>> -      break;
>> -
>> -    case UNEQ:
>> -      /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>> -	 this result will then give us (a == b || a UNORDERED b).  */
>> -      emit_insn (gen_aarch64_cmgt<mode> (operands[0],
>> -					 operands[2], operands[3]));
>> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
>> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
>> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
>> +      if (code == NE)
>> +	emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
>>         break;
>>   
>>       case LTGT:
>> @@ -2823,21 +2831,22 @@
>>         emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
>>         break;
>>   
>> -    case UNORDERED:
>> -      /* Operands are ORDERED iff (a > b || b >= a), so we can compute
>> -	 UNORDERED as !ORDERED.  */
>> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2], operands[3]));
>> -      emit_insn (gen_aarch64_cmge<mode> (operands[0],
>> -					 operands[3], operands[2]));
>> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
>> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
>> -      break;
>> -
>>       case ORDERED:
>> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2], operands[3]));
>> -      emit_insn (gen_aarch64_cmge<mode> (operands[0],
>> -					 operands[3], operands[2]));
>> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
>> +    case UNORDERED:
>> +    case UNEQ:
>> +      /* cmeq (a, a) & cmeq (b, b).  */
>> +      emit_insn (gen_aarch64_cmeq<mode> (operands[0],
>> +					 operands[2], operands[2]));
>> +      emit_insn (gen_aarch64_cmeq<mode> (tmp, operands[3], operands[3]));
>> +      emit_insn (gen_and<v_int_equiv>3 (operands[0], operands[0], tmp));
>> +
>> +      if (code == UNORDERED)
>> +	emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
>> +      else if (code == UNEQ)
>> +	{
>> +	  emit_insn (gen_aarch64_cmeq<mode> (tmp, operands[2], operands[3]));
>> +	  emit_insn (gen_orn<v_int_equiv>3 (operands[0], operands[0], tmp));
>> +	}
>>         break;
>>   
>>       default:
>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr81647.c b/gcc/testsuite/gcc.target/aarch64/pr81647.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..f60dfba49d538e3b2164b11392ab8dbfdba6546e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/pr81647.c
>> @@ -0,0 +1,44 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O3 -fdump-tree-ssa" } */
>> +
>> +#include <fenv.h>
>> +
>> +double x[28], y[28];
>> +int res[28];
>> +
>> +int
>> +main (void)
>> +{
>> +  int i;
>> +  for (i = 0; i < 28; ++i)
>> +    {
>> +      x[i] = __builtin_nan ("");
>> +      y[i] = i;
>> +    }
>> +  __asm__ volatile ("" ::: "memory");
>> +  feclearexcept (FE_ALL_EXCEPT);
>> +  for (i = 0; i < 4; ++i)
>> +    res[i] = __builtin_isgreater (x[i], y[i]);
>> +  for (i = 4; i < 8; ++i)
>> +    res[i] = __builtin_isgreaterequal (x[i], y[i]);
>> +  for (i = 8; i < 12; ++i)
>> +    res[i] = __builtin_isless (x[i], y[i]);
>> +  for (i = 12; i < 16; ++i)
>> +    res[i] = __builtin_islessequal (x[i], y[i]);
>> +  for (i = 16; i < 20; ++i)
>> +    res[i] = __builtin_islessgreater (x[i], y[i]);
>> +  for (i = 20; i < 24; ++i)
>> +    res[i] = __builtin_isunordered (x[i], y[i]);
>> +  for (i = 24; i < 28; ++i)
>> +    res[i] = !(__builtin_isunordered (x[i], y[i]));
>> +  __asm__ volatile ("" ::: "memory");
>> +  return fetestexcept (FE_ALL_EXCEPT) != 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump " u> " "ssa" } } */
>> +/* { dg-final { scan-tree-dump " u>= " "ssa" } } */
>> +/* { dg-final { scan-tree-dump " u< " "ssa" } } */
>> +/* { dg-final { scan-tree-dump " u<= " "ssa" } } */
>> +/* { dg-final { scan-tree-dump " u== " "ssa" } } */
>> +/* { dg-final { scan-tree-dump " unord " "ssa" } } */
>> +/* { dg-final { scan-tree-dump " ord " "ssa" } } */
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md
  2018-03-19 19:31   ` Sudakshina Das
@ 2018-03-20  8:19     ` Christophe Lyon
  2018-03-20 10:58       ` Sudakshina Das
  0 siblings, 1 reply; 8+ messages in thread
From: Christophe Lyon @ 2018-03-20  8:19 UTC (permalink / raw)
  To: Sudakshina Das
  Cc: James Greenhalgh, gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft

On 19 March 2018 at 19:55, Sudakshina Das <sudi.das@arm.com> wrote:
> Hi
>
>
> On 19/03/18 14:29, James Greenhalgh wrote:
>>
>> On Fri, Dec 15, 2017 at 11:57:46AM +0000, Sudi Das wrote:
>>>
>>> Hi
>>>
>>> This patch fixes the inconsistent behavior observed at -O3 for the
>>> unordered comparisons. According to the online docs
>>>
>>> (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html),
>>> all of the following should not raise an FP exception:
>>> - UNGE_EXPR
>>> - UNGT_EXPR
>>> - UNLE_EXPR
>>> - UNLT_EXPR
>>> - UNEQ_EXPR
>>> Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.
>>>
>>> The aarch64-simd.md handling of these were generating exception raising
>>> instructions such as fcmgt. This patch changes the instructions that are
>>> emitted to in order to not give out the exceptions. We first check each
>>> operand for NaNs and force any elements containing NaN to zero before
>>> using them in the compare.
>>>
>>> Example: UN<cc> (a, b) -> UNORDERED (a, b) | (cm<cc> (isnan (a) ? 0.0 :
>>> a, isnan (b) ? 0.0 : b))
>>>
>>>
>>> The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and
>>> UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).
>>>
>>> Testing done: Checked for regressions on bootstrapped
>>> aarch64-none-linux-gnu and added a new test case.
>>>
>>> Is this ok for trunk? This will probably need a back-port to
>>> gcc-7-branch as well.
>>
>>
>> OK.
>>
>> Let it soak on trunk for a while before the backport.
>
>
> Thanks. Committed to trunk as r258653. Will wait a week before backport.
>

Hi,

As the test failed to compile on aarch64 bare-metal targets, I added
/* { dg-require-effective-target fenv_exceptions } */
as obvious (r258672).

2018-03-20  Christophe Lyon  <christophe.lyon@linaro.org>

       PR target/81647
       * gcc.target/aarch64/pr81647.c: Require fenv_exceptions.

Index: testsuite/gcc.target/aarch64/pr81647.c
===================================================================
--- testsuite/gcc.target/aarch64/pr81647.c      (revision 258671)
+++ testsuite/gcc.target/aarch64/pr81647.c      (revision 258672)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O3 -fdump-tree-ssa" } */
+/* { dg-require-effective-target fenv_exceptions } */

 #include <fenv.h>



Christophe

> Sudi
>
>
>>
>> Thanks,
>> James
>>
>>> ChangeLog Entries:
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
>>>
>>>         PR target/81647
>>>         * config/aarch64/aarch64-simd.md (vec_cmp<mode><v_int_equiv>):
>>> Modify
>>> instructions for
>>>         UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
>>>
>>>         PR target/81647
>>>         * gcc.target/aarch64/pr81647.c: New.
>>
>>
>>> diff --git a/gcc/config/aarch64/aarch64-simd.md
>>> b/gcc/config/aarch64/aarch64-simd.md
>>> index
>>> f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b
>>> 100644
>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>> @@ -2731,10 +2731,10 @@
>>>           break;
>>>         }
>>>         /* Fall through.  */
>>> -    case UNGE:
>>> +    case UNLT:
>>>         std::swap (operands[2], operands[3]);
>>>         /* Fall through.  */
>>> -    case UNLE:
>>> +    case UNGT:
>>>       case GT:
>>>         comparison = gen_aarch64_cmgt<mode>;
>>>         break;
>>> @@ -2745,10 +2745,10 @@
>>>           break;
>>>         }
>>>         /* Fall through.  */
>>> -    case UNGT:
>>> +    case UNLE:
>>>         std::swap (operands[2], operands[3]);
>>>         /* Fall through.  */
>>> -    case UNLT:
>>> +    case UNGE:
>>>       case GE:
>>>         comparison = gen_aarch64_cmge<mode>;
>>>         break;
>>> @@ -2771,21 +2771,35 @@
>>>       case UNGT:
>>>       case UNLE:
>>>       case UNLT:
>>> -    case NE:
>>> -      /* FCM returns false for lanes which are unordered, so if we use
>>> -        the inverse of the comparison we actually want to emit, then
>>> -        invert the result, we will end up with the correct result.
>>> -        Note that a NE NaN and NaN NE b are true for all a, b.
>>> -
>>> -        Our transformations are:
>>> -        a UNGE b -> !(b GT a)
>>> -        a UNGT b -> !(b GE a)
>>> -        a UNLE b -> !(a GT b)
>>> -        a UNLT b -> !(a GE b)
>>> -        a   NE b -> !(a EQ b)  */
>>> -      gcc_assert (comparison != NULL);
>>> -      emit_insn (comparison (operands[0], operands[2], operands[3]));
>>> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
>>> +      {
>>> +       /* All of the above must not raise any FP exceptions.  Thus we
>>> first
>>> +          check each operand for NaNs and force any elements containing
>>> NaN to
>>> +          zero before using them in the compare.
>>> +          Example: UN<cc> (a, b) -> UNORDERED (a, b) |
>>> +                                    (cm<cc> (isnan (a) ? 0.0 : a,
>>> +                                             isnan (b) ? 0.0 : b))
>>> +          We use the following transformations for doing the
>>> comparisions:
>>> +          a UNGE b -> a GE b
>>> +          a UNGT b -> a GT b
>>> +          a UNLE b -> b GE a
>>> +          a UNLT b -> b GT a.  */
>>> +
>>> +       rtx tmp0 = gen_reg_rtx (<V_INT_EQUIV>mode);
>>> +       rtx tmp1 = gen_reg_rtx (<V_INT_EQUIV>mode);
>>> +       rtx tmp2 = gen_reg_rtx (<V_INT_EQUIV>mode);
>>> +       emit_insn (gen_aarch64_cmeq<mode> (tmp0, operands[2],
>>> operands[2]));
>>> +       emit_insn (gen_aarch64_cmeq<mode> (tmp1, operands[3],
>>> operands[3]));
>>> +       emit_insn (gen_and<v_int_equiv>3 (tmp2, tmp0, tmp1));
>>> +       emit_insn (gen_and<v_int_equiv>3 (tmp0, tmp0,
>>> +                       lowpart_subreg (<V_INT_EQUIV>mode, operands[2],
>>> <MODE>mode)));
>>> +       emit_insn (gen_and<v_int_equiv>3 (tmp1, tmp1,
>>> +                       lowpart_subreg (<V_INT_EQUIV>mode, operands[3],
>>> <MODE>mode)));
>>> +       gcc_assert (comparison != NULL);
>>> +       emit_insn (comparison (operands[0],
>>> +                              lowpart_subreg (<MODE>mode, tmp0,
>>> <V_INT_EQUIV>mode),
>>> +                              lowpart_subreg (<MODE>mode, tmp1,
>>> <V_INT_EQUIV>mode)));
>>> +       emit_insn (gen_orn<v_int_equiv>3 (operands[0], tmp2,
>>> operands[0]));
>>> +      }
>>>         break;
>>>         case LT:
>>> @@ -2793,25 +2807,19 @@
>>>       case GT:
>>>       case GE:
>>>       case EQ:
>>> +    case NE:
>>>         /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
>>>          As a LT b <=> b GE a && a LE b <=> b GT a.  Our transformations
>>> are:
>>>          a GE b -> a GE b
>>>          a GT b -> a GT b
>>>          a LE b -> b GE a
>>>          a LT b -> b GT a
>>> -        a EQ b -> a EQ b  */
>>> +        a EQ b -> a EQ b
>>> +        a NE b -> ~(a EQ b)  */
>>>         gcc_assert (comparison != NULL);
>>>         emit_insn (comparison (operands[0], operands[2], operands[3]));
>>> -      break;
>>> -
>>> -    case UNEQ:
>>> -      /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>>> -        this result will then give us (a == b || a UNORDERED b).  */
>>> -      emit_insn (gen_aarch64_cmgt<mode> (operands[0],
>>> -                                        operands[2], operands[3]));
>>> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3],
>>> operands[2]));
>>> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
>>> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
>>> +      if (code == NE)
>>> +       emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0],
>>> operands[0]));
>>>         break;
>>>         case LTGT:
>>> @@ -2823,21 +2831,22 @@
>>>         emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0],
>>> tmp));
>>>         break;
>>>   -    case UNORDERED:
>>> -      /* Operands are ORDERED iff (a > b || b >= a), so we can compute
>>> -        UNORDERED as !ORDERED.  */
>>> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2],
>>> operands[3]));
>>> -      emit_insn (gen_aarch64_cmge<mode> (operands[0],
>>> -                                        operands[3], operands[2]));
>>> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
>>> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
>>> -      break;
>>> -
>>>       case ORDERED:
>>> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2],
>>> operands[3]));
>>> -      emit_insn (gen_aarch64_cmge<mode> (operands[0],
>>> -                                        operands[3], operands[2]));
>>> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
>>> +    case UNORDERED:
>>> +    case UNEQ:
>>> +      /* cmeq (a, a) & cmeq (b, b).  */
>>> +      emit_insn (gen_aarch64_cmeq<mode> (operands[0],
>>> +                                        operands[2], operands[2]));
>>> +      emit_insn (gen_aarch64_cmeq<mode> (tmp, operands[3],
>>> operands[3]));
>>> +      emit_insn (gen_and<v_int_equiv>3 (operands[0], operands[0], tmp));
>>> +
>>> +      if (code == UNORDERED)
>>> +       emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0],
>>> operands[0]));
>>> +      else if (code == UNEQ)
>>> +       {
>>> +         emit_insn (gen_aarch64_cmeq<mode> (tmp, operands[2],
>>> operands[3]));
>>> +         emit_insn (gen_orn<v_int_equiv>3 (operands[0], operands[0],
>>> tmp));
>>> +       }
>>>         break;
>>>         default:
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr81647.c
>>> b/gcc/testsuite/gcc.target/aarch64/pr81647.c
>>> new file mode 100644
>>> index
>>> 0000000000000000000000000000000000000000..f60dfba49d538e3b2164b11392ab8dbfdba6546e
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/pr81647.c
>>> @@ -0,0 +1,44 @@
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O3 -fdump-tree-ssa" } */
>>> +
>>> +#include <fenv.h>
>>> +
>>> +double x[28], y[28];
>>> +int res[28];
>>> +
>>> +int
>>> +main (void)
>>> +{
>>> +  int i;
>>> +  for (i = 0; i < 28; ++i)
>>> +    {
>>> +      x[i] = __builtin_nan ("");
>>> +      y[i] = i;
>>> +    }
>>> +  __asm__ volatile ("" ::: "memory");
>>> +  feclearexcept (FE_ALL_EXCEPT);
>>> +  for (i = 0; i < 4; ++i)
>>> +    res[i] = __builtin_isgreater (x[i], y[i]);
>>> +  for (i = 4; i < 8; ++i)
>>> +    res[i] = __builtin_isgreaterequal (x[i], y[i]);
>>> +  for (i = 8; i < 12; ++i)
>>> +    res[i] = __builtin_isless (x[i], y[i]);
>>> +  for (i = 12; i < 16; ++i)
>>> +    res[i] = __builtin_islessequal (x[i], y[i]);
>>> +  for (i = 16; i < 20; ++i)
>>> +    res[i] = __builtin_islessgreater (x[i], y[i]);
>>> +  for (i = 20; i < 24; ++i)
>>> +    res[i] = __builtin_isunordered (x[i], y[i]);
>>> +  for (i = 24; i < 28; ++i)
>>> +    res[i] = !(__builtin_isunordered (x[i], y[i]));
>>> +  __asm__ volatile ("" ::: "memory");
>>> +  return fetestexcept (FE_ALL_EXCEPT) != 0;
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump " u> " "ssa" } } */
>>> +/* { dg-final { scan-tree-dump " u>= " "ssa" } } */
>>> +/* { dg-final { scan-tree-dump " u< " "ssa" } } */
>>> +/* { dg-final { scan-tree-dump " u<= " "ssa" } } */
>>> +/* { dg-final { scan-tree-dump " u== " "ssa" } } */
>>> +/* { dg-final { scan-tree-dump " unord " "ssa" } } */
>>> +/* { dg-final { scan-tree-dump " ord " "ssa" } } */
>>
>>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md
  2018-03-20  8:19     ` Christophe Lyon
@ 2018-03-20 10:58       ` Sudakshina Das
  2018-03-28 11:37         ` Sudakshina Das
  0 siblings, 1 reply; 8+ messages in thread
From: Sudakshina Das @ 2018-03-20 10:58 UTC (permalink / raw)
  To: Christophe Lyon
  Cc: James Greenhalgh, gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft

Hi

On 20/03/18 08:13, Christophe Lyon wrote:
> On 19 March 2018 at 19:55, Sudakshina Das <sudi.das@arm.com> wrote:
>> Hi
>>
>>
>> On 19/03/18 14:29, James Greenhalgh wrote:
>>>
>>> On Fri, Dec 15, 2017 at 11:57:46AM +0000, Sudi Das wrote:
>>>>
>>>> Hi
>>>>
>>>> This patch fixes the inconsistent behavior observed at -O3 for the
>>>> unordered comparisons. According to the online docs
>>>>
>>>> (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html),
>>>> all of the following should not raise an FP exception:
>>>> - UNGE_EXPR
>>>> - UNGT_EXPR
>>>> - UNLE_EXPR
>>>> - UNLT_EXPR
>>>> - UNEQ_EXPR
>>>> Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.
>>>>
>>>> The aarch64-simd.md handling of these were generating exception raising
>>>> instructions such as fcmgt. This patch changes the instructions that are
>>>> emitted to in order to not give out the exceptions. We first check each
>>>> operand for NaNs and force any elements containing NaN to zero before
>>>> using them in the compare.
>>>>
>>>> Example: UN<cc> (a, b) -> UNORDERED (a, b) | (cm<cc> (isnan (a) ? 0.0 :
>>>> a, isnan (b) ? 0.0 : b))
>>>>
>>>>
>>>> The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and
>>>> UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).
>>>>
>>>> Testing done: Checked for regressions on bootstrapped
>>>> aarch64-none-linux-gnu and added a new test case.
>>>>
>>>> Is this ok for trunk? This will probably need a back-port to
>>>> gcc-7-branch as well.
>>>
>>>
>>> OK.
>>>
>>> Let it soak on trunk for a while before the backport.
>>
>>
>> Thanks. Committed to trunk as r258653. Will wait a week before backport.
>>
> 
> Hi,
> 
> As the test failed to compile on aarch64 bare-metal targets, I added
> /* { dg-require-effective-target fenv_exceptions } */
> as obvious (r258672).
> 
> 2018-03-20  Christophe Lyon  <christophe.lyon@linaro.org>
> 
>         PR target/81647
>         * gcc.target/aarch64/pr81647.c: Require fenv_exceptions.
> 
> Index: testsuite/gcc.target/aarch64/pr81647.c
> ===================================================================
> --- testsuite/gcc.target/aarch64/pr81647.c      (revision 258671)
> +++ testsuite/gcc.target/aarch64/pr81647.c      (revision 258672)
> @@ -1,5 +1,6 @@
>   /* { dg-do run } */
>   /* { dg-options "-O3 -fdump-tree-ssa" } */
> +/* { dg-require-effective-target fenv_exceptions } */
> 
>   #include <fenv.h>
> 
> 
> 
> Christophe

Thanks for fixing this and apologies for missing it on the first place!

Sudi

> 
>> Sudi
>>
>>
>>>
>>> Thanks,
>>> James
>>>
>>>> ChangeLog Entries:
>>>>
>>>> *** gcc/ChangeLog ***
>>>>
>>>> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
>>>>
>>>>          PR target/81647
>>>>          * config/aarch64/aarch64-simd.md (vec_cmp<mode><v_int_equiv>):
>>>> Modify
>>>> instructions for
>>>>          UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.
>>>>
>>>> *** gcc/testsuite/ChangeLog ***
>>>>
>>>> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
>>>>
>>>>          PR target/81647
>>>>          * gcc.target/aarch64/pr81647.c: New.
>>>
>>>
>>>> diff --git a/gcc/config/aarch64/aarch64-simd.md
>>>> b/gcc/config/aarch64/aarch64-simd.md
>>>> index
>>>> f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b
>>>> 100644
>>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>>> @@ -2731,10 +2731,10 @@
>>>>            break;
>>>>          }
>>>>          /* Fall through.  */
>>>> -    case UNGE:
>>>> +    case UNLT:
>>>>          std::swap (operands[2], operands[3]);
>>>>          /* Fall through.  */
>>>> -    case UNLE:
>>>> +    case UNGT:
>>>>        case GT:
>>>>          comparison = gen_aarch64_cmgt<mode>;
>>>>          break;
>>>> @@ -2745,10 +2745,10 @@
>>>>            break;
>>>>          }
>>>>          /* Fall through.  */
>>>> -    case UNGT:
>>>> +    case UNLE:
>>>>          std::swap (operands[2], operands[3]);
>>>>          /* Fall through.  */
>>>> -    case UNLT:
>>>> +    case UNGE:
>>>>        case GE:
>>>>          comparison = gen_aarch64_cmge<mode>;
>>>>          break;
>>>> @@ -2771,21 +2771,35 @@
>>>>        case UNGT:
>>>>        case UNLE:
>>>>        case UNLT:
>>>> -    case NE:
>>>> -      /* FCM returns false for lanes which are unordered, so if we use
>>>> -        the inverse of the comparison we actually want to emit, then
>>>> -        invert the result, we will end up with the correct result.
>>>> -        Note that a NE NaN and NaN NE b are true for all a, b.
>>>> -
>>>> -        Our transformations are:
>>>> -        a UNGE b -> !(b GT a)
>>>> -        a UNGT b -> !(b GE a)
>>>> -        a UNLE b -> !(a GT b)
>>>> -        a UNLT b -> !(a GE b)
>>>> -        a   NE b -> !(a EQ b)  */
>>>> -      gcc_assert (comparison != NULL);
>>>> -      emit_insn (comparison (operands[0], operands[2], operands[3]));
>>>> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
>>>> +      {
>>>> +       /* All of the above must not raise any FP exceptions.  Thus we
>>>> first
>>>> +          check each operand for NaNs and force any elements containing
>>>> NaN to
>>>> +          zero before using them in the compare.
>>>> +          Example: UN<cc> (a, b) -> UNORDERED (a, b) |
>>>> +                                    (cm<cc> (isnan (a) ? 0.0 : a,
>>>> +                                             isnan (b) ? 0.0 : b))
>>>> +          We use the following transformations for doing the
>>>> comparisions:
>>>> +          a UNGE b -> a GE b
>>>> +          a UNGT b -> a GT b
>>>> +          a UNLE b -> b GE a
>>>> +          a UNLT b -> b GT a.  */
>>>> +
>>>> +       rtx tmp0 = gen_reg_rtx (<V_INT_EQUIV>mode);
>>>> +       rtx tmp1 = gen_reg_rtx (<V_INT_EQUIV>mode);
>>>> +       rtx tmp2 = gen_reg_rtx (<V_INT_EQUIV>mode);
>>>> +       emit_insn (gen_aarch64_cmeq<mode> (tmp0, operands[2],
>>>> operands[2]));
>>>> +       emit_insn (gen_aarch64_cmeq<mode> (tmp1, operands[3],
>>>> operands[3]));
>>>> +       emit_insn (gen_and<v_int_equiv>3 (tmp2, tmp0, tmp1));
>>>> +       emit_insn (gen_and<v_int_equiv>3 (tmp0, tmp0,
>>>> +                       lowpart_subreg (<V_INT_EQUIV>mode, operands[2],
>>>> <MODE>mode)));
>>>> +       emit_insn (gen_and<v_int_equiv>3 (tmp1, tmp1,
>>>> +                       lowpart_subreg (<V_INT_EQUIV>mode, operands[3],
>>>> <MODE>mode)));
>>>> +       gcc_assert (comparison != NULL);
>>>> +       emit_insn (comparison (operands[0],
>>>> +                              lowpart_subreg (<MODE>mode, tmp0,
>>>> <V_INT_EQUIV>mode),
>>>> +                              lowpart_subreg (<MODE>mode, tmp1,
>>>> <V_INT_EQUIV>mode)));
>>>> +       emit_insn (gen_orn<v_int_equiv>3 (operands[0], tmp2,
>>>> operands[0]));
>>>> +      }
>>>>          break;
>>>>          case LT:
>>>> @@ -2793,25 +2807,19 @@
>>>>        case GT:
>>>>        case GE:
>>>>        case EQ:
>>>> +    case NE:
>>>>          /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
>>>>           As a LT b <=> b GE a && a LE b <=> b GT a.  Our transformations
>>>> are:
>>>>           a GE b -> a GE b
>>>>           a GT b -> a GT b
>>>>           a LE b -> b GE a
>>>>           a LT b -> b GT a
>>>> -        a EQ b -> a EQ b  */
>>>> +        a EQ b -> a EQ b
>>>> +        a NE b -> ~(a EQ b)  */
>>>>          gcc_assert (comparison != NULL);
>>>>          emit_insn (comparison (operands[0], operands[2], operands[3]));
>>>> -      break;
>>>> -
>>>> -    case UNEQ:
>>>> -      /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>>>> -        this result will then give us (a == b || a UNORDERED b).  */
>>>> -      emit_insn (gen_aarch64_cmgt<mode> (operands[0],
>>>> -                                        operands[2], operands[3]));
>>>> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3],
>>>> operands[2]));
>>>> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
>>>> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
>>>> +      if (code == NE)
>>>> +       emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0],
>>>> operands[0]));
>>>>          break;
>>>>          case LTGT:
>>>> @@ -2823,21 +2831,22 @@
>>>>          emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0],
>>>> tmp));
>>>>          break;
>>>>    -    case UNORDERED:
>>>> -      /* Operands are ORDERED iff (a > b || b >= a), so we can compute
>>>> -        UNORDERED as !ORDERED.  */
>>>> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2],
>>>> operands[3]));
>>>> -      emit_insn (gen_aarch64_cmge<mode> (operands[0],
>>>> -                                        operands[3], operands[2]));
>>>> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
>>>> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], operands[0]));
>>>> -      break;
>>>> -
>>>>        case ORDERED:
>>>> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2],
>>>> operands[3]));
>>>> -      emit_insn (gen_aarch64_cmge<mode> (operands[0],
>>>> -                                        operands[3], operands[2]));
>>>> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], tmp));
>>>> +    case UNORDERED:
>>>> +    case UNEQ:
>>>> +      /* cmeq (a, a) & cmeq (b, b).  */
>>>> +      emit_insn (gen_aarch64_cmeq<mode> (operands[0],
>>>> +                                        operands[2], operands[2]));
>>>> +      emit_insn (gen_aarch64_cmeq<mode> (tmp, operands[3],
>>>> operands[3]));
>>>> +      emit_insn (gen_and<v_int_equiv>3 (operands[0], operands[0], tmp));
>>>> +
>>>> +      if (code == UNORDERED)
>>>> +       emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0],
>>>> operands[0]));
>>>> +      else if (code == UNEQ)
>>>> +       {
>>>> +         emit_insn (gen_aarch64_cmeq<mode> (tmp, operands[2],
>>>> operands[3]));
>>>> +         emit_insn (gen_orn<v_int_equiv>3 (operands[0], operands[0],
>>>> tmp));
>>>> +       }
>>>>          break;
>>>>          default:
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr81647.c
>>>> b/gcc/testsuite/gcc.target/aarch64/pr81647.c
>>>> new file mode 100644
>>>> index
>>>> 0000000000000000000000000000000000000000..f60dfba49d538e3b2164b11392ab8dbfdba6546e
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/aarch64/pr81647.c
>>>> @@ -0,0 +1,44 @@
>>>> +/* { dg-do run } */
>>>> +/* { dg-options "-O3 -fdump-tree-ssa" } */
>>>> +
>>>> +#include <fenv.h>
>>>> +
>>>> +double x[28], y[28];
>>>> +int res[28];
>>>> +
>>>> +int
>>>> +main (void)
>>>> +{
>>>> +  int i;
>>>> +  for (i = 0; i < 28; ++i)
>>>> +    {
>>>> +      x[i] = __builtin_nan ("");
>>>> +      y[i] = i;
>>>> +    }
>>>> +  __asm__ volatile ("" ::: "memory");
>>>> +  feclearexcept (FE_ALL_EXCEPT);
>>>> +  for (i = 0; i < 4; ++i)
>>>> +    res[i] = __builtin_isgreater (x[i], y[i]);
>>>> +  for (i = 4; i < 8; ++i)
>>>> +    res[i] = __builtin_isgreaterequal (x[i], y[i]);
>>>> +  for (i = 8; i < 12; ++i)
>>>> +    res[i] = __builtin_isless (x[i], y[i]);
>>>> +  for (i = 12; i < 16; ++i)
>>>> +    res[i] = __builtin_islessequal (x[i], y[i]);
>>>> +  for (i = 16; i < 20; ++i)
>>>> +    res[i] = __builtin_islessgreater (x[i], y[i]);
>>>> +  for (i = 20; i < 24; ++i)
>>>> +    res[i] = __builtin_isunordered (x[i], y[i]);
>>>> +  for (i = 24; i < 28; ++i)
>>>> +    res[i] = !(__builtin_isunordered (x[i], y[i]));
>>>> +  __asm__ volatile ("" ::: "memory");
>>>> +  return fetestexcept (FE_ALL_EXCEPT) != 0;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-tree-dump " u> " "ssa" } } */
>>>> +/* { dg-final { scan-tree-dump " u>= " "ssa" } } */
>>>> +/* { dg-final { scan-tree-dump " u< " "ssa" } } */
>>>> +/* { dg-final { scan-tree-dump " u<= " "ssa" } } */
>>>> +/* { dg-final { scan-tree-dump " u== " "ssa" } } */
>>>> +/* { dg-final { scan-tree-dump " unord " "ssa" } } */
>>>> +/* { dg-final { scan-tree-dump " ord " "ssa" } } */
>>>
>>>
>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md
  2018-03-20 10:58       ` Sudakshina Das
@ 2018-03-28 11:37         ` Sudakshina Das
  0 siblings, 0 replies; 8+ messages in thread
From: Sudakshina Das @ 2018-03-28 11:37 UTC (permalink / raw)
  To: gcc-patches
  Cc: James Greenhalgh, nd, Richard Earnshaw, Marcus Shawcroft,
	Christophe Lyon

Hi

On 20/03/18 10:57, Sudakshina Das wrote:
> Hi
> 
> On 20/03/18 08:13, Christophe Lyon wrote:
>> On 19 March 2018 at 19:55, Sudakshina Das <sudi.das@arm.com> wrote:
>>> Hi
>>>
>>>
>>> On 19/03/18 14:29, James Greenhalgh wrote:
>>>>
>>>> On Fri, Dec 15, 2017 at 11:57:46AM +0000, Sudi Das wrote:
>>>>>
>>>>> Hi
>>>>>
>>>>> This patch fixes the inconsistent behavior observed at -O3 for the
>>>>> unordered comparisons. According to the online docs
>>>>>
>>>>> (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), 
>>>>>
>>>>> all of the following should not raise an FP exception:
>>>>> - UNGE_EXPR
>>>>> - UNGT_EXPR
>>>>> - UNLE_EXPR
>>>>> - UNLT_EXPR
>>>>> - UNEQ_EXPR
>>>>> Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.
>>>>>
>>>>> The aarch64-simd.md handling of these were generating exception 
>>>>> raising
>>>>> instructions such as fcmgt. This patch changes the instructions 
>>>>> that are
>>>>> emitted to in order to not give out the exceptions. We first check 
>>>>> each
>>>>> operand for NaNs and force any elements containing NaN to zero before
>>>>> using them in the compare.
>>>>>
>>>>> Example: UN<cc> (a, b) -> UNORDERED (a, b) | (cm<cc> (isnan (a) ? 
>>>>> 0.0 :
>>>>> a, isnan (b) ? 0.0 : b))
>>>>>
>>>>>
>>>>> The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and
>>>>> UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq 
>>>>> (a,b)).
>>>>>
>>>>> Testing done: Checked for regressions on bootstrapped
>>>>> aarch64-none-linux-gnu and added a new test case.
>>>>>
>>>>> Is this ok for trunk? This will probably need a back-port to
>>>>> gcc-7-branch as well.
>>>>
>>>>
>>>> OK.
>>>>
>>>> Let it soak on trunk for a while before the backport.

Backported both r258653 and Christophe's r258672 to gcc-7-branch
as r258917 (reg-tested). Needed non-functional edits because
of the name change from vec_cmp<mode><v_int_equiv> to
vec_cmp<mode><v_cmp_result> between gcc-7 and trunk.

Thanks
Sudi

>>>
>>>
>>> Thanks. Committed to trunk as r258653. Will wait a week before backport.
>>>
>>
>> Hi,
>>
>> As the test failed to compile on aarch64 bare-metal targets, I added
>> /* { dg-require-effective-target fenv_exceptions } */
>> as obvious (r258672).
>>
>> 2018-03-20  Christophe Lyon  <christophe.lyon@linaro.org>
>>
>>         PR target/81647
>>         * gcc.target/aarch64/pr81647.c: Require fenv_exceptions.
>>
>> Index: testsuite/gcc.target/aarch64/pr81647.c
>> ===================================================================
>> --- testsuite/gcc.target/aarch64/pr81647.c      (revision 258671)
>> +++ testsuite/gcc.target/aarch64/pr81647.c      (revision 258672)
>> @@ -1,5 +1,6 @@
>>   /* { dg-do run } */
>>   /* { dg-options "-O3 -fdump-tree-ssa" } */
>> +/* { dg-require-effective-target fenv_exceptions } */
>>
>>   #include <fenv.h>
>>
>>
>>
>> Christophe
> 
> Thanks for fixing this and apologies for missing it on the first place!
> 
> Sudi
> 
>>
>>> Sudi
>>>
>>>
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>>> ChangeLog Entries:
>>>>>
>>>>> *** gcc/ChangeLog ***
>>>>>
>>>>> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
>>>>>
>>>>>          PR target/81647
>>>>>          * config/aarch64/aarch64-simd.md 
>>>>> (vec_cmp<mode><v_int_equiv>):
>>>>> Modify
>>>>> instructions for
>>>>>          UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.
>>>>>
>>>>> *** gcc/testsuite/ChangeLog ***
>>>>>
>>>>> 2017-12-15  Sudakshina Das  <sudi.das@arm.com>
>>>>>
>>>>>          PR target/81647
>>>>>          * gcc.target/aarch64/pr81647.c: New.
>>>>
>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64-simd.md
>>>>> b/gcc/config/aarch64/aarch64-simd.md
>>>>> index
>>>>> f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b 
>>>>>
>>>>> 100644
>>>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>>>> @@ -2731,10 +2731,10 @@
>>>>>            break;
>>>>>          }
>>>>>          /* Fall through.  */
>>>>> -    case UNGE:
>>>>> +    case UNLT:
>>>>>          std::swap (operands[2], operands[3]);
>>>>>          /* Fall through.  */
>>>>> -    case UNLE:
>>>>> +    case UNGT:
>>>>>        case GT:
>>>>>          comparison = gen_aarch64_cmgt<mode>;
>>>>>          break;
>>>>> @@ -2745,10 +2745,10 @@
>>>>>            break;
>>>>>          }
>>>>>          /* Fall through.  */
>>>>> -    case UNGT:
>>>>> +    case UNLE:
>>>>>          std::swap (operands[2], operands[3]);
>>>>>          /* Fall through.  */
>>>>> -    case UNLT:
>>>>> +    case UNGE:
>>>>>        case GE:
>>>>>          comparison = gen_aarch64_cmge<mode>;
>>>>>          break;
>>>>> @@ -2771,21 +2771,35 @@
>>>>>        case UNGT:
>>>>>        case UNLE:
>>>>>        case UNLT:
>>>>> -    case NE:
>>>>> -      /* FCM returns false for lanes which are unordered, so if we 
>>>>> use
>>>>> -        the inverse of the comparison we actually want to emit, then
>>>>> -        invert the result, we will end up with the correct result.
>>>>> -        Note that a NE NaN and NaN NE b are true for all a, b.
>>>>> -
>>>>> -        Our transformations are:
>>>>> -        a UNGE b -> !(b GT a)
>>>>> -        a UNGT b -> !(b GE a)
>>>>> -        a UNLE b -> !(a GT b)
>>>>> -        a UNLT b -> !(a GE b)
>>>>> -        a   NE b -> !(a EQ b)  */
>>>>> -      gcc_assert (comparison != NULL);
>>>>> -      emit_insn (comparison (operands[0], operands[2], operands[3]));
>>>>> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], 
>>>>> operands[0]));
>>>>> +      {
>>>>> +       /* All of the above must not raise any FP exceptions.  Thus we
>>>>> first
>>>>> +          check each operand for NaNs and force any elements 
>>>>> containing
>>>>> NaN to
>>>>> +          zero before using them in the compare.
>>>>> +          Example: UN<cc> (a, b) -> UNORDERED (a, b) |
>>>>> +                                    (cm<cc> (isnan (a) ? 0.0 : a,
>>>>> +                                             isnan (b) ? 0.0 : b))
>>>>> +          We use the following transformations for doing the
>>>>> comparisions:
>>>>> +          a UNGE b -> a GE b
>>>>> +          a UNGT b -> a GT b
>>>>> +          a UNLE b -> b GE a
>>>>> +          a UNLT b -> b GT a.  */
>>>>> +
>>>>> +       rtx tmp0 = gen_reg_rtx (<V_INT_EQUIV>mode);
>>>>> +       rtx tmp1 = gen_reg_rtx (<V_INT_EQUIV>mode);
>>>>> +       rtx tmp2 = gen_reg_rtx (<V_INT_EQUIV>mode);
>>>>> +       emit_insn (gen_aarch64_cmeq<mode> (tmp0, operands[2],
>>>>> operands[2]));
>>>>> +       emit_insn (gen_aarch64_cmeq<mode> (tmp1, operands[3],
>>>>> operands[3]));
>>>>> +       emit_insn (gen_and<v_int_equiv>3 (tmp2, tmp0, tmp1));
>>>>> +       emit_insn (gen_and<v_int_equiv>3 (tmp0, tmp0,
>>>>> +                       lowpart_subreg (<V_INT_EQUIV>mode, 
>>>>> operands[2],
>>>>> <MODE>mode)));
>>>>> +       emit_insn (gen_and<v_int_equiv>3 (tmp1, tmp1,
>>>>> +                       lowpart_subreg (<V_INT_EQUIV>mode, 
>>>>> operands[3],
>>>>> <MODE>mode)));
>>>>> +       gcc_assert (comparison != NULL);
>>>>> +       emit_insn (comparison (operands[0],
>>>>> +                              lowpart_subreg (<MODE>mode, tmp0,
>>>>> <V_INT_EQUIV>mode),
>>>>> +                              lowpart_subreg (<MODE>mode, tmp1,
>>>>> <V_INT_EQUIV>mode)));
>>>>> +       emit_insn (gen_orn<v_int_equiv>3 (operands[0], tmp2,
>>>>> operands[0]));
>>>>> +      }
>>>>>          break;
>>>>>          case LT:
>>>>> @@ -2793,25 +2807,19 @@
>>>>>        case GT:
>>>>>        case GE:
>>>>>        case EQ:
>>>>> +    case NE:
>>>>>          /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
>>>>>           As a LT b <=> b GE a && a LE b <=> b GT a.  Our 
>>>>> transformations
>>>>> are:
>>>>>           a GE b -> a GE b
>>>>>           a GT b -> a GT b
>>>>>           a LE b -> b GE a
>>>>>           a LT b -> b GT a
>>>>> -        a EQ b -> a EQ b  */
>>>>> +        a EQ b -> a EQ b
>>>>> +        a NE b -> ~(a EQ b)  */
>>>>>          gcc_assert (comparison != NULL);
>>>>>          emit_insn (comparison (operands[0], operands[2], 
>>>>> operands[3]));
>>>>> -      break;
>>>>> -
>>>>> -    case UNEQ:
>>>>> -      /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>>>>> -        this result will then give us (a == b || a UNORDERED b).  */
>>>>> -      emit_insn (gen_aarch64_cmgt<mode> (operands[0],
>>>>> -                                        operands[2], operands[3]));
>>>>> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3],
>>>>> operands[2]));
>>>>> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], 
>>>>> tmp));
>>>>> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], 
>>>>> operands[0]));
>>>>> +      if (code == NE)
>>>>> +       emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0],
>>>>> operands[0]));
>>>>>          break;
>>>>>          case LTGT:
>>>>> @@ -2823,21 +2831,22 @@
>>>>>          emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0],
>>>>> tmp));
>>>>>          break;
>>>>>    -    case UNORDERED:
>>>>> -      /* Operands are ORDERED iff (a > b || b >= a), so we can 
>>>>> compute
>>>>> -        UNORDERED as !ORDERED.  */
>>>>> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2],
>>>>> operands[3]));
>>>>> -      emit_insn (gen_aarch64_cmge<mode> (operands[0],
>>>>> -                                        operands[3], operands[2]));
>>>>> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], 
>>>>> tmp));
>>>>> -      emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0], 
>>>>> operands[0]));
>>>>> -      break;
>>>>> -
>>>>>        case ORDERED:
>>>>> -      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2],
>>>>> operands[3]));
>>>>> -      emit_insn (gen_aarch64_cmge<mode> (operands[0],
>>>>> -                                        operands[3], operands[2]));
>>>>> -      emit_insn (gen_ior<v_int_equiv>3 (operands[0], operands[0], 
>>>>> tmp));
>>>>> +    case UNORDERED:
>>>>> +    case UNEQ:
>>>>> +      /* cmeq (a, a) & cmeq (b, b).  */
>>>>> +      emit_insn (gen_aarch64_cmeq<mode> (operands[0],
>>>>> +                                        operands[2], operands[2]));
>>>>> +      emit_insn (gen_aarch64_cmeq<mode> (tmp, operands[3],
>>>>> operands[3]));
>>>>> +      emit_insn (gen_and<v_int_equiv>3 (operands[0], operands[0], 
>>>>> tmp));
>>>>> +
>>>>> +      if (code == UNORDERED)
>>>>> +       emit_insn (gen_one_cmpl<v_int_equiv>2 (operands[0],
>>>>> operands[0]));
>>>>> +      else if (code == UNEQ)
>>>>> +       {
>>>>> +         emit_insn (gen_aarch64_cmeq<mode> (tmp, operands[2],
>>>>> operands[3]));
>>>>> +         emit_insn (gen_orn<v_int_equiv>3 (operands[0], operands[0],
>>>>> tmp));
>>>>> +       }
>>>>>          break;
>>>>>          default:
>>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr81647.c
>>>>> b/gcc/testsuite/gcc.target/aarch64/pr81647.c
>>>>> new file mode 100644
>>>>> index
>>>>> 0000000000000000000000000000000000000000..f60dfba49d538e3b2164b11392ab8dbfdba6546e 
>>>>>
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/aarch64/pr81647.c
>>>>> @@ -0,0 +1,44 @@
>>>>> +/* { dg-do run } */
>>>>> +/* { dg-options "-O3 -fdump-tree-ssa" } */
>>>>> +
>>>>> +#include <fenv.h>
>>>>> +
>>>>> +double x[28], y[28];
>>>>> +int res[28];
>>>>> +
>>>>> +int
>>>>> +main (void)
>>>>> +{
>>>>> +  int i;
>>>>> +  for (i = 0; i < 28; ++i)
>>>>> +    {
>>>>> +      x[i] = __builtin_nan ("");
>>>>> +      y[i] = i;
>>>>> +    }
>>>>> +  __asm__ volatile ("" ::: "memory");
>>>>> +  feclearexcept (FE_ALL_EXCEPT);
>>>>> +  for (i = 0; i < 4; ++i)
>>>>> +    res[i] = __builtin_isgreater (x[i], y[i]);
>>>>> +  for (i = 4; i < 8; ++i)
>>>>> +    res[i] = __builtin_isgreaterequal (x[i], y[i]);
>>>>> +  for (i = 8; i < 12; ++i)
>>>>> +    res[i] = __builtin_isless (x[i], y[i]);
>>>>> +  for (i = 12; i < 16; ++i)
>>>>> +    res[i] = __builtin_islessequal (x[i], y[i]);
>>>>> +  for (i = 16; i < 20; ++i)
>>>>> +    res[i] = __builtin_islessgreater (x[i], y[i]);
>>>>> +  for (i = 20; i < 24; ++i)
>>>>> +    res[i] = __builtin_isunordered (x[i], y[i]);
>>>>> +  for (i = 24; i < 28; ++i)
>>>>> +    res[i] = !(__builtin_isunordered (x[i], y[i]));
>>>>> +  __asm__ volatile ("" ::: "memory");
>>>>> +  return fetestexcept (FE_ALL_EXCEPT) != 0;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-tree-dump " u> " "ssa" } } */
>>>>> +/* { dg-final { scan-tree-dump " u>= " "ssa" } } */
>>>>> +/* { dg-final { scan-tree-dump " u< " "ssa" } } */
>>>>> +/* { dg-final { scan-tree-dump " u<= " "ssa" } } */
>>>>> +/* { dg-final { scan-tree-dump " u== " "ssa" } } */
>>>>> +/* { dg-final { scan-tree-dump " unord " "ssa" } } */
>>>>> +/* { dg-final { scan-tree-dump " ord " "ssa" } } */
>>>>
>>>>
>>>
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-03-28 10:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-15 11:57 [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md Sudakshina Das
2018-01-05 11:47 ` [PATCH PR81647][AARCH64] PING " Sudakshina Das
2018-02-16 10:54 ` [PATCH PR81647][AARCH64][PING] " Sudakshina Das
2018-03-19 14:32 ` [PATCH PR81647][AARCH64] " James Greenhalgh
2018-03-19 19:31   ` Sudakshina Das
2018-03-20  8:19     ` Christophe Lyon
2018-03-20 10:58       ` Sudakshina Das
2018-03-28 11:37         ` Sudakshina Das

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).