public inbox for gcc-patches@gcc.gnu.org
* [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.
@ 2016-05-17  9:02 Bin Cheng
  2016-05-24 10:17 ` Bin.Cheng
  2016-05-31 17:07 ` James Greenhalgh
  0 siblings, 2 replies; 5+ messages in thread
From: Bin Cheng @ 2016-05-17  9:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 2230 bytes --]

Hi,
Alan and Renlin noticed that some vcond patterns are not supported in the AArch64 (or AArch32?) backend, and they both had patches fixing this.  After investigation, I agree with them that vcond/vcondu in the AArch64 backend should be re-implemented using vec_cmp/vcond_mask patterns, so here comes this patch, which is based on Alan's.  It supports all vcond/vcondu patterns by implementing and using vec_cmp and vcond_mask patterns.  Unlike the original patch, it doesn't change GCC's expanding process, and it keeps the vcond patterns.  The patch also introduces vec_cmp*_internal patterns to support the special-case optimizations for vcond/vcondu that the current implementation performs.
Apart from Alan's patch, I also borrowed ideas from Renlin's; any bugs introduced are down to my changes.

With this patch, GCC's test condition "vect_cond_mixed" can be enabled on AArch64 (in a follow-up patch).
Bootstrapped and tested on AArch64.  Is it OK?  BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on AArch64), which was added earlier by the tree if-conversion patch.
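To illustrate (a made-up example, not taken from the patch or testsuite): a loop in which an unsigned integer comparison selects floating-point data is one of the shapes that had no vcond pattern before, and it should now map to the new vcondu<mode><v_cmp_mixed> pattern:

float fa[1024], fb[1024], fc[1024];
unsigned int x[1024], y[1024];

void
foo (void)
{
  for (int i = 0; i < 1024; i++)
    fc[i] = (x[i] < y[i]) ? fa[i] : fb[i];
}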

Thanks,
bin

2016-05-11  Alan Lawrence  <alan.lawrence@arm.com>
	    Renlin Li  <renlin.li@arm.com>
	    Bin Cheng  <bin.cheng@arm.com>

	* config/aarch64/iterators.md (V_cmp_mixed, v_cmp_mixed): New.
	* config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Call
	gen_vcondv2div2di instead of gen_aarch64_vcond_internalv2div2di.
	(aarch64_vcond_internal<mode><mode>): Delete pattern.
	(aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>): Ditto.
	(vcond_mask_<mode><v_cmp_result>): New pattern.
	(vec_cmp<mode><mode>_internal, vec_cmp<mode><mode>): New pattern.
	(vec_cmp<mode><v_cmp_result>_internal): New pattern.
	(vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode>): New pattern.
	(vcond<mode><mode>): Re-implement using vec_cmp and vcond_mask.
	(vcondu<mode><mode>): Ditto.
	(vcond<v_cmp_result><mode>): Delete.
	(vcond<v_cmp_mixed><mode>): New pattern.
	(vcondu<mode><v_cmp_mixed>): New pattern.
	(aarch64_cmtst<mode>): Revise comment using aarch64_vcond instead
	of aarch64_vcond_internal.

gcc/testsuite/ChangeLog
2016-05-11  Bin Cheng  <bin.cheng@arm.com>

	* gcc.target/aarch64/vect-vcond.c: New test.

[-- Attachment #2: aarch64-vcond-20160509.txt --]
[-- Type: text/plain, Size: 23406 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bd73bce..f51473a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1053,7 +1053,7 @@
     }
 
   cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], operands[2]);
-  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
+  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
               operands[2], cmp_fmt, operands[1], operands[2]));
   DONE;
 })
@@ -2225,204 +2225,215 @@
   DONE;
 })
 
-(define_expand "aarch64_vcond_internal<mode><mode>"
+(define_expand "vcond_mask_<mode><v_cmp_result>"
+  [(match_operand:VALLDI 0 "register_operand")
+   (match_operand:VALLDI 1 "nonmemory_operand")
+   (match_operand:VALLDI 2 "nonmemory_operand")
+   (match_operand:<V_cmp_result> 3 "register_operand")]
+  "TARGET_SIMD"
+{
+  /* If we have (a = (P) ? -1 : 0);
+     Then we can simply move the generated mask (result must be int).  */
+  if (operands[1] == CONSTM1_RTX (<MODE>mode)
+      && operands[2] == CONST0_RTX (<MODE>mode))
+    emit_move_insn (operands[0], operands[3]);
+  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  */
+  else if (operands[1] == CONST0_RTX (<MODE>mode)
+	   && operands[2] == CONSTM1_RTX (<MODE>mode))
+    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[3]));
+  else
+    {
+      if (!REG_P (operands[1]))
+	operands[1] = force_reg (<MODE>mode, operands[1]);
+      if (!REG_P (operands[2]))
+	operands[2] = force_reg (<MODE>mode, operands[2]);
+      emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], operands[3],
+					     operands[1], operands[2]));
+    }
+
+  DONE;
+})
+
+;; Patterns comparing two vectors to produce a mask.
+
+(define_expand "vec_cmp<mode><mode>_internal"
   [(set (match_operand:VSDQ_I_DI 0 "register_operand")
-	(if_then_else:VSDQ_I_DI
-	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VSDQ_I_DI 4 "register_operand")
-	     (match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
-	  (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
-	  (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
+	  (match_operator 1 "comparison_operator"
+	    [(match_operand:VSDQ_I_DI 2 "register_operand")
+	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
   "TARGET_SIMD"
 {
-  rtx op1 = operands[1];
-  rtx op2 = operands[2];
-  rtx mask = gen_reg_rtx (<MODE>mode);
-  enum rtx_code code = GET_CODE (operands[3]);
+  rtx mask = operands[0];
+  enum rtx_code code = GET_CODE (operands[1]);
 
-  /* Switching OP1 and OP2 is necessary for NE (to output a cmeq insn),
-     and desirable for other comparisons if it results in FOO ? -1 : 0
-     (this allows direct use of the comparison result without a bsl).  */
-  if (code == NE
-      || (code != EQ
-	  && op1 == CONST0_RTX (<V_cmp_result>mode)
-	  && op2 == CONSTM1_RTX (<V_cmp_result>mode)))
+  if (operands[3] == CONST0_RTX (<MODE>mode)
+      && (code == NE || code == LE || code == LT
+	  || code == GE || code == GT || code == EQ))
     {
-      op1 = operands[2];
-      op2 = operands[1];
-      switch (code)
-        {
-        case LE: code = GT; break;
-        case LT: code = GE; break;
-        case GE: code = LT; break;
-        case GT: code = LE; break;
-        /* No case EQ.  */
-        case NE: code = EQ; break;
-        case LTU: code = GEU; break;
-        case LEU: code = GTU; break;
-        case GTU: code = LEU; break;
-        case GEU: code = LTU; break;
-        default: gcc_unreachable ();
-        }
+      /* Some instructions have a form taking an immediate zero.  */
+      ;
     }
-
-  /* Make sure we can handle the last operand.  */
-  switch (code)
+  else if (!REG_P (operands[3]))
     {
-    case NE:
-      /* Normalized to EQ above.  */
-      gcc_unreachable ();
-
-    case LE:
-    case LT:
-    case GE:
-    case GT:
-    case EQ:
-      /* These instructions have a form taking an immediate zero.  */
-      if (operands[5] == CONST0_RTX (<MODE>mode))
-        break;
-      /* Fall through, as may need to load into register.  */
-    default:
-      if (!REG_P (operands[5]))
-        operands[5] = force_reg (<MODE>mode, operands[5]);
-      break;
+      /* Make sure we can handle the last operand.  */
+      operands[3] = force_reg (<MODE>mode, operands[3]);
     }
 
   switch (code)
     {
     case LT:
-      emit_insn (gen_aarch64_cmlt<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmlt<mode> (mask, operands[2], operands[3]));
       break;
 
     case GE:
-      emit_insn (gen_aarch64_cmge<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmge<mode> (mask, operands[2], operands[3]));
       break;
 
     case LE:
-      emit_insn (gen_aarch64_cmle<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmle<mode> (mask, operands[2], operands[3]));
       break;
 
     case GT:
-      emit_insn (gen_aarch64_cmgt<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmgt<mode> (mask, operands[2], operands[3]));
       break;
 
     case LTU:
-      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[5], operands[4]));
+      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[3], operands[2]));
       break;
 
     case GEU:
-      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[2], operands[3]));
       break;
 
     case LEU:
-      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[5], operands[4]));
+      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[3], operands[2]));
       break;
 
     case GTU:
-      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[2], operands[3]));
       break;
 
-    /* NE has been normalized to EQ above.  */
+    case NE:
     case EQ:
-      emit_insn (gen_aarch64_cmeq<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmeq<mode> (mask, operands[2], operands[3]));
       break;
 
     default:
       gcc_unreachable ();
     }
 
-    /* If we have (a = (b CMP c) ? -1 : 0);
-       Then we can simply move the generated mask.  */
-
-    if (op1 == CONSTM1_RTX (<V_cmp_result>mode)
-	&& op2 == CONST0_RTX (<V_cmp_result>mode))
-      emit_move_insn (operands[0], mask);
-    else
-      {
-	if (!REG_P (op1))
-	  op1 = force_reg (<MODE>mode, op1);
-	if (!REG_P (op2))
-	  op2 = force_reg (<MODE>mode, op2);
-	emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], mask,
-					       op1, op2));
-      }
+  DONE;
+})
 
+(define_expand "vec_cmp<mode><mode>"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
+	  (match_operator 1 "comparison_operator"
+	    [(match_operand:VSDQ_I_DI 2 "register_operand")
+	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  enum rtx_code code = GET_CODE (operands[1]);
+
+  emit_insn (gen_vec_cmp<mode><mode>_internal (operands[0],
+					 operands[1], operands[2],
+					 operands[3]));
+  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
+     operators are computed using other operators, and we need to
+     revert the result.  */
+  if (code == NE)
+    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
   DONE;
 })
 
-(define_expand "aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>"
-  [(set (match_operand:VDQF_COND 0 "register_operand")
-	(if_then_else:VDQF
-	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VDQF 4 "register_operand")
-	     (match_operand:VDQF 5 "nonmemory_operand")])
-	  (match_operand:VDQF_COND 1 "nonmemory_operand")
-	  (match_operand:VDQF_COND 2 "nonmemory_operand")))]
+(define_expand "vec_cmp<mode><v_cmp_result>_internal"
+  [(set (match_operand:<V_cmp_result> 0 "register_operand")
+	(match_operator 1 "comparison_operator"
+	    [(match_operand:VDQF 2 "register_operand")
+	     (match_operand:VDQF 3 "nonmemory_operand")]))]
   "TARGET_SIMD"
 {
-  int inverse = 0;
   int use_zero_form = 0;
-  int swap_bsl_operands = 0;
-  rtx op1 = operands[1];
-  rtx op2 = operands[2];
-  rtx mask = gen_reg_rtx (<VDQF_COND:V_cmp_result>mode);
-  rtx tmp = gen_reg_rtx (<VDQF_COND:V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx tmp = gen_reg_rtx (<V_cmp_result>mode);
 
-  rtx (*base_comparison) (rtx, rtx, rtx);
-  rtx (*complimentary_comparison) (rtx, rtx, rtx);
+  rtx (*comparison) (rtx, rtx, rtx);
 
-  switch (GET_CODE (operands[3]))
+  if (operands[3] == CONST0_RTX (<MODE>mode)
+      && (code == LE || code == LT || code == GE || code == GT || code == EQ))
     {
-    case GE:
-    case GT:
-    case LE:
-    case LT:
-    case EQ:
-      if (operands[5] == CONST0_RTX (<MODE>mode))
-	{
-	  use_zero_form = 1;
-	  break;
-	}
-      /* Fall through.  */
-    default:
-      if (!REG_P (operands[5]))
-	operands[5] = force_reg (<VDQF:MODE>mode, operands[5]);
+      /* Some instructions have a form taking an immediate zero.  */
+      use_zero_form = 1;
+    }
+  else if (!REG_P (operands[3]))
+    {
+      /* Make sure we can handle the last operand.  */
+      operands[3] = force_reg (<MODE>mode, operands[3]);
     }
 
-  switch (GET_CODE (operands[3]))
+  switch (code)
     {
     case LT:
-    case UNLT:
-      inverse = 1;
-      /* Fall through.  */
-    case GE:
+      if (use_zero_form)
+	{
+	  comparison = gen_aarch64_cmlt<mode>;
+	  break;
+	}
+      /* Else, fall through.  */
     case UNGE:
-    case ORDERED:
-    case UNORDERED:
-      base_comparison = gen_aarch64_cmge<VDQF:mode>;
-      complimentary_comparison = gen_aarch64_cmgt<VDQF:mode>;
-      break;
-    case LE:
-    case UNLE:
-      inverse = 1;
+      std::swap (operands[2], operands[3]);
       /* Fall through.  */
+    case UNLE:
     case GT:
+      comparison = gen_aarch64_cmgt<mode>;
+      break;
+    case LE:
+      if (use_zero_form)
+	{
+	  comparison = gen_aarch64_cmle<mode>;
+	  break;
+	}
+      /* Else, fall through.  */
     case UNGT:
-      base_comparison = gen_aarch64_cmgt<VDQF:mode>;
-      complimentary_comparison = gen_aarch64_cmge<VDQF:mode>;
+      std::swap (operands[2], operands[3]);
+      /* Fall through.  */
+    case UNLT:
+    case GE:
+      comparison = gen_aarch64_cmge<mode>;
       break;
-    case EQ:
     case NE:
+    case EQ:
+      comparison = gen_aarch64_cmeq<mode>;
+      break;
     case UNEQ:
-      base_comparison = gen_aarch64_cmeq<VDQF:mode>;
-      complimentary_comparison = gen_aarch64_cmeq<VDQF:mode>;
+    case ORDERED:
+    case UNORDERED:
       break;
     default:
       gcc_unreachable ();
     }
 
-  switch (GET_CODE (operands[3]))
+  switch (code)
     {
+    case UNGT:
+    case UNGE:
+    case NE:
+    case UNLT:
+    case UNLE:
+      /* FCM returns false for lanes which are unordered, so if we use
+	 the inverse of the comparison we actually want to emit, then
+	 revert the result, we will end up with the correct result.
+	 Note that a NE NaN and NaN NE b are true for all a, b.
+
+	 Our transformations are:
+	 a GE b -> !(b GT a)
+	 a GT b -> !(b GE a)
+	 a LE b -> !(a GT b)
+	 a LT b -> !(a GE b)
+	 a NE b -> !(a EQ b)
+
+	 Note we revert the result in vec_cmp.
+
+	 Fall through.  */
     case LT:
     case LE:
     case GT:
@@ -2437,99 +2448,69 @@
 	 a EQ b -> a EQ b
 	 Note that there also exist direct comparison against 0 forms,
 	 so catch those as a special case.  */
-      if (use_zero_form)
-	{
-	  inverse = 0;
-	  switch (GET_CODE (operands[3]))
-	    {
-	    case LT:
-	      base_comparison = gen_aarch64_cmlt<VDQF:mode>;
-	      break;
-	    case LE:
-	      base_comparison = gen_aarch64_cmle<VDQF:mode>;
-	      break;
-	    default:
-	      /* Do nothing, other zero form cases already have the correct
-		 base_comparison.  */
-	      break;
-	    }
-	}
 
-      if (!inverse)
-	emit_insn (base_comparison (mask, operands[4], operands[5]));
-      else
-	emit_insn (complimentary_comparison (mask, operands[5], operands[4]));
+      emit_insn (comparison (operands[0], operands[2], operands[3]));
       break;
-    case UNLT:
-    case UNLE:
-    case UNGT:
-    case UNGE:
-    case NE:
-      /* FCM returns false for lanes which are unordered, so if we use
-	 the inverse of the comparison we actually want to emit, then
-	 swap the operands to BSL, we will end up with the correct result.
-	 Note that a NE NaN and NaN NE b are true for all a, b.
-
-	 Our transformations are:
-	 a GE b -> !(b GT a)
-	 a GT b -> !(b GE a)
-	 a LE b -> !(a GT b)
-	 a LT b -> !(a GE b)
-	 a NE b -> !(a EQ b)  */
 
-      if (inverse)
-	emit_insn (base_comparison (mask, operands[4], operands[5]));
-      else
-	emit_insn (complimentary_comparison (mask, operands[5], operands[4]));
-
-      swap_bsl_operands = 1;
-      break;
     case UNEQ:
-      /* We check (a > b ||  b > a).  combining these comparisons give us
-	 true iff !(a != b && a ORDERED b), swapping the operands to BSL
-	 will then give us (a == b ||  a UNORDERED b) as intended.  */
-
-      emit_insn (gen_aarch64_cmgt<VDQF:mode> (mask, operands[4], operands[5]));
-      emit_insn (gen_aarch64_cmgt<VDQF:mode> (tmp, operands[5], operands[4]));
-      emit_insn (gen_ior<VDQF_COND:v_cmp_result>3 (mask, mask, tmp));
-      swap_bsl_operands = 1;
+      /* We first check (a > b ||  b > a) which is !UNEQ, reverting
+	 this result will then give us (a == b || a UNORDERED b).
+	 Note we revert the result in vec_cmp.  */
+
+      emit_insn (gen_aarch64_cmgt<mode> (operands[0],
+					 operands[2], operands[3]));
+      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
+      emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
       break;
     case UNORDERED:
-       /* Operands are ORDERED iff (a > b || b >= a).
-	 Swapping the operands to BSL will give the UNORDERED case.  */
-     swap_bsl_operands = 1;
-     /* Fall through.  */
+      /* Operands are ORDERED iff (a > b || b >= a), so we can compute
+	 UNORDERED as !ORDERED.  Note we revert the result in vec_cmp.
+
+	 Fall through.  */
     case ORDERED:
-      emit_insn (gen_aarch64_cmgt<VDQF:mode> (tmp, operands[4], operands[5]));
-      emit_insn (gen_aarch64_cmge<VDQF:mode> (mask, operands[5], operands[4]));
-      emit_insn (gen_ior<VDQF_COND:v_cmp_result>3 (mask, mask, tmp));
+      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2], operands[3]));
+      emit_insn (gen_aarch64_cmge<mode> (operands[0],
+					 operands[3], operands[2]));
+      emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
       break;
     default:
       gcc_unreachable ();
     }
 
-  if (swap_bsl_operands)
-    {
-      op1 = operands[2];
-      op2 = operands[1];
-    }
+  DONE;
+})
 
-    /* If we have (a = (b CMP c) ? -1 : 0);
-       Then we can simply move the generated mask.  */
+(define_expand "vec_cmp<mode><v_cmp_result>"
+  [(set (match_operand:<V_cmp_result> 0 "register_operand")
+	(match_operator 1 "comparison_operator"
+	    [(match_operand:VDQF 2 "register_operand")
+	     (match_operand:VDQF 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  enum rtx_code code = GET_CODE (operands[1]);
+
+  emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (operands[0],
+					 operands[1], operands[2],
+					 operands[3]));
+  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
+     operators are computed using other operators, and we need to
+     revert the result.  */
+  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
+      || code == UNGT || code == UNGE || code == UNORDERED)
+    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
 
-    if (op1 == CONSTM1_RTX (<VDQF_COND:V_cmp_result>mode)
-	&& op2 == CONST0_RTX (<VDQF_COND:V_cmp_result>mode))
-      emit_move_insn (operands[0], mask);
-    else
-      {
-	if (!REG_P (op1))
-	  op1 = force_reg (<VDQF_COND:MODE>mode, op1);
-	if (!REG_P (op2))
-	  op2 = force_reg (<VDQF_COND:MODE>mode, op2);
-	emit_insn (gen_aarch64_simd_bsl<VDQF_COND:mode> (operands[0], mask,
-					       op1, op2));
-      }
+  DONE;
+})
 
+(define_expand "vec_cmpu<mode><mode>"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
+	  (match_operator 1 "comparison_operator"
+	    [(match_operand:VSDQ_I_DI 2 "register_operand")
+	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_vec_cmp<mode><mode> (operands[0], operands[1],
+				      operands[2], operands[3]));
   DONE;
 })
 
@@ -2543,26 +2524,50 @@
 	  (match_operand:VALLDI 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  emit_insn (gen_aarch64_vcond_internal<mode><mode> (operands[0], operands[1],
-					       operands[2], operands[3],
+  rtx mask = gen_reg_rtx (<V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (mask, operands[3],
 					       operands[4], operands[5]));
+  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
+     operators are computed using other operators, and we need to
+     revert the result.  In this case we can simply swap the operands
+     to bsl.  */
+  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
+      || code == UNGT || code == UNGE || code == UNORDERED)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1],
+						 operands[2], mask));
   DONE;
 })
 
-(define_expand "vcond<v_cmp_result><mode>"
-  [(set (match_operand:<V_cmp_result> 0 "register_operand")
-	(if_then_else:<V_cmp_result>
+(define_expand "vcond<v_cmp_mixed><mode>"
+  [(set (match_operand:<V_cmp_mixed> 0 "register_operand")
+	(if_then_else:<V_cmp_mixed>
 	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VDQF 4 "register_operand")
-	     (match_operand:VDQF 5 "nonmemory_operand")])
-	  (match_operand:<V_cmp_result> 1 "nonmemory_operand")
-	  (match_operand:<V_cmp_result> 2 "nonmemory_operand")))]
+	    [(match_operand:VDQF_COND 4 "register_operand")
+	     (match_operand:VDQF_COND 5 "nonmemory_operand")])
+	  (match_operand:<V_cmp_mixed> 1 "nonmemory_operand")
+	  (match_operand:<V_cmp_mixed> 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  emit_insn (gen_aarch64_vcond_internal<v_cmp_result><mode> (
+  rtx mask = gen_reg_rtx (<V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (mask, operands[3],
+					       operands[4], operands[5]));
+  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
+     operators are computed using other operators, and we need to
+     revert the result.  In this case we can simply swap the operands
+     to bsl.  */
+  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
+      || code == UNGT || code == UNGE || code == UNORDERED)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<v_cmp_mixed><v_cmp_result> (
 						operands[0], operands[1],
-						operands[2], operands[3],
-						operands[4], operands[5]));
+						operands[2], mask));
   DONE;
 })
 
@@ -2576,9 +2581,48 @@
 	  (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  emit_insn (gen_aarch64_vcond_internal<mode><mode> (operands[0], operands[1],
-					       operands[2], operands[3],
+  rtx mask = gen_reg_rtx (<MODE>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<mode><mode>_internal (mask, operands[3],
 					       operands[4], operands[5]));
+  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
+     operators are computed using other operators, and we need to
+     revert the result.  In this case we can simply swap the operands
+     to bsl.  */
+  if (code == NE)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1],
+						 operands[2], mask));
+  DONE;
+})
+
+(define_expand "vcondu<mode><v_cmp_mixed>"
+  [(set (match_operand:VDQF 0 "register_operand")
+	(if_then_else:VDQF
+	  (match_operator 3 "comparison_operator"
+	    [(match_operand:<V_cmp_mixed> 4 "register_operand")
+	     (match_operand:<V_cmp_mixed> 5 "nonmemory_operand")])
+	  (match_operand:VDQF 1 "nonmemory_operand")
+	  (match_operand:VDQF 2 "nonmemory_operand")))]
+  "TARGET_SIMD"
+{
+  rtx mask = gen_reg_rtx (<V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<v_cmp_mixed><v_cmp_mixed>_internal (
+						  mask, operands[3],
+						  operands[4], operands[5]));
+  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
+     operators are computed using other operators, and we need to
+     revert the result.  In this case we can simply swap the operands
+     to bsl.  */
+  if (code == NE)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1],
+						 operands[2], mask));
   DONE;
 })
 
@@ -4179,7 +4223,7 @@
 ;; cmtst
 
 ;; Although neg (ne (and x y) 0) is the natural way of expressing a cmtst,
-;; we don't have any insns using ne, and aarch64_vcond_internal outputs
+;; we don't have any insns using ne, and aarch64_vcond outputs
 ;; not (neg (eq (and x y) 0))
 ;; which is rewritten by simplify_rtx as
 ;; plus (eq (and x y) 0) -1.
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d9bd391..2f15678 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -610,6 +610,16 @@
 				(V2DF "v2di") (DF    "di")
 				(SF   "si")])
 
+;; Mode for vector conditional operations where the comparison has
+;; different type from the lhs.
+(define_mode_attr V_cmp_mixed [(V2SI "V2SF") (V4SI "V4SF")
+			       (V2DI "V2DF") (V2SF "V2SI")
+			       (V4SF "V4SI") (V2DF "V2DI")])
+
+(define_mode_attr v_cmp_mixed [(V2SI "v2sf") (V4SI "v4sf")
+			       (V2DI "v2df") (V2SF "v2si")
+			       (V4SF "v4si") (V2DF "v2di")])
+
 ;; Lower case element modes (as used in shift immediate patterns).
 (define_mode_attr ve_mode [(V8QI "qi") (V16QI "qi")
 			   (V4HI "hi") (V8HI  "hi")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vcond.c b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
new file mode 100644
index 0000000..b0ca6d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+
+float fa[1024] = {0.0}, fb[1024] = {0.0}, fc[1024];
+double da[1024] = {0.0}, db[1024] = {0.0}, dc[1024];
+
+int vcond_f_i (int *x)
+{
+  int i = 0;
+  for (i = 0; i < 1024; i++)
+    {
+      fa[i] -= 1.0;
+      fb[i] += 1.0;
+      fc[i] = (x[i] < i) ? fa[i] : fb[i];
+    }
+
+  return 0;
+}
+
+int vcond_f_u (unsigned int *x)
+{
+  unsigned int i = 0;
+  for (i = 0; i < 1024; i++)
+    {
+      fa[i] -= 1.0;
+      fb[i] += 1.0;
+      fc[i] = (x[i] < i) ? fa[i] : fb[i];
+    }
+
+  return 0;
+}
+
+int vcond_d_i (long long *x)
+{
+  long long i = 0;
+  for (i = 0; i < 1024; i++)
+    {
+      da[i] -= 1.0;
+      db[i] += 1.0;
+      dc[i] = (x[i] < i) ? da[i] : db[i];
+    }
+
+  return 0;
+}
+
+int vcond_d_u (unsigned long long *x)
+{
+  unsigned long long i = 0;
+  for (i = 0; i < 1024; i++)
+    {
+      da[i] -= 1.0;
+      db[i] += 1.0;
+      dc[i] = (x[i] < i) ? da[i] : db[i];
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */


* Re: [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.
  2016-05-17  9:02 [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns Bin Cheng
@ 2016-05-24 10:17 ` Bin.Cheng
  2016-05-31 17:07 ` James Greenhalgh
  1 sibling, 0 replies; 5+ messages in thread
From: Bin.Cheng @ 2016-05-24 10:17 UTC (permalink / raw)
  Cc: gcc-patches

Ping.

Thanks,
bin

On Tue, May 17, 2016 at 10:02 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> Alan and Renlin noticed that some vcond patterns are not supported in AArch64(or AArch32?) backend, and they both had some patches fixing this.  After investigation, I agree with them that vcond/vcondu in AArch64's backend should be re-implemented using vec_cmp/vcond_mask patterns, so here comes this patch which is based on Alan's.  This patch supports all vcond/vcondu patterns by implementing/using vec_cmp and vcond_mask patterns.  Different to the original patch, it doesn't change GCC's expanding process, and it keeps vcond patterns.  The patch also introduces vec_cmp*_internal to support special case optimization for vcond/vcondu which current implementation does.
> Apart from Alan's patch, I also learned ideas from Renlin's, and it is my change that shall be blamed if any potential bug is introduced.
>
> With this patch, GCC's test condition "vect_cond_mixed" can be enabled on AArch64 (in a following patch).
> Bootstrap and test on AArch64.  Is it OK?  BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on AArch64) which was added before in tree if-conversion patch.
>
> Thanks,
> bin
>
> 2016-05-11  Alan Lawrence  <alan.lawrence@arm.com>
>             Renlin Li  <renlin.li@arm.com>
>             Bin Cheng  <bin.cheng@arm.com>
>
>         * config/aarch64/iterators.md (V_cmp_mixed, v_cmp_mixed): New.
>         * config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Call
>         gen_vcondv2div2di instead of gen_aarch64_vcond_internalv2div2di.
>         (aarch64_vcond_internal<mode><mode>): Delete pattern.
>         (aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>): Ditto.
>         (vcond_mask_<mode><v_cmp_result>): New pattern.
>         (vec_cmp<mode><mode>_internal, vec_cmp<mode><mode>): New pattern.
>         (vec_cmp<mode><v_cmp_result>_internal): New pattern.
>         (vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode>): New pattern.
>         (vcond<mode><mode>): Re-implement using vec_cmp and vcond_mask.
>         (vcondu<mode><mode>): Ditto.
>         (vcond<v_cmp_result><mode>): Delete.
>         (vcond<v_cmp_mixed><mode>): New pattern.
>         (vcondu<mode><v_cmp_mixed>): New pattern.
>         (aarch64_cmtst<mode>): Revise comment using aarch64_vcond instead
>         of aarch64_vcond_internal.
>
> gcc/testsuite/ChangeLog
> 2016-05-11  Bin Cheng  <bin.cheng@arm.com>
>
>         * gcc.target/aarch64/vect-vcond.c: New test.


* Re: [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.
  2016-05-17  9:02 [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns Bin Cheng
  2016-05-24 10:17 ` Bin.Cheng
@ 2016-05-31 17:07 ` James Greenhalgh
  2016-06-08 14:58   ` Bin Cheng
  1 sibling, 1 reply; 5+ messages in thread
From: James Greenhalgh @ 2016-05-31 17:07 UTC (permalink / raw)
  To: Bin Cheng; +Cc: gcc-patches, nd

On Tue, May 17, 2016 at 09:02:22AM +0000, Bin Cheng wrote:
> Hi,
> Alan and Renlin noticed that some vcond patterns are not supported in
> AArch64(or AArch32?) backend, and they both had some patches fixing this.
> After investigation, I agree with them that vcond/vcondu in AArch64's backend
> should be re-implemented using vec_cmp/vcond_mask patterns, so here comes
> this patch which is based on Alan's.  This patch supports all vcond/vcondu
> patterns by implementing/using vec_cmp and vcond_mask patterns.  Different to
> the original patch, it doesn't change GCC's expanding process, and it keeps
> vcond patterns.  The patch also introduces vec_cmp*_internal to support
> special case optimization for vcond/vcondu which current implementation does.
> Apart from Alan's patch, I also learned ideas from Renlin's, and it is my
> change that shall be blamed if any potential bug is introduced.
> 
> With this patch, GCC's test condition "vect_cond_mixed" can be enabled on
> AArch64 (in a following patch).  Bootstrap and test on AArch64.  Is it OK?
> BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on AArch64) which was
> added before in tree if-conversion patch.

Splitting this patch would have been very helpful. One patch each for the
new standard pattern names, and one patch for the refactor of vcond. As
it is, this patch is rather difficult to read.

> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index bd73bce..f51473a 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1053,7 +1053,7 @@
>      }
>  
>    cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], operands[2]);
> -  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
> +  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
>                operands[2], cmp_fmt, operands[1], operands[2]));
>    DONE;
>  })
> @@ -2225,204 +2225,215 @@
>    DONE;
>  })
>  
> -(define_expand "aarch64_vcond_internal<mode><mode>"
> +(define_expand "vcond_mask_<mode><v_cmp_result>"
> +  [(match_operand:VALLDI 0 "register_operand")
> +   (match_operand:VALLDI 1 "nonmemory_operand")
> +   (match_operand:VALLDI 2 "nonmemory_operand")
> +   (match_operand:<V_cmp_result> 3 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  /* If we have (a = (P) ? -1 : 0);
> +     Then we can simply move the generated mask (result must be int).  */
> +  if (operands[1] == CONSTM1_RTX (<MODE>mode)
> +      && operands[2] == CONST0_RTX (<MODE>mode))
> +    emit_move_insn (operands[0], operands[3]);
> +  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  */
> +  else if (operands[1] == CONST0_RTX (<MODE>mode)
> +	   && operands[2] == CONSTM1_RTX (<MODE>mode))
> +    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[3]));
> +  else
> +    {
> +      if (!REG_P (operands[1]))
> +	operands[1] = force_reg (<MODE>mode, operands[1]);
> +      if (!REG_P (operands[2]))
> +	operands[2] = force_reg (<MODE>mode, operands[2]);
> +      emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], operands[3],
> +					     operands[1], operands[2]));
> +    }
> +
> +  DONE;
> +})
> +

This pattern is fine.

> +;; Patterns comparing two vectors to produce a mask.

This comment is insufficient. The logic in vec_cmp<mode><mode>_internal
does not always return the expected mask (in particular for NE), but this
is not made clear in the comment.

> +
> +(define_expand "vec_cmp<mode><mode>_internal"
>    [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> -	(if_then_else:VSDQ_I_DI
> -	  (match_operator 3 "comparison_operator"
> -	    [(match_operand:VSDQ_I_DI 4 "register_operand")
> -	     (match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
> -	  (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
> -	  (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
> +	  (match_operator 1 "comparison_operator"
> +	    [(match_operand:VSDQ_I_DI 2 "register_operand")
> +	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
>    "TARGET_SIMD"

<snip>

> +(define_expand "vec_cmp<mode><mode>"
> +  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> +	  (match_operator 1 "comparison_operator"
> +	    [(match_operand:VSDQ_I_DI 2 "register_operand")
> +	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
> +  "TARGET_SIMD"
> +{
> +  enum rtx_code code = GET_CODE (operands[1]);
> +
> +  emit_insn (gen_vec_cmp<mode><mode>_internal (operands[0],
> +					 operands[1], operands[2],
> +					 operands[3]));
> +  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
> +     operators are computed using other operators, and we need to
> +     revert the result.  */

s/revert/invert

There are no comments in vec_cmp<mode><v_cmp_result>_internal for this
to refer to. But, there should be - more comments would definitely help
this patch!

These comments apply equally to the other copies of this comment in the
patch.

> +  if (code == NE)
> +    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>    DONE;
>  })
>  
> -(define_expand "aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>"
> -  [(set (match_operand:VDQF_COND 0 "register_operand")
> -	(if_then_else:VDQF
> -	  (match_operator 3 "comparison_operator"
> -	    [(match_operand:VDQF 4 "register_operand")
> -	     (match_operand:VDQF 5 "nonmemory_operand")])
> -	  (match_operand:VDQF_COND 1 "nonmemory_operand")
> -	  (match_operand:VDQF_COND 2 "nonmemory_operand")))]
> +(define_expand "vec_cmp<mode><v_cmp_result>_internal"
> +  [(set (match_operand:<V_cmp_result> 0 "register_operand")
> +	(match_operator 1 "comparison_operator"
> +	    [(match_operand:VDQF 2 "register_operand")
> +	     (match_operand:VDQF 3 "nonmemory_operand")]))]

This needs much more thorough commenting. The logic is now extremely
difficult to verify which pattern is responsible for inverting the
mask. This makes it hard to spot whether the inversions applied to the
unordered comparisons are correct. I can't really review the patch with
the logic split across uncommented patterns in this way - is there
anything you can do to clean it up?

> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vcond.c b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
> new file mode 100644
> index 0000000..b0ca6d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c

None of this is AArch64 specific?

> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +
> +float fa[1024] = {0.0}, fb[1024] = {0.0}, fc[1024];
> +double da[1024] = {0.0}, db[1024] = {0.0}, dc[1024];
> +
> +int vcond_f_i (int *x)
> +{
> +  int i = 0;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      fa[i] -= 1.0;
> +      fb[i] += 1.0;
> +      fc[i] = (x[i] < i) ? fa[i] : fb[i];
> +    }
> +
> +  return 0;
> +}
> +
> +int vcond_f_u (unsigned int *x)
> +{
> +  unsigned int i = 0;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      fa[i] -= 1.0;
> +      fb[i] += 1.0;
> +      fc[i] = (x[i] < i) ? fa[i] : fb[i];
> +    }
> +
> +  return 0;
> +}
> +
> +int vcond_d_i (long long *x)
> +{
> +  long long i = 0;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      da[i] -= 1.0;
> +      db[i] += 1.0;
> +      dc[i] = (x[i] < i) ? da[i] : db[i];
> +    }
> +
> +  return 0;
> +}
> +
> +int vcond_d_u (unsigned long long *x)
> +{
> +  unsigned long long i = 0;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      da[i] -= 1.0;
> +      db[i] += 1.0;
> +      dc[i] = (x[i] < i) ? da[i] : db[i];
> +    }
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */

Thanks,
James


* Re: [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.
  2016-05-31 17:07 ` James Greenhalgh
@ 2016-06-08 14:58   ` Bin Cheng
  2016-06-14 14:51     ` James Greenhalgh
  0 siblings, 1 reply; 5+ messages in thread
From: Bin Cheng @ 2016-06-08 14:58 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches, nd

[-- Attachment #1: Type: text/plain, Size: 10424 bytes --]

> From: James Greenhalgh <james.greenhalgh@arm.com>
> Sent: 31 May 2016 16:24
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org; nd
> Subject: Re: [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.
>     
> On Tue, May 17, 2016 at 09:02:22AM +0000, Bin Cheng wrote:
> > Hi,
> > Alan and Renlin noticed that some vcond patterns are not supported in
> > AArch64(or AArch32?) backend, and they both had some patches fixing this.
> > After investigation, I agree with them that vcond/vcondu in AArch64's backend
> > should be re-implemented using vec_cmp/vcond_mask patterns, so here comes
> > this patch which is based on Alan's.  This patch supports all vcond/vcondu
> > patterns by implementing/using vec_cmp and vcond_mask patterns.  Different to
> > the original patch, it doesn't change GCC's expanding process, and it keeps
> > vcond patterns.  The patch also introduces vec_cmp*_internal to support
> > special case optimization for vcond/vcondu which current implementation does.
> > Apart from Alan's patch, I also learned ideas from Renlin's, and it is my
> > change that shall be blamed if any potential bug is introduced.
> > 
> > With this patch, GCC's test condition "vect_cond_mixed" can be enabled on
> > AArch64 (in a following patch).  Bootstrap and test on AArch64.  Is it OK?
> > BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on AArch64) which was
> > added before in tree if-conversion patch.
> 
> Splitting this patch would have been very helpful. One patch each for the
> new standard pattern names, and one patch for the refactor of vcond. As
> it is, this patch is rather difficult to read.
Done.  The patch is split into two: one implements the new vcond_mask and vec_cmp patterns, and the other rewrites the vcond patterns.
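Conceptually (a lane-level sketch in plain C, not the actual RTL), the rewritten vcond expanders now just compose the two new patterns: vec_cmp produces an all-ones/all-zeros mask, and vcond_mask does a bitwise select, as BSL does:

#include <stdint.h>

/* vec_cmp: lane-wise compare producing an all-ones/all-zeros mask.  */
static int32_t
vec_cmp_lt (int32_t a, int32_t b)
{
  return a < b ? -1 : 0;
}

/* vcond_mask: bitwise select of T/F under MASK, i.e. what BSL does.  */
static int32_t
vcond_mask (int32_t t, int32_t f, int32_t mask)
{
  return (t & mask) | (f & ~mask);
}

/* vcond: compare, then select.  */
static int32_t
vcond_lt (int32_t a, int32_t b, int32_t t, int32_t f)
{
  return vcond_mask (t, f, vec_cmp_lt (a, b));
}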

> 
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index bd73bce..f51473a 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -1053,7 +1053,7 @@
> >      }
> >  
> >    cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], operands[2]);
> > -  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
> > +  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
> >                operands[2], cmp_fmt, operands[1], operands[2]));
> >    DONE;
> >  })
> > @@ -2225,204 +2225,215 @@
> >    DONE;
> >  })
> >  
> > -(define_expand "aarch64_vcond_internal<mode><mode>"
> > +(define_expand "vcond_mask_<mode><v_cmp_result>"
> > +  [(match_operand:VALLDI 0 "register_operand")
> > +   (match_operand:VALLDI 1 "nonmemory_operand")
> > +   (match_operand:VALLDI 2 "nonmemory_operand")
> > +   (match_operand:<V_cmp_result> 3 "register_operand")]
> > +  "TARGET_SIMD"
> > +{
> > +  /* If we have (a = (P) ? -1 : 0);
> > +     Then we can simply move the generated mask (result must be int).  */
> > +  if (operands[1] == CONSTM1_RTX (<MODE>mode)
> > +      && operands[2] == CONST0_RTX (<MODE>mode))
> > +    emit_move_insn (operands[0], operands[3]);
> > +  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  */
> > +  else if (operands[1] == CONST0_RTX (<MODE>mode)
> > +        && operands[2] == CONSTM1_RTX (<MODE>mode))
> > +    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[3]));
> > +  else
> > +    {
> > +      if (!REG_P (operands[1]))
> > +     operands[1] = force_reg (<MODE>mode, operands[1]);
> > +      if (!REG_P (operands[2]))
> > +     operands[2] = force_reg (<MODE>mode, operands[2]);
> > +      emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], operands[3],
> > +                                          operands[1], operands[2]));
> > +    }
> > +
> > +  DONE;
> > +})
> > +
> 
> This pattern is fine.
> 
> > +;; Patterns comparing two vectors to produce a mask.
> 
> This comment is insufficient. The logic in vec_cmp<mode><mode>_internal
> does not always return the expected mask (in particular for NE), but this
> is not made clear in the comment.
Comments added in various places to make the code easier to understand and maintain.
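To make the NE contract concrete, here is a lane-level sketch in plain C of what the new comments describe (illustration only, not part of the patch): the _internal pattern emits CMEQ for NE, and the caller either inverts the mask (vec_cmp) or swaps the values it feeds to BSL (vcond), saving the extra NOT:

#include <stdint.h>

/* CMEQ at lane level: all-ones if equal, all-zeros otherwise.  */
static int32_t
cmeq (int32_t a, int32_t b)
{
  return a == b ? -1 : 0;
}

/* BSL at lane level: pick bits of T where MASK is set, else of F.  */
static int32_t
bsl (int32_t mask, int32_t t, int32_t f)
{
  return (t & mask) | (f & ~mask);
}

/* vec_cmp for NE: invert the EQ mask produced by the internal pattern.  */
static int32_t
vec_cmp_ne (int32_t a, int32_t b)
{
  return ~cmeq (a, b);
}

/* vcond for NE: keep the EQ mask but swap the two selected values.  */
static int32_t
vcond_ne (int32_t a, int32_t b, int32_t t, int32_t f)
{
  return bsl (cmeq (a, b), f, t);
}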

> 
> > +
> > +(define_expand "vec_cmp<mode><mode>_internal"
> >    [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> > -     (if_then_else:VSDQ_I_DI
> > -       (match_operator 3 "comparison_operator"
> > -         [(match_operand:VSDQ_I_DI 4 "register_operand")
> > -          (match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
> > -       (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
> > -       (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
> > +       (match_operator 1 "comparison_operator"
> > +         [(match_operand:VSDQ_I_DI 2 "register_operand")
> > +          (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
> >    "TARGET_SIMD"
> 
> <snip>
> 
> > +(define_expand "vec_cmp<mode><mode>"
> > +  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> > +       (match_operator 1 "comparison_operator"
> > +         [(match_operand:VSDQ_I_DI 2 "register_operand")
> > +          (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
> > +  "TARGET_SIMD"
> > +{
> > +  enum rtx_code code = GET_CODE (operands[1]);
> > +
> > +  emit_insn (gen_vec_cmp<mode><mode>_internal (operands[0],
> > +                                      operands[1], operands[2],
> > +                                      operands[3]));
> > +  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
> > +     operators are computed using other operators, and we need to
> > +     revert the result.  */
> 
> s/revert/invert
> 
> There are no comments in vec_cmp<mode><v_cmp_result>_internal for this
> to refer to. But, there should be - more comments would definitely help
> this patch!
> 
> These comments apply equally to the other copies of this comment in the
> patch.
Done.

> 
> > +  if (code == NE)
> > +    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
> >    DONE;
> >  })
> >  
> > -(define_expand "aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>"
> > -  [(set (match_operand:VDQF_COND 0 "register_operand")
> > -     (if_then_else:VDQF
> > -       (match_operator 3 "comparison_operator"
> > -         [(match_operand:VDQF 4 "register_operand")
> > -          (match_operand:VDQF 5 "nonmemory_operand")])
> > -       (match_operand:VDQF_COND 1 "nonmemory_operand")
> > -       (match_operand:VDQF_COND 2 "nonmemory_operand")))]
> > +(define_expand "vec_cmp<mode><v_cmp_result>_internal"
> > +  [(set (match_operand:<V_cmp_result> 0 "register_operand")
> > +     (match_operator 1 "comparison_operator"
> > +         [(match_operand:VDQF 2 "register_operand")
> > +          (match_operand:VDQF 3 "nonmemory_operand")]))]
> 
> This needs much more thorough commenting. The logic is now extremely
> difficult to verify as to which pattern is responible for inverting the
> mask. This makes it hard to spot whether the inversions applied to the
> unordered comparisons are correct. I can't really review the patch with
> the logic split across uncommented patterns in this way - is there
> anything you can do to clean it up?
Done.

> 
> > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vcond.c b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
> > new file mode 100644
> > index 0000000..b0ca6d1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
> 
> None of this is AArch64 specific?
Test case removed.  I will enable vect_cond_mixed on AArch64, which enables various generic tests covering mixed vcond vectorization and makes an AArch64-specific test unnecessary here.
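For reference, the mixed-type conditionals those generic tests exercise also include the opposite direction, a floating-point comparison selecting integer data, roughly like this (made-up example, not an actual testsuite file):

float fa[1024], fb[1024];
int ia[1024], ib[1024], ic[1024];

void
bar (void)
{
  for (int i = 0; i < 1024; i++)
    ic[i] = (fa[i] < fb[i]) ? ia[i] : ib[i];
}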

> 
> > @@ -0,0 +1,59 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> > +
> > +float fa[1024] = {0.0}, fb[1024] = {0.0}, fc[1024];
> > +double da[1024] = {0.0}, db[1024] = {0.0}, dc[1024];
> > +
> > +int vcond_f_i (int *x)
> > +{
> > +  int i = 0;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      fa[i] -= 1.0;
> > +      fb[i] += 1.0;
> > +      fc[i] = (x[i] < i) ? fa[i] : fb[i];
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +int vcond_f_u (unsigned int *x)
> > +{
> > +  unsigned int i = 0;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      fa[i] -= 1.0;
> > +      fb[i] += 1.0;
> > +      fc[i] = (x[i] < i) ? fa[i] : fb[i];
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +int vcond_d_i (long long *x)
> > +{
> > +  long long i = 0;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      da[i] -= 1.0;
> > +      db[i] += 1.0;
> > +      dc[i] = (x[i] < i) ? da[i] : db[i];
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +int vcond_d_u (unsigned long long *x)
> > +{
> > +  unsigned long long i = 0;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      da[i] -= 1.0;
> > +      db[i] += 1.0;
> > +      dc[i] = (x[i] < i) ? da[i] : db[i];
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
> 

Patches attached.  Bootstrapped and tested on AArch64.  Is it OK?

Thanks,
bin

2016-06-07  Alan Lawrence  <alan.lawrence@arm.com>
	    Renlin Li  <renlin.li@arm.com>
	    Bin Cheng  <bin.cheng@arm.com>

	* config/aarch64/aarch64-simd.md (vec_cmp<mode><mode>): New pattern.
	(vec_cmp<mode><mode>_internal): New pattern.
	(vec_cmp<mode><v_cmp_result>): New pattern.
	(vec_cmp<mode><v_cmp_result>_internal): New pattern.
	(vec_cmpu<mode><mode>): New pattern.
	(vcond_mask_<mode><v_cmp_result>): New pattern.


2016-06-07  Alan Lawrence  <alan.lawrence@arm.com>
	    Renlin Li  <renlin.li@arm.com>
	    Bin Cheng  <bin.cheng@arm.com>

	* config/aarch64/iterators.md (V_cmp_mixed, v_cmp_mixed): New.
	* config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Call
	gen_vcondv2div2di instead of gen_aarch64_vcond_internalv2div2di.
	(aarch64_vcond_internal<mode><mode>): Delete pattern.
	(aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>): Ditto.
	(vcond<mode><mode>): Re-implement using vec_cmp and vcond_mask.
	(vcondu<mode><mode>): Ditto.
	(vcond<v_cmp_result><mode>): Delete.
	(vcond<v_cmp_mixed><mode>): New pattern.
	(vcondu<mode><v_cmp_mixed>): New pattern.
	(aarch64_cmtst<mode>): Revise comment using aarch64_vcond instead
	of aarch64_vcond_internal.

[-- Attachment #2: vcond_mask-vec_cmp-20160608.txt --]
[-- Type: text/plain, Size: 10472 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6ea35bf..9437d02 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2202,6 +2202,325 @@
   DONE;
 })
 
+(define_expand "vcond_mask_<mode><v_cmp_result>"
+  [(match_operand:VALLDI 0 "register_operand")
+   (match_operand:VALLDI 1 "nonmemory_operand")
+   (match_operand:VALLDI 2 "nonmemory_operand")
+   (match_operand:<V_cmp_result> 3 "register_operand")]
+  "TARGET_SIMD"
+{
+  /* If we have (a = (P) ? -1 : 0);
+     Then we can simply move the generated mask (result must be int).  */
+  if (operands[1] == CONSTM1_RTX (<MODE>mode)
+      && operands[2] == CONST0_RTX (<MODE>mode))
+    emit_move_insn (operands[0], operands[3]);
+  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  */
+  else if (operands[1] == CONST0_RTX (<MODE>mode)
+	   && operands[2] == CONSTM1_RTX (<MODE>mode))
+    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[3]));
+  else
+    {
+      if (!REG_P (operands[1]))
+	operands[1] = force_reg (<MODE>mode, operands[1]);
+      if (!REG_P (operands[2]))
+	operands[2] = force_reg (<MODE>mode, operands[2]);
+      emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], operands[3],
+					     operands[1], operands[2]));
+    }
+
+  DONE;
+})
+
+;; Patterns comparing two vectors to produce a mask.
+
+;; Internal pattern for vec_cmp.  It returns the expected result mask for
+;; comparison operators other than NE.  For the NE operator, it returns
+;; the opposite result mask.  This is intentional, so that we can save
+;; one mask-inverting instruction when using this pattern in vcond
+;; patterns.  In that case, it is the caller's responsibility to
+;; interpret and use the result mask correctly.
+(define_expand "vec_cmp<mode><mode>_internal"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
+	  (match_operator 1 "comparison_operator"
+	    [(match_operand:VSDQ_I_DI 2 "register_operand")
+	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  rtx mask = operands[0];
+  enum rtx_code code = GET_CODE (operands[1]);
+
+  if (operands[3] == CONST0_RTX (<MODE>mode)
+      && (code == NE || code == LE || code == LT
+	  || code == GE || code == GT || code == EQ))
+    {
+      /* Some instructions have a form taking an immediate zero.  */
+      ;
+    }
+  else if (!REG_P (operands[3]))
+    {
+      /* Make sure we can handle the last operand.  */
+      operands[3] = force_reg (<MODE>mode, operands[3]);
+    }
+
+  switch (code)
+    {
+    case LT:
+      emit_insn (gen_aarch64_cmlt<mode> (mask, operands[2], operands[3]));
+      break;
+
+    case GE:
+      emit_insn (gen_aarch64_cmge<mode> (mask, operands[2], operands[3]));
+      break;
+
+    case LE:
+      emit_insn (gen_aarch64_cmle<mode> (mask, operands[2], operands[3]));
+      break;
+
+    case GT:
+      emit_insn (gen_aarch64_cmgt<mode> (mask, operands[2], operands[3]));
+      break;
+
+    case LTU:
+      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[3], operands[2]));
+      break;
+
+    case GEU:
+      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[2], operands[3]));
+      break;
+
+    case LEU:
+      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[3], operands[2]));
+      break;
+
+    case GTU:
+      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[2], operands[3]));
+      break;
+
+    case NE:
+      /* See the comment at the beginning of this pattern: we return the
+	 opposite of the result mask for this operator, and it is the
+	 caller's responsibility to invert the mask.
+
+	 Fall through.  */
+
+    case EQ:
+      emit_insn (gen_aarch64_cmeq<mode> (mask, operands[2], operands[3]));
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  /* Don't invert the result mask for NE here; that is done by the caller.  */
+
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><mode>"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
+	  (match_operator 1 "comparison_operator"
+	    [(match_operand:VSDQ_I_DI 2 "register_operand")
+	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  enum rtx_code code = GET_CODE (operands[1]);
+
+  emit_insn (gen_vec_cmp<mode><mode>_internal (operands[0],
+					 operands[1], operands[2],
+					 operands[3]));
+  /* See the comments in vec_cmp<mode><mode>_internal: the opposite result
+     mask is computed for the NE operator, so we invert the mask here.  */
+  if (code == NE)
+    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
+
+  DONE;
+})
+
+
+;; Internal pattern for vec_cmp.  Similar to vec_cmp<mode><mode>_internal,
+;; it returns the opposite result mask for the operators NE, UNEQ, UNLT,
+;; UNLE, UNGT, UNGE and UNORDERED.  This is intentional, so that we can
+;; save one mask-inverting instruction when using this pattern in vcond
+;; patterns.  In these cases, it is the caller's responsibility to
+;; interpret and use the result mask correctly.
+(define_expand "vec_cmp<mode><v_cmp_result>_internal"
+  [(set (match_operand:<V_cmp_result> 0 "register_operand")
+	(match_operator 1 "comparison_operator"
+	    [(match_operand:VDQF 2 "register_operand")
+	     (match_operand:VDQF 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  int use_zero_form = 0;
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx tmp = gen_reg_rtx (<V_cmp_result>mode);
+
+  rtx (*comparison) (rtx, rtx, rtx);
+
+  if (operands[3] == CONST0_RTX (<MODE>mode)
+      && (code == LE || code == LT || code == GE || code == GT || code == EQ))
+    {
+      /* Some instructions have a form taking an immediate zero.  */
+      use_zero_form = 1;
+    }
+  else if (!REG_P (operands[3]))
+    {
+      /* Make sure we can handle the last operand.  */
+      operands[3] = force_reg (<MODE>mode, operands[3]);
+    }
+
+  switch (code)
+    {
+    case LT:
+      if (use_zero_form)
+	{
+	  comparison = gen_aarch64_cmlt<mode>;
+	  break;
+	}
+      /* Else, fall through.  */
+    case UNGE:
+      std::swap (operands[2], operands[3]);
+      /* Fall through.  */
+    case UNLE:
+    case GT:
+      comparison = gen_aarch64_cmgt<mode>;
+      break;
+    case LE:
+      if (use_zero_form)
+	{
+	  comparison = gen_aarch64_cmle<mode>;
+	  break;
+	}
+      /* Else, fall through.  */
+    case UNGT:
+      std::swap (operands[2], operands[3]);
+      /* Fall through.  */
+    case UNLT:
+    case GE:
+      comparison = gen_aarch64_cmge<mode>;
+      break;
+    case NE:
+    case EQ:
+      comparison = gen_aarch64_cmeq<mode>;
+      break;
+    case UNEQ:
+    case ORDERED:
+    case UNORDERED:
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  switch (code)
+    {
+    case UNGT:
+    case UNGE:
+    case NE:
+    case UNLT:
+    case UNLE:
+      /* FCM returns false for lanes which are unordered, so if we use
+	 the inverse of the comparison we actually want to emit and then
+	 invert the result, we will end up with the correct result.
+	 Note that a NE NaN and NaN NE b are true for all a, b.
+
+	 Our transformations are:
+	 a GE b -> !(b GT a)
+	 a GT b -> !(b GE a)
+	 a LE b -> !(a GT b)
+	 a LT b -> !(a GE b)
+	 a NE b -> !(a EQ b)
+
+	 See the comment at the beginning of this pattern: we return the
+	 opposite of the result mask for these operators, and it is the
+	 caller's responsibility to invert the mask.
+
+	 Fall through.  */
+    case LT:
+    case LE:
+    case GT:
+    case GE:
+    case EQ:
+      /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
+	 As a LT b <=> b GE a && a LE b <=> b GT a.  Our transformations are:
+	 a GE b -> a GE b
+	 a GT b -> a GT b
+	 a LE b -> b GE a
+	 a LT b -> b GT a
+	 a EQ b -> a EQ b
+	 Note that there also exist direct comparison against 0 forms,
+	 so catch those as a special case.  */
+
+      emit_insn (comparison (operands[0], operands[2], operands[3]));
+      break;
+
+    case UNEQ:
+      /* We first check (a > b || b > a), which is !UNEQ; inverting
+	 this result then gives us (a == b || a UNORDERED b).
+
+	 See the comment at the beginning of this pattern: we return the
+	 opposite of the result mask for this operator, and it is the
+	 caller's responsibility to invert the mask.  */
+
+      emit_insn (gen_aarch64_cmgt<mode> (operands[0],
+					 operands[2], operands[3]));
+      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
+      emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
+      break;
+    case UNORDERED:
+      /* Operands are ORDERED iff (a > b || b >= a), so we can compute
+	 UNORDERED as !ORDERED.
+
+	 See the comment at the beginning of this pattern: we return the
+	 opposite of the result mask for this operator, and it is the
+	 caller's responsibility to invert the mask.
+
+	 Fall through.  */
+    case ORDERED:
+      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2], operands[3]));
+      emit_insn (gen_aarch64_cmge<mode> (operands[0],
+					 operands[3], operands[2]));
+      emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><v_cmp_result>"
+  [(set (match_operand:<V_cmp_result> 0 "register_operand")
+	(match_operator 1 "comparison_operator"
+	    [(match_operand:VDQF 2 "register_operand")
+	     (match_operand:VDQF 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  enum rtx_code code = GET_CODE (operands[1]);
+
+  emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (operands[0],
+					 operands[1], operands[2],
+					 operands[3]));
+  /* See the comments in vec_cmp<mode><v_cmp_result>_internal: the opposite
+     result masks are computed for the operators below, so we need to
+     invert the mask here.  */
+  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
+      || code == UNGT || code == UNGE || code == UNORDERED)
+    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
+
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><mode>"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
+	  (match_operator 1 "comparison_operator"
+	    [(match_operand:VSDQ_I_DI 2 "register_operand")
+	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_vec_cmp<mode><mode> (operands[0], operands[1],
+				      operands[2], operands[3]));
+  DONE;
+})
+
 (define_expand "aarch64_vcond_internal<mode><mode>"
   [(set (match_operand:VSDQ_I_DI 0 "register_operand")
 	(if_then_else:VSDQ_I_DI

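The unordered and inverted cases above all rely on the same IEEE property: an
ordered FCM comparison produces an all-zeros lane whenever an input is NaN, so
emitting the complementary ordered comparison and then inverting the lane mask
yields the UN* result.  A scalar C sketch of that identity follows; it is
illustrative only, and the names fcmgt_lane/unle_lane are invented here, not
taken from the patch or from arm_neon.h.

#include <math.h>
#include <stdint.h>

/* Scalar model of FCMGT on one lane: all-ones when the operands are
   ordered and a > b, all-zeros otherwise (including the NaN cases).  */
static int64_t
fcmgt_lane (double a, double b)
{
  return (!isnan (a) && !isnan (b) && a > b) ? -1 : 0;
}

/* a UNLE b == !(a GT b): true when a <= b or either operand is NaN.
   The bitwise NOT models the one_cmpl<v_cmp_result>2 step that
   vec_cmp<mode><v_cmp_result> emits after calling the *_internal
   expander.  */
static int64_t
unle_lane (double a, double b)
{
  return ~fcmgt_lane (a, b);
}

The vcond expanders in the follow-up version below avoid the explicit
inversion by swapping the two data operands of the BSL instead, which is what
the "save an inverting instruction" comments refer to.
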
[-- Attachment #3: aarch64-vcond-20160607.txt --]
[-- Type: text/plain, Size: 15267 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6ea35bf..e080b71 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1053,7 +1053,7 @@
     }
 
   cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], operands[2]);
-  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
+  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
               operands[2], cmp_fmt, operands[1], operands[2]));
   DONE;
 })
@@ -2202,314 +2202,6 @@
   DONE;
 })
 
-(define_expand "aarch64_vcond_internal<mode><mode>"
-  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
-	(if_then_else:VSDQ_I_DI
-	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VSDQ_I_DI 4 "register_operand")
-	     (match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
-	  (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
-	  (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
-  "TARGET_SIMD"
-{
-  rtx op1 = operands[1];
-  rtx op2 = operands[2];
-  rtx mask = gen_reg_rtx (<MODE>mode);
-  enum rtx_code code = GET_CODE (operands[3]);
-
-  /* Switching OP1 and OP2 is necessary for NE (to output a cmeq insn),
-     and desirable for other comparisons if it results in FOO ? -1 : 0
-     (this allows direct use of the comparison result without a bsl).  */
-  if (code == NE
-      || (code != EQ
-	  && op1 == CONST0_RTX (<V_cmp_result>mode)
-	  && op2 == CONSTM1_RTX (<V_cmp_result>mode)))
-    {
-      op1 = operands[2];
-      op2 = operands[1];
-      switch (code)
-        {
-        case LE: code = GT; break;
-        case LT: code = GE; break;
-        case GE: code = LT; break;
-        case GT: code = LE; break;
-        /* No case EQ.  */
-        case NE: code = EQ; break;
-        case LTU: code = GEU; break;
-        case LEU: code = GTU; break;
-        case GTU: code = LEU; break;
-        case GEU: code = LTU; break;
-        default: gcc_unreachable ();
-        }
-    }
-
-  /* Make sure we can handle the last operand.  */
-  switch (code)
-    {
-    case NE:
-      /* Normalized to EQ above.  */
-      gcc_unreachable ();
-
-    case LE:
-    case LT:
-    case GE:
-    case GT:
-    case EQ:
-      /* These instructions have a form taking an immediate zero.  */
-      if (operands[5] == CONST0_RTX (<MODE>mode))
-        break;
-      /* Fall through, as may need to load into register.  */
-    default:
-      if (!REG_P (operands[5]))
-        operands[5] = force_reg (<MODE>mode, operands[5]);
-      break;
-    }
-
-  switch (code)
-    {
-    case LT:
-      emit_insn (gen_aarch64_cmlt<mode> (mask, operands[4], operands[5]));
-      break;
-
-    case GE:
-      emit_insn (gen_aarch64_cmge<mode> (mask, operands[4], operands[5]));
-      break;
-
-    case LE:
-      emit_insn (gen_aarch64_cmle<mode> (mask, operands[4], operands[5]));
-      break;
-
-    case GT:
-      emit_insn (gen_aarch64_cmgt<mode> (mask, operands[4], operands[5]));
-      break;
-
-    case LTU:
-      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[5], operands[4]));
-      break;
-
-    case GEU:
-      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[4], operands[5]));
-      break;
-
-    case LEU:
-      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[5], operands[4]));
-      break;
-
-    case GTU:
-      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[4], operands[5]));
-      break;
-
-    /* NE has been normalized to EQ above.  */
-    case EQ:
-      emit_insn (gen_aarch64_cmeq<mode> (mask, operands[4], operands[5]));
-      break;
-
-    default:
-      gcc_unreachable ();
-    }
-
-    /* If we have (a = (b CMP c) ? -1 : 0);
-       Then we can simply move the generated mask.  */
-
-    if (op1 == CONSTM1_RTX (<V_cmp_result>mode)
-	&& op2 == CONST0_RTX (<V_cmp_result>mode))
-      emit_move_insn (operands[0], mask);
-    else
-      {
-	if (!REG_P (op1))
-	  op1 = force_reg (<MODE>mode, op1);
-	if (!REG_P (op2))
-	  op2 = force_reg (<MODE>mode, op2);
-	emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], mask,
-					       op1, op2));
-      }
-
-  DONE;
-})
-
-(define_expand "aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>"
-  [(set (match_operand:VDQF_COND 0 "register_operand")
-	(if_then_else:VDQF
-	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VDQF 4 "register_operand")
-	     (match_operand:VDQF 5 "nonmemory_operand")])
-	  (match_operand:VDQF_COND 1 "nonmemory_operand")
-	  (match_operand:VDQF_COND 2 "nonmemory_operand")))]
-  "TARGET_SIMD"
-{
-  int inverse = 0;
-  int use_zero_form = 0;
-  int swap_bsl_operands = 0;
-  rtx op1 = operands[1];
-  rtx op2 = operands[2];
-  rtx mask = gen_reg_rtx (<VDQF_COND:V_cmp_result>mode);
-  rtx tmp = gen_reg_rtx (<VDQF_COND:V_cmp_result>mode);
-
-  rtx (*base_comparison) (rtx, rtx, rtx);
-  rtx (*complimentary_comparison) (rtx, rtx, rtx);
-
-  switch (GET_CODE (operands[3]))
-    {
-    case GE:
-    case GT:
-    case LE:
-    case LT:
-    case EQ:
-      if (operands[5] == CONST0_RTX (<MODE>mode))
-	{
-	  use_zero_form = 1;
-	  break;
-	}
-      /* Fall through.  */
-    default:
-      if (!REG_P (operands[5]))
-	operands[5] = force_reg (<VDQF:MODE>mode, operands[5]);
-    }
-
-  switch (GET_CODE (operands[3]))
-    {
-    case LT:
-    case UNLT:
-      inverse = 1;
-      /* Fall through.  */
-    case GE:
-    case UNGE:
-    case ORDERED:
-    case UNORDERED:
-      base_comparison = gen_aarch64_cmge<VDQF:mode>;
-      complimentary_comparison = gen_aarch64_cmgt<VDQF:mode>;
-      break;
-    case LE:
-    case UNLE:
-      inverse = 1;
-      /* Fall through.  */
-    case GT:
-    case UNGT:
-      base_comparison = gen_aarch64_cmgt<VDQF:mode>;
-      complimentary_comparison = gen_aarch64_cmge<VDQF:mode>;
-      break;
-    case EQ:
-    case NE:
-    case UNEQ:
-      base_comparison = gen_aarch64_cmeq<VDQF:mode>;
-      complimentary_comparison = gen_aarch64_cmeq<VDQF:mode>;
-      break;
-    default:
-      gcc_unreachable ();
-    }
-
-  switch (GET_CODE (operands[3]))
-    {
-    case LT:
-    case LE:
-    case GT:
-    case GE:
-    case EQ:
-      /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
-	 As a LT b <=> b GE a && a LE b <=> b GT a.  Our transformations are:
-	 a GE b -> a GE b
-	 a GT b -> a GT b
-	 a LE b -> b GE a
-	 a LT b -> b GT a
-	 a EQ b -> a EQ b
-	 Note that there also exist direct comparison against 0 forms,
-	 so catch those as a special case.  */
-      if (use_zero_form)
-	{
-	  inverse = 0;
-	  switch (GET_CODE (operands[3]))
-	    {
-	    case LT:
-	      base_comparison = gen_aarch64_cmlt<VDQF:mode>;
-	      break;
-	    case LE:
-	      base_comparison = gen_aarch64_cmle<VDQF:mode>;
-	      break;
-	    default:
-	      /* Do nothing, other zero form cases already have the correct
-		 base_comparison.  */
-	      break;
-	    }
-	}
-
-      if (!inverse)
-	emit_insn (base_comparison (mask, operands[4], operands[5]));
-      else
-	emit_insn (complimentary_comparison (mask, operands[5], operands[4]));
-      break;
-    case UNLT:
-    case UNLE:
-    case UNGT:
-    case UNGE:
-    case NE:
-      /* FCM returns false for lanes which are unordered, so if we use
-	 the inverse of the comparison we actually want to emit, then
-	 swap the operands to BSL, we will end up with the correct result.
-	 Note that a NE NaN and NaN NE b are true for all a, b.
-
-	 Our transformations are:
-	 a GE b -> !(b GT a)
-	 a GT b -> !(b GE a)
-	 a LE b -> !(a GT b)
-	 a LT b -> !(a GE b)
-	 a NE b -> !(a EQ b)  */
-
-      if (inverse)
-	emit_insn (base_comparison (mask, operands[4], operands[5]));
-      else
-	emit_insn (complimentary_comparison (mask, operands[5], operands[4]));
-
-      swap_bsl_operands = 1;
-      break;
-    case UNEQ:
-      /* We check (a > b ||  b > a).  combining these comparisons give us
-	 true iff !(a != b && a ORDERED b), swapping the operands to BSL
-	 will then give us (a == b ||  a UNORDERED b) as intended.  */
-
-      emit_insn (gen_aarch64_cmgt<VDQF:mode> (mask, operands[4], operands[5]));
-      emit_insn (gen_aarch64_cmgt<VDQF:mode> (tmp, operands[5], operands[4]));
-      emit_insn (gen_ior<VDQF_COND:v_cmp_result>3 (mask, mask, tmp));
-      swap_bsl_operands = 1;
-      break;
-    case UNORDERED:
-       /* Operands are ORDERED iff (a > b || b >= a).
-	 Swapping the operands to BSL will give the UNORDERED case.  */
-     swap_bsl_operands = 1;
-     /* Fall through.  */
-    case ORDERED:
-      emit_insn (gen_aarch64_cmgt<VDQF:mode> (tmp, operands[4], operands[5]));
-      emit_insn (gen_aarch64_cmge<VDQF:mode> (mask, operands[5], operands[4]));
-      emit_insn (gen_ior<VDQF_COND:v_cmp_result>3 (mask, mask, tmp));
-      break;
-    default:
-      gcc_unreachable ();
-    }
-
-  if (swap_bsl_operands)
-    {
-      op1 = operands[2];
-      op2 = operands[1];
-    }
-
-    /* If we have (a = (b CMP c) ? -1 : 0);
-       Then we can simply move the generated mask.  */
-
-    if (op1 == CONSTM1_RTX (<VDQF_COND:V_cmp_result>mode)
-	&& op2 == CONST0_RTX (<VDQF_COND:V_cmp_result>mode))
-      emit_move_insn (operands[0], mask);
-    else
-      {
-	if (!REG_P (op1))
-	  op1 = force_reg (<VDQF_COND:MODE>mode, op1);
-	if (!REG_P (op2))
-	  op2 = force_reg (<VDQF_COND:MODE>mode, op2);
-	emit_insn (gen_aarch64_simd_bsl<VDQF_COND:mode> (operands[0], mask,
-					       op1, op2));
-      }
-
-  DONE;
-})
-
 (define_expand "vcond<mode><mode>"
   [(set (match_operand:VALLDI 0 "register_operand")
 	(if_then_else:VALLDI
@@ -2520,26 +2212,50 @@
 	  (match_operand:VALLDI 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  emit_insn (gen_aarch64_vcond_internal<mode><mode> (operands[0], operands[1],
-					       operands[2], operands[3],
+  rtx mask = gen_reg_rtx (<V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (mask, operands[3],
 					       operands[4], operands[5]));
+  /* See the comments in vec_cmp<mode><v_cmp_result>_internal: the opposite
+     result masks are computed for the operators below, so we need to
+     invert the mask here.  In this case we can save an inverting
+     instruction by simply swapping the two operands to bsl.  */
+  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
+      || code == UNGT || code == UNGE || code == UNORDERED)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1],
+						 operands[2], mask));
   DONE;
 })
 
-(define_expand "vcond<v_cmp_result><mode>"
-  [(set (match_operand:<V_cmp_result> 0 "register_operand")
-	(if_then_else:<V_cmp_result>
+(define_expand "vcond<v_cmp_mixed><mode>"
+  [(set (match_operand:<V_cmp_mixed> 0 "register_operand")
+	(if_then_else:<V_cmp_mixed>
 	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VDQF 4 "register_operand")
-	     (match_operand:VDQF 5 "nonmemory_operand")])
-	  (match_operand:<V_cmp_result> 1 "nonmemory_operand")
-	  (match_operand:<V_cmp_result> 2 "nonmemory_operand")))]
+	    [(match_operand:VDQF_COND 4 "register_operand")
+	     (match_operand:VDQF_COND 5 "nonmemory_operand")])
+	  (match_operand:<V_cmp_mixed> 1 "nonmemory_operand")
+	  (match_operand:<V_cmp_mixed> 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  emit_insn (gen_aarch64_vcond_internal<v_cmp_result><mode> (
+  rtx mask = gen_reg_rtx (<V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (mask, operands[3],
+					       operands[4], operands[5]));
+  /* See the comments in vec_cmp<mode><v_cmp_result>_internal: the opposite
+     result masks are computed for the operators below, so we need to
+     invert the mask here.  In this case we can save an inverting
+     instruction by simply swapping the two operands to bsl.  */
+  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
+      || code == UNGT || code == UNGE || code == UNORDERED)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<v_cmp_mixed><v_cmp_result> (
 						operands[0], operands[1],
-						operands[2], operands[3],
-						operands[4], operands[5]));
+						operands[2], mask));
   DONE;
 })
 
@@ -2553,9 +2269,48 @@
 	  (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  emit_insn (gen_aarch64_vcond_internal<mode><mode> (operands[0], operands[1],
-					       operands[2], operands[3],
+  rtx mask = gen_reg_rtx (<MODE>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<mode><mode>_internal (mask, operands[3],
 					       operands[4], operands[5]));
+  /* See the comments in vec_cmp<mode><mode>_internal: the opposite result
+     mask is computed for the NE operator, so we need to invert the mask
+     here.  In this case we can save an inverting instruction by simply
+     swapping the two operands to bsl.  */
+  if (code == NE)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1],
+						 operands[2], mask));
+  DONE;
+})
+
+(define_expand "vcondu<mode><v_cmp_mixed>"
+  [(set (match_operand:VDQF 0 "register_operand")
+	(if_then_else:VDQF
+	  (match_operator 3 "comparison_operator"
+	    [(match_operand:<V_cmp_mixed> 4 "register_operand")
+	     (match_operand:<V_cmp_mixed> 5 "nonmemory_operand")])
+	  (match_operand:VDQF 1 "nonmemory_operand")
+	  (match_operand:VDQF 2 "nonmemory_operand")))]
+  "TARGET_SIMD"
+{
+  rtx mask = gen_reg_rtx (<V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<v_cmp_mixed><v_cmp_mixed>_internal (
+						  mask, operands[3],
+						  operands[4], operands[5]));
+  /* See the comments in vec_cmp<mode><mode>_internal: the opposite result
+     mask is computed for the NE operator, so we need to invert the mask
+     here.  In this case we can save an inverting instruction by simply
+     swapping the two operands to bsl.  */
+  if (code == NE)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1],
+						 operands[2], mask));
   DONE;
 })
 
@@ -4156,7 +3911,7 @@
 ;; cmtst
 
 ;; Although neg (ne (and x y) 0) is the natural way of expressing a cmtst,
-;; we don't have any insns using ne, and aarch64_vcond_internal outputs
+;; we don't have any insns using ne, and aarch64_vcond outputs
 ;; not (neg (eq (and x y) 0))
 ;; which is rewritten by simplify_rtx as
 ;; plus (eq (and x y) 0) -1.
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 43b22d8..f53fe9d 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -610,6 +610,16 @@
 				(V2DF "v2di") (DF    "di")
 				(SF   "si")])
 
+;; Mode for vector conditional operations where the comparison has
+;; a different type from the lhs.
+(define_mode_attr V_cmp_mixed [(V2SI "V2SF") (V4SI "V4SF")
+			       (V2DI "V2DF") (V2SF "V2SI")
+			       (V4SF "V4SI") (V2DF "V2DI")])
+
+(define_mode_attr v_cmp_mixed [(V2SI "v2sf") (V4SI "v4sf")
+			       (V2DI "v2df") (V2SF "v2si")
+			       (V4SF "v4si") (V2DF "v2di")])
+
 ;; Lower case element modes (as used in shift immediate patterns).
 (define_mode_attr ve_mode [(V8QI "qi") (V16QI "qi")
 			   (V4HI "hi") (V8HI  "hi")

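As a concrete illustration of what the new V_cmp_mixed/v_cmp_mixed attributes
enable: the vcond<v_cmp_mixed><mode> and vcondu<mode><v_cmp_mixed> expanders
cover loops whose comparison type differs from the type of the values being
selected.  A hypothetical source loop of that shape is sketched below (the
function name is made up and this is not the attached vect-vcond.c testcase).

/* The comparison is done on V4SF while the selected values form a V4SI
   if_then_else, i.e. the mixed-mode pairing V_cmp_mixed maps between
   (V4SF <-> V4SI, V2DF <-> V2DI, ...).  */
void
select_int_by_float_cmp (int *restrict r, const float *restrict a,
                         const float *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] < b[i] ? 1 : 2;
}

With the patch, a loop like this should be expandable through the new
vcondv4siv4sf pattern (data in V4SI, comparison in V4SF), which is in turn
implemented with vec_cmp plus vcond_mask; this is the "vect_cond_mixed" case
mentioned in the cover letter.
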
^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.
  2016-06-08 14:58   ` Bin Cheng
@ 2016-06-14 14:51     ` James Greenhalgh
  0 siblings, 0 replies; 5+ messages in thread
From: James Greenhalgh @ 2016-06-14 14:51 UTC (permalink / raw)
  To: Bin Cheng; +Cc: gcc-patches, nd

On Wed, Jun 08, 2016 at 03:57:42PM +0100, Bin Cheng wrote:
> > From: James Greenhalgh <james.greenhalgh@arm.com>
> > Sent: 31 May 2016 16:24
> > To: Bin Cheng
> > Cc: gcc-patches@gcc.gnu.org; nd
> > Subject: Re: [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.
> >     
> > On Tue, May 17, 2016 at 09:02:22AM +0000, Bin Cheng wrote:
> > > Hi,
> > > Alan and Renlin noticed that some vcond patterns are not supported in
> > > AArch64(or AArch32?) backend, and they both had some patches fixing this.
> > > After investigation, I agree with them that vcond/vcondu in AArch64's backend
> > > should be re-implemented using vec_cmp/vcond_mask patterns, so here comes
> > > this patch which is based on Alan's.  This patch supports all vcond/vcondu
> > > patterns by implementing/using vec_cmp and vcond_mask patterns.  Different to
> > > the original patch, it doesn't change GCC's expanding process, and it keeps
> > > vcond patterns.  The patch also introduces vec_cmp*_internal to support
> > > special case optimization for vcond/vcondu which current implementation does.
> > > Apart from Alan's patch, I also learned ideas from Renlin's, and it is my
> > > change that shall be blamed if any potential bug is introduced.
> > > 
> > > With this patch, GCC's test condition "vect_cond_mixed" can be enabled on
> > > AArch64 (in a following patch).  Bootstrap and test on AArch64.  Is it OK?
> > > BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on AArch64) which was
> > > added before in tree if-conversion patch.
> > 
> > Splitting this patch would have been very helpful. One patch each for the
> > new standard pattern names, and one patch for the refactor of vcond. As
> > it is, this patch is rather difficult to read.
> Done, patch split into two with one implementing new vcond_mask&vec_cmp
> patterns, and another re-writing vcond patterns.
> 

Hi Bin,

Thanks for the changes. Just to make keeping track of review easier for
me, would you mind resending these in series as two separate emails?

i.e.

[Patch AArch64 1/2]  Implement vcond_mask and vec_cmp patterns
[Patch AArch64 2/2]  Rewrite vcond patterns

Thanks,
James

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-06-14 14:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-17  9:02 [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns Bin Cheng
2016-05-24 10:17 ` Bin.Cheng
2016-05-31 17:07 ` James Greenhalgh
2016-06-08 14:58   ` Bin Cheng
2016-06-14 14:51     ` James Greenhalgh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).