public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>, "jeffreyalaw@gmail.com" <jeffreyalaw@gmail.com>
Subject: RE: [PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization.
Date: Mon, 31 Oct 2022 11:34:39 +0000	[thread overview]
Message-ID: <VI1PR08MB5325C56878364315686EBD99FF379@VI1PR08MB5325.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2209261037560.6652@jbgna.fhfr.qr>

[-- Attachment #1: Type: text/plain, Size: 20724 bytes --]

> 
> The type of the expression should be available via the mode and the
> signedness, no?  So maybe to avoid having both RTX and TREE on the target
> hook pass it a wide_int instead for the divisor?
> 

Done.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* expmed.h (expand_divmod): Pass tree operands down in addition to RTX.
	* expmed.cc (expand_divmod): Likewise.
	* explow.cc (round_push, align_dynamic_address): Likewise.
	* expr.cc (force_operand, expand_expr_divmod): Likewise.
	* optabs.cc (expand_doubleword_mod, expand_doubleword_divmod):
	Likewise.
	* target.h: Include tree-core.
	* target.def (can_special_div_by_const): New.
	* targhooks.cc (default_can_special_div_by_const): New.
	* targhooks.h (default_can_special_div_by_const): New.
	* tree-vect-generic.cc (expand_vector_operation): Use it.
	* doc/tm.texi.in: Document it.
	* doc/tm.texi: Regenerate.
	* tree-vect-patterns.cc (vect_recog_divmod_pattern): Check for support.
	* tree-vect-stmts.cc (vectorizable_operation): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-div-bitmask-1.c: New test.
	* gcc.dg/vect/vect-div-bitmask-2.c: New test.
	* gcc.dg/vect/vect-div-bitmask-3.c: New test.
	* gcc.dg/vect/vect-div-bitmask.h: New file.

--- inline copy of patch ---

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 92bda1a7e14a3c9ea63e151e4a49a818bf4d1bdb..a29f5c39be3f0927f8ef6e094c7a712c0604fb77 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6112,6 +6112,22 @@ instruction pattern.  There is no need for the hook to handle these two
 implementation approaches itself.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST (enum @var{tree_code}, tree @var{vectype}, wide_int @var{constant}, rtx *@var{output}, rtx @var{in0}, rtx @var{in1})
+This hook is used to test whether the target has a special method of
+division of vectors of type @var{vectype} using the value @var{constant},
+and producing a vector of type @var{vectype}.  The division
+will then not be decomposed by the and kept as a div.
+
+When the hook is being used to test whether the target supports a special
+divide, @var{in0}, @var{in1}, and @var{output} are all null.  When the hook
+is being used to emit a division, @var{in0} and @var{in1} are the source
+vectors of type @var{vecttype} and @var{output} is the destination vector of
+type @var{vectype}.
+
+Return true if the operation is possible, emitting instructions for it
+if rtxes are provided and updating @var{output}.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
 This hook should return the decl of a function that implements the
 vectorized variant of the function with the @code{combined_fn} code
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 112462310b134705d860153294287cfd7d4af81d..d5a745a02acdf051ea1da1b04076d058c24ce093 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4164,6 +4164,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_VEC_PERM_CONST
 
+@hook TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST
+
 @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
 
 @hook TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION
diff --git a/gcc/explow.cc b/gcc/explow.cc
index ddb4d6ae3600542f8d2bb5617cdd3933a9fae6c0..568e0eb1a158c696458ae678f5e346bf34ba0036 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -1037,7 +1037,7 @@ round_push (rtx size)
      TRUNC_DIV_EXPR.  */
   size = expand_binop (Pmode, add_optab, size, alignm1_rtx,
 		       NULL_RTX, 1, OPTAB_LIB_WIDEN);
-  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx,
+  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, size, align_rtx,
 			NULL_RTX, 1);
   size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1);
 
@@ -1203,7 +1203,7 @@ align_dynamic_address (rtx target, unsigned required_align)
 			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
 				       Pmode),
 			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
-  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, target,
 			  gen_int_mode (required_align / BITS_PER_UNIT,
 					Pmode),
 			  NULL_RTX, 1);
diff --git a/gcc/expmed.h b/gcc/expmed.h
index 0b2538c4c6bd51dfdc772ef70bdf631c0bed8717..0db2986f11ff4a4b10b59501c6f33cb3595659b5 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -708,8 +708,9 @@ extern rtx expand_variable_shift (enum tree_code, machine_mode,
 extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
 			 int);
 #ifdef GCC_OPTABS_H
-extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
-			  rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
+extern rtx expand_divmod (int, enum tree_code, machine_mode, tree, tree,
+			  rtx, rtx, rtx, int,
+			  enum optab_methods = OPTAB_LIB_WIDEN);
 #endif
 #endif
 
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 8d7418be418406e72a895ecddf2dc7fdb950c76c..bab020c07222afa38305ef8d7333f271b1965b78 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -4222,8 +4222,8 @@ expand_sdiv_pow2 (scalar_int_mode mode, rtx op0, HOST_WIDE_INT d)
 
 rtx
 expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
-	       rtx op0, rtx op1, rtx target, int unsignedp,
-	       enum optab_methods methods)
+	       tree treeop0, tree treeop1, rtx op0, rtx op1, rtx target,
+	       int unsignedp, enum optab_methods methods)
 {
   machine_mode compute_mode;
   rtx tquotient;
@@ -4375,6 +4375,17 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 
   last_div_const = ! rem_flag && op1_is_constant ? INTVAL (op1) : 0;
 
+  /* Check if the target has specific expansions for the division.  */
+  tree cst;
+  if (treeop0
+      && treeop1
+      && (cst = uniform_integer_cst_p (treeop1))
+      && targetm.vectorize.can_special_div_by_const (code, TREE_TYPE (treeop0),
+						     wi::to_wide (cst),
+						     &target, op0, op1))
+    return target;
+
+
   /* Now convert to the best mode to use.  */
   if (compute_mode != mode)
     {
@@ -4618,8 +4629,8 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 			    || (optab_handler (sdivmod_optab, int_mode)
 				!= CODE_FOR_nothing)))
 		      quotient = expand_divmod (0, TRUNC_DIV_EXPR,
-						int_mode, op0,
-						gen_int_mode (abs_d,
+						int_mode, treeop0, treeop1,
+						op0, gen_int_mode (abs_d,
 							      int_mode),
 						NULL_RTX, 0);
 		    else
@@ -4808,8 +4819,8 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 				      size - 1, NULL_RTX, 0);
 		t3 = force_operand (gen_rtx_MINUS (int_mode, t1, nsign),
 				    NULL_RTX);
-		t4 = expand_divmod (0, TRUNC_DIV_EXPR, int_mode, t3, op1,
-				    NULL_RTX, 0);
+		t4 = expand_divmod (0, TRUNC_DIV_EXPR, int_mode, treeop0,
+				    treeop1, t3, op1, NULL_RTX, 0);
 		if (t4)
 		  {
 		    rtx t5;
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 80bb1b8a4c5b8350fb1b8f57a99fd52e5882fcb6..b786f1d75e25f3410c0640cd96a8abc055fa34d9 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -8028,16 +8028,17 @@ force_operand (rtx value, rtx target)
 	    return expand_divmod (0,
 				  FLOAT_MODE_P (GET_MODE (value))
 				  ? RDIV_EXPR : TRUNC_DIV_EXPR,
-				  GET_MODE (value), op1, op2, target, 0);
+				  GET_MODE (value), NULL, NULL, op1, op2,
+				  target, 0);
 	case MOD:
-	  return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), op1, op2,
-				target, 0);
+	  return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), NULL, NULL,
+				op1, op2, target, 0);
 	case UDIV:
-	  return expand_divmod (0, TRUNC_DIV_EXPR, GET_MODE (value), op1, op2,
-				target, 1);
+	  return expand_divmod (0, TRUNC_DIV_EXPR, GET_MODE (value), NULL, NULL,
+				op1, op2, target, 1);
 	case UMOD:
-	  return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), op1, op2,
-				target, 1);
+	  return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), NULL, NULL,
+				op1, op2, target, 1);
 	case ASHIFTRT:
 	  return expand_simple_binop (GET_MODE (value), code, op1, op2,
 				      target, 0, OPTAB_LIB_WIDEN);
@@ -8990,11 +8991,13 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0,
       bool speed_p = optimize_insn_for_speed_p ();
       do_pending_stack_adjust ();
       start_sequence ();
-      rtx uns_ret = expand_divmod (mod_p, code, mode, op0, op1, target, 1);
+      rtx uns_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1,
+				   op0, op1, target, 1);
       rtx_insn *uns_insns = get_insns ();
       end_sequence ();
       start_sequence ();
-      rtx sgn_ret = expand_divmod (mod_p, code, mode, op0, op1, target, 0);
+      rtx sgn_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1,
+				   op0, op1, target, 0);
       rtx_insn *sgn_insns = get_insns ();
       end_sequence ();
       unsigned uns_cost = seq_cost (uns_insns, speed_p);
@@ -9016,7 +9019,8 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0,
       emit_insn (sgn_insns);
       return sgn_ret;
     }
-  return expand_divmod (mod_p, code, mode, op0, op1, target, unsignedp);
+  return expand_divmod (mod_p, code, mode, treeop0, treeop1,
+			op0, op1, target, unsignedp);
 }
 
 rtx
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 165f8d1fa22432b96967c69a58dbb7b4bf18120d..cff37ccb0dfc3dd79b97d0abfd872f340855dc96 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -1104,8 +1104,9 @@ expand_doubleword_mod (machine_mode mode, rtx op0, rtx op1, bool unsignedp)
 		return NULL_RTX;
 	    }
 	}
-      rtx remainder = expand_divmod (1, TRUNC_MOD_EXPR, word_mode, sum,
-				     gen_int_mode (INTVAL (op1), word_mode),
+      rtx remainder = expand_divmod (1, TRUNC_MOD_EXPR, word_mode, NULL, NULL,
+				     sum, gen_int_mode (INTVAL (op1),
+							word_mode),
 				     NULL_RTX, 1, OPTAB_DIRECT);
       if (remainder == NULL_RTX)
 	return NULL_RTX;
@@ -1208,8 +1209,8 @@ expand_doubleword_divmod (machine_mode mode, rtx op0, rtx op1, rtx *rem,
 
   if (op11 != const1_rtx)
     {
-      rtx rem2 = expand_divmod (1, TRUNC_MOD_EXPR, mode, quot1, op11,
-				NULL_RTX, unsignedp, OPTAB_DIRECT);
+      rtx rem2 = expand_divmod (1, TRUNC_MOD_EXPR, mode, NULL, NULL, quot1,
+				op11, NULL_RTX, unsignedp, OPTAB_DIRECT);
       if (rem2 == NULL_RTX)
 	return NULL_RTX;
 
@@ -1223,8 +1224,8 @@ expand_doubleword_divmod (machine_mode mode, rtx op0, rtx op1, rtx *rem,
       if (rem2 == NULL_RTX)
 	return NULL_RTX;
 
-      rtx quot2 = expand_divmod (0, TRUNC_DIV_EXPR, mode, quot1, op11,
-				 NULL_RTX, unsignedp, OPTAB_DIRECT);
+      rtx quot2 = expand_divmod (0, TRUNC_DIV_EXPR, mode, NULL, NULL, quot1,
+				 op11, NULL_RTX, unsignedp, OPTAB_DIRECT);
       if (quot2 == NULL_RTX)
 	return NULL_RTX;
 
diff --git a/gcc/target.def b/gcc/target.def
index 2a7fa68f83dd15dcdd2c332e8431e6142ec7d305..f491e2233cf18760631f148dacf18d0e0b133e4c 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1902,6 +1902,25 @@ implementation approaches itself.",
 	const vec_perm_indices &sel),
  NULL)
 
+DEFHOOK
+(can_special_div_by_const,
+ "This hook is used to test whether the target has a special method of\n\
+division of vectors of type @var{vectype} using the value @var{constant},\n\
+and producing a vector of type @var{vectype}.  The division\n\
+will then not be decomposed by the and kept as a div.\n\
+\n\
+When the hook is being used to test whether the target supports a special\n\
+divide, @var{in0}, @var{in1}, and @var{output} are all null.  When the hook\n\
+is being used to emit a division, @var{in0} and @var{in1} are the source\n\
+vectors of type @var{vecttype} and @var{output} is the destination vector of\n\
+type @var{vectype}.\n\
+\n\
+Return true if the operation is possible, emitting instructions for it\n\
+if rtxes are provided and updating @var{output}.",
+ bool, (enum tree_code, tree vectype, wide_int constant, rtx *output,
+	rtx in0, rtx in1),
+ default_can_special_div_by_const)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter.  The last parameter
    is true if the access is defined in a packed struct.  */
diff --git a/gcc/target.h b/gcc/target.h
index d6fa6931499d15edff3e5af3e429540d001c7058..c836036ac7fa7910d62bd3da56f39c061f68b665 100644
--- a/gcc/target.h
+++ b/gcc/target.h
@@ -51,6 +51,7 @@
 #include "insn-codes.h"
 #include "tm.h"
 #include "hard-reg-set.h"
+#include "tree-core.h"
 
 #if CHECKING_P
 
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index ecce55ebe797cedc940620e8d89816973a045d49..c8df2af02b9d8c41d953b7887dd980b1a7c5cf1c 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -207,6 +207,8 @@ extern void default_addr_space_diagnose_usage (addr_space_t, location_t);
 extern rtx default_addr_space_convert (rtx, tree, tree);
 extern unsigned int default_case_values_threshold (void);
 extern bool default_have_conditional_execution (void);
+extern bool default_can_special_div_by_const (enum tree_code, tree, wide_int,
+					      rtx *, rtx, rtx);
 
 extern bool default_libc_has_function (enum function_class, tree);
 extern bool default_libc_has_fast_function (int fcode);
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index b15ae19bcb60c59ae8112e67b5f06a241a9bdbf1..f941b1c218d3c4de8b7f780b69fe04593ae3419e 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1807,6 +1807,14 @@ default_have_conditional_execution (void)
   return HAVE_conditional_execution;
 }
 
+/* Default that no division by constant operations are special.  */
+bool
+default_can_special_div_by_const (enum tree_code, tree, wide_int, rtx *, rtx,
+				  rtx)
+{
+  return false;
+}
+
 /* By default we assume that c99 functions are present at the runtime,
    but sincos is not.  */
 bool
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..472cd710534bc8aa9b1b4916f3d7b4d5b64a19b9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint8_t 
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xff;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xff;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..e904a71885b2e8487593a2cd3db75b3e4112e2cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint16_t 
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xffffU;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xffffU;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..a1418ebbf5ea8731ed4e3e720157701d9d1cf852
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c
@@ -0,0 +1,26 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint32_t 
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
new file mode 100644
index 0000000000000000000000000000000000000000..29a16739aa4b706616367bfd1832f28ebd07993e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
@@ -0,0 +1,43 @@
+#include <stdio.h>
+
+#ifndef N
+#define N 65
+#endif
+
+#ifndef TYPE
+#define TYPE uint32_t
+#endif
+
+#ifndef DEBUG
+#define DEBUG 0
+#endif
+
+#define BASE ((TYPE) -1 < 0 ? -126 : 4)
+
+int main ()
+{
+  TYPE a[N];
+  TYPE b[N];
+
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 13;
+      b[i] = BASE + i * 13;
+      if (DEBUG)
+        printf ("%d: 0x%x\n", i, a[i]);
+    }
+
+  fun1 (a, N / 2, N);
+  fun2 (b, N / 2, N);
+
+  for (int i = 0; i < N; ++i)
+    {
+      if (DEBUG)
+        printf ("%d = 0x%x == 0x%x\n", i, a[i], b[i]);
+
+      if (a[i] != b[i])
+        __builtin_abort ();
+    }
+  return 0;
+}
+
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index 350129555a0c71c0896c4f1003163f3b3557c11b..6ad6372c55eef94a742a8fa35e79d66aa24e2f3b 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -1237,6 +1237,17 @@ expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type
 	  tree rhs2 = gimple_assign_rhs2 (assign);
 	  tree ret;
 
+	  /* Check if the target was going to handle it through the special
+	     division callback hook.  */
+	  tree cst = uniform_integer_cst_p (rhs2);
+	  if (cst &&
+	      targetm.vectorize.can_special_div_by_const (code, type,
+							  wi::to_wide (cst),
+							  NULL,
+							  NULL_RTX, NULL_RTX))
+	    return NULL_TREE;
+
+
 	  if (!optimize
 	      || !VECTOR_INTEGER_TYPE_P (type)
 	      || TREE_CODE (rhs2) != VECTOR_CST
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 09574bb1a2696b3438a4ce9f09f74b42e784aca0..e91bcef56fff931a7a7ba534a0affd56e7314370 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -3432,7 +3432,7 @@ vect_recog_divmod_pattern (vec_info *vinfo,
   gimple *pattern_stmt, *def_stmt;
   enum tree_code rhs_code;
   optab optab;
-  tree q;
+  tree q, cst;
   int dummy_int, prec;
 
   if (!is_gimple_assign (last_stmt))
@@ -3596,6 +3596,14 @@ vect_recog_divmod_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
+  else if ((cst = uniform_integer_cst_p (oprnd1))
+	   && targetm.vectorize.can_special_div_by_const (rhs_code, vectype,
+							  wi::to_wide (cst),
+							  NULL, NULL_RTX,
+							  NULL_RTX))
+    {
+      return NULL;
+    }
 
   if (prec > HOST_BITS_PER_WIDE_INT
       || integer_zerop (oprnd1))
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c9dab217f059f17e91e9a7582523e627d7a45b66..1399c22ba0df75f582887d7e83b67e3ea53d25f4 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6260,6 +6260,14 @@ vectorizable_operation (vec_info *vinfo,
 	}
       target_support_p = (optab_handler (optab, vec_mode)
 			  != CODE_FOR_nothing);
+      tree cst;
+      if (!target_support_p
+	  && (cst = uniform_integer_cst_p (op1)))
+	target_support_p
+	  = targetm.vectorize.can_special_div_by_const (code, vectype,
+							wi::to_wide (cst),
+							NULL, NULL_RTX,
+							NULL_RTX);
     }
 
   bool using_emulated_vectors_p = vect_emulated_vector_p (vectype);

[-- Attachment #2: rb15779.patch --]
[-- Type: application/octet-stream, Size: 18833 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 92bda1a7e14a3c9ea63e151e4a49a818bf4d1bdb..a29f5c39be3f0927f8ef6e094c7a712c0604fb77 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6112,6 +6112,22 @@ instruction pattern.  There is no need for the hook to handle these two
 implementation approaches itself.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST (enum @var{tree_code}, tree @var{vectype}, wide_int @var{constant}, rtx *@var{output}, rtx @var{in0}, rtx @var{in1})
+This hook is used to test whether the target has a special method of
+division of vectors of type @var{vectype} using the value @var{constant},
+and producing a vector of type @var{vectype}.  The division
+will then not be decomposed by the and kept as a div.
+
+When the hook is being used to test whether the target supports a special
+divide, @var{in0}, @var{in1}, and @var{output} are all null.  When the hook
+is being used to emit a division, @var{in0} and @var{in1} are the source
+vectors of type @var{vecttype} and @var{output} is the destination vector of
+type @var{vectype}.
+
+Return true if the operation is possible, emitting instructions for it
+if rtxes are provided and updating @var{output}.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
 This hook should return the decl of a function that implements the
 vectorized variant of the function with the @code{combined_fn} code
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 112462310b134705d860153294287cfd7d4af81d..d5a745a02acdf051ea1da1b04076d058c24ce093 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4164,6 +4164,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_VEC_PERM_CONST
 
+@hook TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST
+
 @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
 
 @hook TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION
diff --git a/gcc/explow.cc b/gcc/explow.cc
index ddb4d6ae3600542f8d2bb5617cdd3933a9fae6c0..568e0eb1a158c696458ae678f5e346bf34ba0036 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -1037,7 +1037,7 @@ round_push (rtx size)
      TRUNC_DIV_EXPR.  */
   size = expand_binop (Pmode, add_optab, size, alignm1_rtx,
 		       NULL_RTX, 1, OPTAB_LIB_WIDEN);
-  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx,
+  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, size, align_rtx,
 			NULL_RTX, 1);
   size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1);
 
@@ -1203,7 +1203,7 @@ align_dynamic_address (rtx target, unsigned required_align)
 			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
 				       Pmode),
 			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
-  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, target,
 			  gen_int_mode (required_align / BITS_PER_UNIT,
 					Pmode),
 			  NULL_RTX, 1);
diff --git a/gcc/expmed.h b/gcc/expmed.h
index 0b2538c4c6bd51dfdc772ef70bdf631c0bed8717..0db2986f11ff4a4b10b59501c6f33cb3595659b5 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -708,8 +708,9 @@ extern rtx expand_variable_shift (enum tree_code, machine_mode,
 extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
 			 int);
 #ifdef GCC_OPTABS_H
-extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
-			  rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
+extern rtx expand_divmod (int, enum tree_code, machine_mode, tree, tree,
+			  rtx, rtx, rtx, int,
+			  enum optab_methods = OPTAB_LIB_WIDEN);
 #endif
 #endif
 
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 8d7418be418406e72a895ecddf2dc7fdb950c76c..bab020c07222afa38305ef8d7333f271b1965b78 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -4222,8 +4222,8 @@ expand_sdiv_pow2 (scalar_int_mode mode, rtx op0, HOST_WIDE_INT d)
 
 rtx
 expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
-	       rtx op0, rtx op1, rtx target, int unsignedp,
-	       enum optab_methods methods)
+	       tree treeop0, tree treeop1, rtx op0, rtx op1, rtx target,
+	       int unsignedp, enum optab_methods methods)
 {
   machine_mode compute_mode;
   rtx tquotient;
@@ -4375,6 +4375,17 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 
   last_div_const = ! rem_flag && op1_is_constant ? INTVAL (op1) : 0;
 
+  /* Check if the target has specific expansions for the division.  */
+  tree cst;
+  if (treeop0
+      && treeop1
+      && (cst = uniform_integer_cst_p (treeop1))
+      && targetm.vectorize.can_special_div_by_const (code, TREE_TYPE (treeop0),
+						     wi::to_wide (cst),
+						     &target, op0, op1))
+    return target;
+
+
   /* Now convert to the best mode to use.  */
   if (compute_mode != mode)
     {
@@ -4618,8 +4629,8 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 			    || (optab_handler (sdivmod_optab, int_mode)
 				!= CODE_FOR_nothing)))
 		      quotient = expand_divmod (0, TRUNC_DIV_EXPR,
-						int_mode, op0,
-						gen_int_mode (abs_d,
+						int_mode, treeop0, treeop1,
+						op0, gen_int_mode (abs_d,
 							      int_mode),
 						NULL_RTX, 0);
 		    else
@@ -4808,8 +4819,8 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 				      size - 1, NULL_RTX, 0);
 		t3 = force_operand (gen_rtx_MINUS (int_mode, t1, nsign),
 				    NULL_RTX);
-		t4 = expand_divmod (0, TRUNC_DIV_EXPR, int_mode, t3, op1,
-				    NULL_RTX, 0);
+		t4 = expand_divmod (0, TRUNC_DIV_EXPR, int_mode, treeop0,
+				    treeop1, t3, op1, NULL_RTX, 0);
 		if (t4)
 		  {
 		    rtx t5;
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 80bb1b8a4c5b8350fb1b8f57a99fd52e5882fcb6..b786f1d75e25f3410c0640cd96a8abc055fa34d9 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -8028,16 +8028,17 @@ force_operand (rtx value, rtx target)
 	    return expand_divmod (0,
 				  FLOAT_MODE_P (GET_MODE (value))
 				  ? RDIV_EXPR : TRUNC_DIV_EXPR,
-				  GET_MODE (value), op1, op2, target, 0);
+				  GET_MODE (value), NULL, NULL, op1, op2,
+				  target, 0);
 	case MOD:
-	  return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), op1, op2,
-				target, 0);
+	  return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), NULL, NULL,
+				op1, op2, target, 0);
 	case UDIV:
-	  return expand_divmod (0, TRUNC_DIV_EXPR, GET_MODE (value), op1, op2,
-				target, 1);
+	  return expand_divmod (0, TRUNC_DIV_EXPR, GET_MODE (value), NULL, NULL,
+				op1, op2, target, 1);
 	case UMOD:
-	  return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), op1, op2,
-				target, 1);
+	  return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), NULL, NULL,
+				op1, op2, target, 1);
 	case ASHIFTRT:
 	  return expand_simple_binop (GET_MODE (value), code, op1, op2,
 				      target, 0, OPTAB_LIB_WIDEN);
@@ -8990,11 +8991,13 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0,
       bool speed_p = optimize_insn_for_speed_p ();
       do_pending_stack_adjust ();
       start_sequence ();
-      rtx uns_ret = expand_divmod (mod_p, code, mode, op0, op1, target, 1);
+      rtx uns_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1,
+				   op0, op1, target, 1);
       rtx_insn *uns_insns = get_insns ();
       end_sequence ();
       start_sequence ();
-      rtx sgn_ret = expand_divmod (mod_p, code, mode, op0, op1, target, 0);
+      rtx sgn_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1,
+				   op0, op1, target, 0);
       rtx_insn *sgn_insns = get_insns ();
       end_sequence ();
       unsigned uns_cost = seq_cost (uns_insns, speed_p);
@@ -9016,7 +9019,8 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0,
       emit_insn (sgn_insns);
       return sgn_ret;
     }
-  return expand_divmod (mod_p, code, mode, op0, op1, target, unsignedp);
+  return expand_divmod (mod_p, code, mode, treeop0, treeop1,
+			op0, op1, target, unsignedp);
 }
 
 rtx
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 165f8d1fa22432b96967c69a58dbb7b4bf18120d..cff37ccb0dfc3dd79b97d0abfd872f340855dc96 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -1104,8 +1104,9 @@ expand_doubleword_mod (machine_mode mode, rtx op0, rtx op1, bool unsignedp)
 		return NULL_RTX;
 	    }
 	}
-      rtx remainder = expand_divmod (1, TRUNC_MOD_EXPR, word_mode, sum,
-				     gen_int_mode (INTVAL (op1), word_mode),
+      rtx remainder = expand_divmod (1, TRUNC_MOD_EXPR, word_mode, NULL, NULL,
+				     sum, gen_int_mode (INTVAL (op1),
+							word_mode),
 				     NULL_RTX, 1, OPTAB_DIRECT);
       if (remainder == NULL_RTX)
 	return NULL_RTX;
@@ -1208,8 +1209,8 @@ expand_doubleword_divmod (machine_mode mode, rtx op0, rtx op1, rtx *rem,
 
   if (op11 != const1_rtx)
     {
-      rtx rem2 = expand_divmod (1, TRUNC_MOD_EXPR, mode, quot1, op11,
-				NULL_RTX, unsignedp, OPTAB_DIRECT);
+      rtx rem2 = expand_divmod (1, TRUNC_MOD_EXPR, mode, NULL, NULL, quot1,
+				op11, NULL_RTX, unsignedp, OPTAB_DIRECT);
       if (rem2 == NULL_RTX)
 	return NULL_RTX;
 
@@ -1223,8 +1224,8 @@ expand_doubleword_divmod (machine_mode mode, rtx op0, rtx op1, rtx *rem,
       if (rem2 == NULL_RTX)
 	return NULL_RTX;
 
-      rtx quot2 = expand_divmod (0, TRUNC_DIV_EXPR, mode, quot1, op11,
-				 NULL_RTX, unsignedp, OPTAB_DIRECT);
+      rtx quot2 = expand_divmod (0, TRUNC_DIV_EXPR, mode, NULL, NULL, quot1,
+				 op11, NULL_RTX, unsignedp, OPTAB_DIRECT);
       if (quot2 == NULL_RTX)
 	return NULL_RTX;
 
diff --git a/gcc/target.def b/gcc/target.def
index 2a7fa68f83dd15dcdd2c332e8431e6142ec7d305..f491e2233cf18760631f148dacf18d0e0b133e4c 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1902,6 +1902,25 @@ implementation approaches itself.",
 	const vec_perm_indices &sel),
  NULL)
 
+DEFHOOK
+(can_special_div_by_const,
+ "This hook is used to test whether the target has a special method of\n\
+division of vectors of type @var{vectype} using the value @var{constant},\n\
+and producing a vector of type @var{vectype}.  The division\n\
+will then not be decomposed by the and kept as a div.\n\
+\n\
+When the hook is being used to test whether the target supports a special\n\
+divide, @var{in0}, @var{in1}, and @var{output} are all null.  When the hook\n\
+is being used to emit a division, @var{in0} and @var{in1} are the source\n\
+vectors of type @var{vecttype} and @var{output} is the destination vector of\n\
+type @var{vectype}.\n\
+\n\
+Return true if the operation is possible, emitting instructions for it\n\
+if rtxes are provided and updating @var{output}.",
+ bool, (enum tree_code, tree vectype, wide_int constant, rtx *output,
+	rtx in0, rtx in1),
+ default_can_special_div_by_const)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter.  The last parameter
    is true if the access is defined in a packed struct.  */
diff --git a/gcc/target.h b/gcc/target.h
index d6fa6931499d15edff3e5af3e429540d001c7058..c836036ac7fa7910d62bd3da56f39c061f68b665 100644
--- a/gcc/target.h
+++ b/gcc/target.h
@@ -51,6 +51,7 @@
 #include "insn-codes.h"
 #include "tm.h"
 #include "hard-reg-set.h"
+#include "tree-core.h"
 
 #if CHECKING_P
 
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index ecce55ebe797cedc940620e8d89816973a045d49..c8df2af02b9d8c41d953b7887dd980b1a7c5cf1c 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -207,6 +207,8 @@ extern void default_addr_space_diagnose_usage (addr_space_t, location_t);
 extern rtx default_addr_space_convert (rtx, tree, tree);
 extern unsigned int default_case_values_threshold (void);
 extern bool default_have_conditional_execution (void);
+extern bool default_can_special_div_by_const (enum tree_code, tree, wide_int,
+					      rtx *, rtx, rtx);
 
 extern bool default_libc_has_function (enum function_class, tree);
 extern bool default_libc_has_fast_function (int fcode);
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index b15ae19bcb60c59ae8112e67b5f06a241a9bdbf1..f941b1c218d3c4de8b7f780b69fe04593ae3419e 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1807,6 +1807,14 @@ default_have_conditional_execution (void)
   return HAVE_conditional_execution;
 }
 
+/* Default that no division by constant operations are special.  */
+bool
+default_can_special_div_by_const (enum tree_code, tree, wide_int, rtx *, rtx,
+				  rtx)
+{
+  return false;
+}
+
 /* By default we assume that c99 functions are present at the runtime,
    but sincos is not.  */
 bool
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..472cd710534bc8aa9b1b4916f3d7b4d5b64a19b9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint8_t 
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xff;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xff;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..e904a71885b2e8487593a2cd3db75b3e4112e2cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint16_t 
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xffffU;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xffffU;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..a1418ebbf5ea8731ed4e3e720157701d9d1cf852
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c
@@ -0,0 +1,26 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint32_t 
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
new file mode 100644
index 0000000000000000000000000000000000000000..29a16739aa4b706616367bfd1832f28ebd07993e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
@@ -0,0 +1,43 @@
+#include <stdio.h>
+
+#ifndef N
+#define N 65
+#endif
+
+#ifndef TYPE
+#define TYPE uint32_t
+#endif
+
+#ifndef DEBUG
+#define DEBUG 0
+#endif
+
+#define BASE ((TYPE) -1 < 0 ? -126 : 4)
+
+int main ()
+{
+  TYPE a[N];
+  TYPE b[N];
+
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 13;
+      b[i] = BASE + i * 13;
+      if (DEBUG)
+        printf ("%d: 0x%x\n", i, a[i]);
+    }
+
+  fun1 (a, N / 2, N);
+  fun2 (b, N / 2, N);
+
+  for (int i = 0; i < N; ++i)
+    {
+      if (DEBUG)
+        printf ("%d = 0x%x == 0x%x\n", i, a[i], b[i]);
+
+      if (a[i] != b[i])
+        __builtin_abort ();
+    }
+  return 0;
+}
+
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index 350129555a0c71c0896c4f1003163f3b3557c11b..6ad6372c55eef94a742a8fa35e79d66aa24e2f3b 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -1237,6 +1237,17 @@ expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type
 	  tree rhs2 = gimple_assign_rhs2 (assign);
 	  tree ret;
 
+	  /* Check if the target was going to handle it through the special
+	     division callback hook.  */
+	  tree cst = uniform_integer_cst_p (rhs2);
+	  if (cst &&
+	      targetm.vectorize.can_special_div_by_const (code, type,
+							  wi::to_wide (cst),
+							  NULL,
+							  NULL_RTX, NULL_RTX))
+	    return NULL_TREE;
+
+
 	  if (!optimize
 	      || !VECTOR_INTEGER_TYPE_P (type)
 	      || TREE_CODE (rhs2) != VECTOR_CST
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 09574bb1a2696b3438a4ce9f09f74b42e784aca0..e91bcef56fff931a7a7ba534a0affd56e7314370 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -3432,7 +3432,7 @@ vect_recog_divmod_pattern (vec_info *vinfo,
   gimple *pattern_stmt, *def_stmt;
   enum tree_code rhs_code;
   optab optab;
-  tree q;
+  tree q, cst;
   int dummy_int, prec;
 
   if (!is_gimple_assign (last_stmt))
@@ -3596,6 +3596,14 @@ vect_recog_divmod_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
+  else if ((cst = uniform_integer_cst_p (oprnd1))
+	   && targetm.vectorize.can_special_div_by_const (rhs_code, vectype,
+							  wi::to_wide (cst),
+							  NULL, NULL_RTX,
+							  NULL_RTX))
+    {
+      return NULL;
+    }
 
   if (prec > HOST_BITS_PER_WIDE_INT
       || integer_zerop (oprnd1))
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c9dab217f059f17e91e9a7582523e627d7a45b66..1399c22ba0df75f582887d7e83b67e3ea53d25f4 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6260,6 +6260,14 @@ vectorizable_operation (vec_info *vinfo,
 	}
       target_support_p = (optab_handler (optab, vec_mode)
 			  != CODE_FOR_nothing);
+      tree cst;
+      if (!target_support_p
+	  && (cst = uniform_integer_cst_p (op1)))
+	target_support_p
+	  = targetm.vectorize.can_special_div_by_const (code, vectype,
+							wi::to_wide (cst),
+							NULL, NULL_RTX,
+							NULL_RTX);
     }
 
   bool using_emulated_vectors_p = vect_emulated_vector_p (vectype);

  reply	other threads:[~2022-10-31 11:34 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-09  4:39 [PATCH 1/2]middle-end Support optimized division by pow2 bitmask Tamar Christina
2022-06-09  4:40 ` [PATCH 2/2]AArch64 aarch64: Add implementation for pow2 bitmask division Tamar Christina
2022-06-13  9:24 ` [PATCH 1/2]middle-end Support optimized division by pow2 bitmask Richard Biener
2022-06-13  9:39   ` Richard Biener
2022-06-13 10:09     ` Tamar Christina
2022-06-13 11:47       ` Richard Biener
2022-06-13 14:37         ` Tamar Christina
2022-06-14 13:18           ` Richard Biener
2022-06-14 13:38             ` Tamar Christina
2022-06-14 13:42             ` Richard Sandiford
2022-06-14 15:57               ` Tamar Christina
2022-06-14 16:09                 ` Richard Biener
2022-06-22  0:34                 ` Tamar Christina
2022-06-26 19:55                   ` Jeff Law
2022-09-23  9:33 ` [PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization Tamar Christina
2022-09-23  9:33 ` [PATCH 2/4]AArch64 Add implementation for pow2 bitmask division Tamar Christina
2022-10-31 11:34   ` Tamar Christina
2022-11-09  8:33     ` Tamar Christina
2022-11-09 16:02     ` Kyrylo Tkachov
2022-09-23  9:33 ` [PATCH 3/4]AArch64 Add SVE2 " Tamar Christina
2022-10-31 11:34   ` Tamar Christina
2022-11-09  8:33     ` Tamar Christina
2022-11-12 12:17   ` Richard Sandiford
2022-09-23  9:34 ` [PATCH 4/4]AArch64 sve2: rewrite pack + NARROWB + NARROWB to NARROWB + NARROWT Tamar Christina
2022-10-31 11:34   ` Tamar Christina
2022-11-09  8:33     ` Tamar Christina
2022-11-12 12:25   ` Richard Sandiford
2022-11-12 12:33     ` Richard Sandiford
2022-09-26 10:39 ` [PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization Richard Biener
2022-10-31 11:34   ` Tamar Christina [this message]
2022-10-31 17:12     ` Jeff Law
2022-11-08 17:36     ` Tamar Christina
2022-11-09  8:01       ` Richard Biener
2022-11-09  8:26         ` Tamar Christina
2022-11-09 10:37 ` Kyrylo Tkachov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=VI1PR08MB5325C56878364315686EBD99FF379@VI1PR08MB5325.eurprd08.prod.outlook.com \
    --to=tamar.christina@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=nd@arm.com \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).