public inbox for gcc-patches@gcc.gnu.org
* RFC [1/2] divmod transform
@ 2016-05-23  8:58 Prathamesh Kulkarni
  2016-05-23 12:05 ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2016-05-23  8:58 UTC (permalink / raw)
  To: gcc Patches, Richard Biener, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 1481 bytes --]

Hi,
I have updated my patch for divmod (attached), which was originally
based on Kugan's patch.
The patch transforms statements with codes TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
that have the same operands into a divmod representation, so that the
computation of the quotient and the remainder can be CSEd into one operation.

t1 = a TRUNC_DIV_EXPR b;
t2 = a TRUNC_MOD_EXPR b;
is transformed to:
complex_tmp = DIVMOD (a, b);
t1 = REALPART_EXPR (complex_tmp);
t2 = IMAGPART_EXPR (complex_tmp);
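
At the source level, the effect of the transform can be pictured roughly as
follows. This is only an illustrative sketch in plain C, not the GIMPLE the
pass emits; struct divmod_result and divmod() are hypothetical names standing
in for the complex temporary and the internal DIVMOD function.

```c
#include <assert.h>

/* Hypothetical stand-in for the complex temporary: the "real" part
   carries the quotient and the "imaginary" part carries the remainder.  */
struct divmod_result
{
  int quot;  /* plays the role of REALPART_EXPR (complex_tmp) */
  int rem;   /* plays the role of IMAGPART_EXPR (complex_tmp) */
};

/* Hypothetical stand-in for DIVMOD (a, b): one combined computation
   instead of a separate division and a separate modulo.  */
static struct divmod_result
divmod (int a, int b)
{
  assert (b != 0);
  struct divmod_result r;
  r.quot = a / b;
  /* For truncating division this equals a % b.  */
  r.rem = a - r.quot * b;
  return r;
}

/* Before: t1 = a / b; t2 = a % b;
   After:  both values are extracted from a single divmod result.  */
static int
sum_of_quot_and_rem (int a, int b)
{
  struct divmod_result tmp = divmod (a, b);
  int t1 = tmp.quot;
  int t2 = tmp.rem;
  return t1 + t2;
}
```

The point of the sketch is only that t1 and t2 share one underlying
computation once both are expressed in terms of the same temporary.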

* New hook expand_divmod_libfunc
The rationale for introducing the hook is that different targets have
incompatible calling conventions for the divmod libfunc.
Currently three ports define a divmod libfunc: c6x, spu and arm.
c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
return the quotient and store the remainder through an argument passed
as a pointer, while the arm version takes two arguments and returns both
quotient and remainder in a value whose mode is double the size of the
operand mode.
A port with an incompatible convention should hence override the hook
expand_divmod_libfunc to generate a call to its target-specific divmod.
Ports should define this hook if both of the following hold:
a) The port does not have a divmod or div insn for the given mode.
b) The port defines a divmod libfunc for the given mode.
The default hook default_expand_divmod_libfunc() generates a call
to libgcc2.c:__udivmoddi4, provided the operands are unsigned and
are of DImode.
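
To make the difference between the two conventions concrete, here is a small
sketch in plain C. Both function names are hypothetical and are not the real
libgcc or arm entry points; in particular, packing the quotient into the low
half and the remainder into the high half of the returned value is an
assumption made for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* libgcc2.c:__udivmoddi4 style: return the quotient and store the
   remainder through a pointer argument.  Hypothetical name.  */
static uint64_t
udivmod_ptr_style (uint64_t n, uint64_t d, uint64_t *rem)
{
  assert (d != 0);
  *rem = n % d;
  return n / d;
}

/* Pair style, as described above for arm: two arguments, with quotient
   and remainder returned together in one value twice as wide as the
   operands.  Hypothetical name; the packing order (quotient low,
   remainder high) is an assumption of this sketch.  */
static uint64_t
udivmod_pair_style (uint32_t n, uint32_t d)
{
  assert (d != 0);
  uint64_t quot = n / d;
  uint64_t rem = n % d;
  return (rem << 32) | quot;
}
```

A caller of the first form reads the remainder back from memory, while a
caller of the second form has to split the returned value, which is why a
single generic expander cannot serve both.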

The patch passes bootstrap+test on x86_64-unknown-linux-gnu and has been
cross-tested on arm*-*-*.
Bootstrap+test is in progress on arm-linux-gnueabihf.
Does this patch look OK?

Thanks,
Prathamesh

[-- Attachment #2: divmod-part1_1.diff --]
[-- Type: text/plain, Size: 13481 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8c7f2a1..111f19f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6963,6 +6963,12 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool @var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx *@var{quot}, rtx *@var{rem})
+Define this hook if the port does not have hardware div and divmod insn for
+the given mode but has divmod libfunc, which is incompatible
+with libgcc2.c:__udivmoddi4
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f963a58..2c9a800 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4848,6 +4848,8 @@ them: try the first ones in this list first.
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index c867ddc..0cb59f7 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2276,6 +2276,48 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 
+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, Generate call to
+    optab_libfunc for udivmod/sdivmod.  */
+
+static void 
+expand_DIVMOD (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  bool unsignedp = TYPE_UNSIGNED (type);
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx quotient, remainder;
+
+  /* Check if optab handler exists for [u]divmod.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+    {
+      quotient = gen_reg_rtx (mode);
+      remainder = gen_reg_rtx (mode);
+      expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
+    }
+  else
+    targetm.expand_divmod_libfunc (unsignedp, mode, op0, op1,
+				   &quotient, &remainder);
+
+  /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR.  */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+		       make_tree (TREE_TYPE (arg0), quotient),
+		       make_tree (TREE_TYPE (arg1), remainder)),
+	       target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Return true if FN is supported for the types in TYPES when the
    optimization type is OPT_TYPE.  The types are those associated with
    the "type0" and "type1" fields of FN's direct_internal_fn_info
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index e729d85..56a80f1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -194,6 +194,9 @@ DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* Divmod function.  */
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_OPTAB_FN
diff --git a/gcc/target.def b/gcc/target.def
index 6392e73..4496f9a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4948,6 +4948,16 @@ Normally, this is not needed.",
  bool, (const_tree field, machine_mode mode),
  default_member_type_forces_blk)
 
+/* See tree-ssa-math-opts.c:divmod_candidate_p for conditions that gate
+   the divmod transform.  */
+DEFHOOK
+(expand_divmod_libfunc,
+ "Define this hook if the port does not have hardware div and divmod insn for\n\
+the given mode but has divmod libfunc, which is incompatible\n\
+with libgcc2.c:__udivmoddi4",
+ void, (bool unsignedp, machine_mode mode, rtx, rtx, rtx *quot, rtx *rem),
+ default_expand_divmod_libfunc)
+
 /* Return the class for a secondary reload, and fill in extra information.  */
 DEFHOOK
 (secondary_reload,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 6b4601b..e4a021a 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }
 
+void
+default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
+			       rtx op0, rtx op1,
+			       rtx *quot_p, rtx *rem_p)
+{
+  gcc_assert (mode == DImode);
+  gcc_assert (unsignedp);
+
+  /* Generate call to
+     DImode __udivmoddi4 (DImode op0, DImode op1, DImode *rem).  */
+
+  rtx libfunc = optab_libfunc (udivmod_optab, DImode);
+  gcc_assert (libfunc);
+
+  rtx remainder = assign_stack_temp (DImode, GET_MODE_SIZE (DImode));
+  rtx address = XEXP (remainder, 0);
+
+  rtx quotient = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+					  DImode, 3,
+					  op0, GET_MODE (op0),
+					  op1, GET_MODE (op1),
+					  address, GET_MODE (address));
+
+  *quot_p = quotient;
+  *rem_p = remainder;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 7687c39..dc5e8e7 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -254,4 +254,7 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
 				       optimization_type);
 
+extern void default_expand_divmod_libfunc (bool, machine_mode,
+					   rtx, rtx, rtx *, rtx *);
+
 #endif /* GCC_TARGHOOKS_H */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 81688cd..a04d6d3 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -112,6 +112,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "optabs-libfuncs.h"
+#include "tree-eh.h"
+#include "targhooks.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -184,6 +187,9 @@ static struct
 
   /* Number of fp fused multiply-add ops inserted.  */
   int fmas_inserted;
+
+  /* Number of divmod calls inserted.  */
+  int divmod_calls_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -3784,6 +3790,187 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
   return true;
 }
 
+/* Return true if target has support for divmod.  */
+
+static bool
+target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode, bool unsignedp)
+{
+  /* If target supports hardware divmod insn, use it for divmod.  */
+  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing)
+    return true;
+
+  /* Check if libfunc for divmod is available.  */
+  if (optab_libfunc (divmod_optab, mode) != NULL_RTX) 
+    {
+      /* If target supports hardware insn, then we don't
+	 want to use divmod libfunc.  */
+      if (optab_handler (div_optab, mode) != CODE_FOR_nothing)
+	return false;
+
+      /* If target overrides expand_divmod_libfunc hook
+	 then perform divmod by generating call to the target-specific divmod libfunc.  */
+      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
+	return true;
+
+      /* Fall back to using libgcc2.c:__udivmoddi4.  */
+      return (mode == DImode && unsignedp);
+    }
+
+  return false;
+}
+
+/* Check if stmt is candidate for divmod transform.  */
+
+static bool
+divmod_candidate_p (gassign *stmt)
+{
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  enum machine_mode mode = TYPE_MODE (type);
+  optab divmod_optab, div_optab;
+
+  if (TYPE_UNSIGNED (type))
+    {
+      divmod_optab = udivmod_optab;
+      div_optab = udiv_optab;
+    }
+  else
+    {
+      divmod_optab = sdivmod_optab;
+      div_optab = sdiv_optab;
+    }
+
+  if (!target_supports_divmod_p (divmod_optab, div_optab,
+				 mode, TYPE_UNSIGNED (type)))
+    return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  /* Disable the transform if either is a constant, since division-by-constant
+     may have specialized expansion.  */
+  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
+    return false;
+
+  if (TYPE_OVERFLOW_TRAPS (type))
+    return false;
+
+  return true;
+}
+
+/* This function looks for:
+   t1 = a TRUNC_DIV_EXPR b;
+   t2 = a TRUNC_MOD_EXPR b;
+   and transforms it to the following sequence:
+   complex_tmp = DIVMOD (a, b);
+   t1 = REALPART_EXPR (complex_tmp);
+   t2 = IMAGPART_EXPR (complex_tmp);
+   For conditions enabling the transform see divmod_candidate_p().
+
+   The pass works in two phases:
+   1) Walk through all immediate uses of stmt's operand and find a
+      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
+      it to stmts vector.
+   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
+      dominates other div/mod stmts with same operands) and update entries in
+      stmts vector to use return value of DIVMOD (REALPART_EXPR for div,
+      IMAGPART_EXPR for mod).  */
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+  if (!divmod_candidate_p (stmt))
+    return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  vec<gimple *> stmts = vNULL;
+  stmts.safe_push (stmt);
+
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+  gimple *top_stmt = stmt;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+    {
+      /* Check if use_stmt is TRUNC_DIV_EXPR with same operands as stmt.  */
+      if (is_gimple_assign (use_stmt)
+	  && gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  basic_block bb = gimple_bb (use_stmt);
+	  basic_block top_bb = gimple_bb (top_stmt);
+
+	  if (top_bb == bb)
+	    {
+	      stmts.safe_push (use_stmt);
+	      if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
+		top_stmt = use_stmt;
+	    }
+	  else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
+	    {
+	      stmts.safe_push (use_stmt);
+	      top_stmt = use_stmt;
+	    }
+	  else if (dominated_by_p (CDI_DOMINATORS, bb, top_bb))
+	    stmts.safe_push (use_stmt);
+	}
+    }
+
+  /* No statements added to stmts vector.  */
+  if (stmts.length () == 1)
+    return false;
+
+  /* Create libcall to internal fn DIVMOD:
+     divmod_tmp = DIVMOD (op1, op2).  */
+
+  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+  tree res = make_temp_ssa_name (
+		build_complex_type (TREE_TYPE (op1)),
+		call_stmt, "divmod_tmp");
+  gimple_call_set_lhs (call_stmt, res);
+
+  /* Insert the call before top_stmt.  */
+  gimple_stmt_iterator top_stmt_gsi = gsi_for_stmt (top_stmt);
+  gsi_insert_before (&top_stmt_gsi, call_stmt, GSI_SAME_STMT);
+
+  widen_mul_stats.divmod_calls_inserted++;		
+
+  /* Update all statements in stmts.
+     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs = REALPART_EXPR<divmod_tmp>
+     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs = IMAGPART_EXPR<divmod_tmp>.  */
+
+  bool cfg_changed = false;
+  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
+    {
+      tree new_rhs;
+
+      switch (gimple_assign_rhs_code (use_stmt))
+	{
+	  case TRUNC_DIV_EXPR:
+	    new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+	    break;
+
+	  case TRUNC_MOD_EXPR:
+	    new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	}
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
+      update_stmt (use_stmt);
+
+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+	cfg_changed = true;
+    }
+
+  stmts.release ();
+  return cfg_changed;
+}    
 
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
@@ -3828,6 +4015,8 @@ pass_optimize_widening_mul::execute (function *fun)
   bool cfg_changed = false;
 
   memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
+  renumber_gimple_stmt_uids ();
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -3861,6 +4050,10 @@ pass_optimize_widening_mul::execute (function *fun)
 		    match_uaddsub_overflow (&gsi, stmt, code);
 		  break;
 
+		case TRUNC_MOD_EXPR:
+		  cfg_changed |= convert_to_divmod (as_a<gassign *> (stmt));
+		  break;
+
 		default:;
 		}
 	    }
@@ -3907,6 +4100,8 @@ pass_optimize_widening_mul::execute (function *fun)
 			    widen_mul_stats.maccs_inserted);
   statistics_counter_event (fun, "fused multiply-adds inserted",
 			    widen_mul_stats.fmas_inserted);
+  statistics_counter_event (fun, "divmod calls inserted",
+			    widen_mul_stats.divmod_calls_inserted);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }

[-- Attachment #3: ChangeLog-part1 --]
[-- Type: application/octet-stream, Size: 881 bytes --]

2016-05-24  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
	    Kugan Vivekanandarajah  <kugan.vivekanandarajah@linaro.org>
	    Jim Wilson  <jim.wilson@linaro.org>

	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC.
	* target.def (expand_divmod_libfunc): New hook.
	* targhooks.c (default_expand_divmod_libfunc): New function.
	* targhooks.h (default_expand_divmod_libfunc): Add prototype.
	* internal-fn.c (expand_DIVMOD): New function.
	* internal-fn.def: Add entry for DIVMOD.
	* tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
	targhooks.h.
	(widen_mul_stats): New member divmod_calls_inserted.
	(target_supports_divmod_p): New function.
	(divmod_candidate_p): Likewise.
	(convert_to_divmod): Likewise.
	(pass_optimize_widening_mul::execute): Call calculate_dominance_info,
	renumber_gimple_stmt_uids, convert_to_divmod.

* Re: RFC [1/2] divmod transform
  2016-05-23  8:58 RFC [1/2] divmod transform Prathamesh Kulkarni
@ 2016-05-23 12:05 ` Richard Biener
  2016-05-24 12:08   ` Prathamesh Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2016-05-23 12:05 UTC (permalink / raw)
  To: Prathamesh Kulkarni
  Cc: gcc Patches, Richard Biener, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> Hi,
> I have updated my patch for divmod (attached), which was originally
> based on Kugan's patch.
> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> having same operands to divmod representation, so we can cse computation of mod.
>
> t1 = a TRUNC_DIV_EXPR b;
> t2 = a TRUNC_MOD_EXPR b
> is transformed to:
> complex_tmp = DIVMOD (a, b);
> t1 = REALPART_EXPR (complex_tmp);
> t2 = IMAGPART_EXPR (complex_tmp);
>
> * New hook divmod_expand_libfunc
> The rationale for introducing the hook is that different targets have
> incompatible calling conventions for divmod libfunc.
> Currently three ports define divmod libfunc: c6x, spu and arm.
> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
> return quotient and store remainder in argument passed as pointer,
> while the arm version takes two arguments and returns both
> quotient and remainder having mode double the size of the operand mode.
> The port should hence override the hook expand_divmod_libfunc
> to generate call to target-specific divmod.
> Ports should define this hook if:
> a) The port does not have divmod or div insn for the given mode.
> b) The port defines divmod libfunc for the given mode.
> The default hook default_expand_divmod_libfunc() generates call
> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
> are of DImode.
>
> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
> cross-tested on arm*-*-*.
> Bootstrap+test in progress on arm-linux-gnueabihf.
> Does this patch look OK ?

diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 6b4601b..e4a021a 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }

+void
+default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
+                              rtx op0, rtx op1,
+                              rtx *quot_p, rtx *rem_p)

functions need a comment.

ISTR it was suggested that ARM change to the libgcc2.c:__udivmoddi4 style?  In
that case we could avoid the target hook.

+      /* If target overrides expand_divmod_libfunc hook
+        then perform divmod by generating call to the target-specific divmod libfunc.  */
+      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
+       return true;
+
+      /* Fall back to using libgcc2.c:__udivmoddi4.  */
+      return (mode == DImode && unsignedp);

I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
but still restrict this to DImode && unsigned?  Also, if
targetm.expand_divmod_libfunc is not the default, we expect the target to
handle all modes?

That said - I expected the above piece to be simply a 'return true;' ;)

Usually we use some can_expand_XXX helper in optabs.c to query if the target
supports a specific operation (for example SImode divmod would use DImode
divmod by means of widening operands - for the unsigned case of course).

+  /* Disable the transform if either is a constant, since division-by-constant
+     may have specialized expansion.  */
+  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
+    return false;

please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)

+  if (TYPE_OVERFLOW_TRAPS (type))
+    return false;

why's that?  Generally please first test cheap things (trapping, constant-ness)
before checking expensive stuff (target_supports_divmod_p).

+static bool
+convert_to_divmod (gassign *stmt)
+{
+  if (!divmod_candidate_p (stmt))
+    return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  vec<gimple *> stmts = vNULL;

use an auto_vec <gimple *> - you currently leak it in at least one place.

+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+       cfg_changed = true;

note that this suggests you should check whether any of the stmts may throw
internally as you don't update / transfer EH info correctly.  So for 'stmt' and
all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
the list of stmts to modify.

Btw, I think you should not add 'stmt' immediately but when iterating over
all uses also gather uses in TRUNC_MOD_EXPR.

Otherwise looks ok.

Thanks,
Richard.

> Thanks,
> Prathamesh

* Re: RFC [1/2] divmod transform
  2016-05-23 12:05 ` Richard Biener
@ 2016-05-24 12:08   ` Prathamesh Kulkarni
  2016-05-24 12:19     ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2016-05-24 12:08 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc Patches, Richard Biener, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 5716 bytes --]

On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:
> On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
>> Hi,
>> I have updated my patch for divmod (attached), which was originally
>> based on Kugan's patch.
>> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> having same operands to divmod representation, so we can cse computation of mod.
>>
>> t1 = a TRUNC_DIV_EXPR b;
>> t2 = a TRUNC_MOD_EXPR b
>> is transformed to:
>> complex_tmp = DIVMOD (a, b);
>> t1 = REALPART_EXPR (complex_tmp);
>> t2 = IMAGPART_EXPR (complex_tmp);
>>
>> * New hook divmod_expand_libfunc
>> The rationale for introducing the hook is that different targets have
>> incompatible calling conventions for divmod libfunc.
>> Currently three ports define divmod libfunc: c6x, spu and arm.
>> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> return quotient and store remainder in argument passed as pointer,
>> while the arm version takes two arguments and returns both
>> quotient and remainder having mode double the size of the operand mode.
>> The port should hence override the hook expand_divmod_libfunc
>> to generate call to target-specific divmod.
>> Ports should define this hook if:
>> a) The port does not have divmod or div insn for the given mode.
>> b) The port defines divmod libfunc for the given mode.
>> The default hook default_expand_divmod_libfunc() generates call
>> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
>> are of DImode.
>>
>> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> cross-tested on arm*-*-*.
>> Bootstrap+test in progress on arm-linux-gnueabihf.
>> Does this patch look OK ?
>
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 6b4601b..e4a021a 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
>    return true;
>  }
>
> +void
> +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
> +                              rtx op0, rtx op1,
> +                              rtx *quot_p, rtx *rem_p)
>
> functions need a comment.
>
> ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In that
> case we could avoid the target hook.
Well, I would prefer adding the hook because that's easier ;-)
Would it be OK to go with the hook for now?
>
> +      /* If target overrides expand_divmod_libfunc hook
> +        then perform divmod by generating call to the target-specific divmod libfunc.  */
> +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
> +       return true;
> +
> +      /* Fall back to using libgcc2.c:__udivmoddi4.  */
> +      return (mode == DImode && unsignedp);
>
> I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
> but still restrict this to DImode && unsigned?  Also if
> targetm.expand_divmod_libfunc is not the default we expect the target to
> handle all modes?
Ah indeed, the check for DImode is unnecessary.
However, I suppose the check for unsignedp should stay, since we want to
generate a call to __udivmoddi4 only if the operands are unsigned?
>
> That said - I expected the above piece to be simply a 'return true;' ;)
>
> Usually we use some can_expand_XXX helper in optabs.c to query if the target
> supports a specific operation (for example SImode divmod would use DImode
> divmod by means of widening operands - for the unsigned case of course).
Thanks for pointing that out. So if a target does not support a divmod
libfunc for a mode but does for a wider mode, then we could zero-extend the
operands to the wider mode, perform divmod in the wider mode, and then cast
the result back to the original mode.
I haven't done that in this patch; would it be OK to do that as a follow-up?
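
A rough sketch of that widening idea in plain C, for the unsigned case only
(zero-extension would not be correct for signed operands). The names
wide_udivmod() and narrow_udivmod() are hypothetical; wide_udivmod() merely
imitates the __udivmoddi4 convention for the purpose of the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit divmod following the __udivmoddi4 convention:
   return the quotient, store the remainder through REM.  */
static uint64_t
wide_udivmod (uint64_t n, uint64_t d, uint64_t *rem)
{
  assert (d != 0);
  *rem = n % d;
  return n / d;
}

/* 32-bit unsigned divmod implemented on top of the 64-bit one:
   zero-extend both operands, divide in the wider mode, then truncate
   the results back.  The truncation loses nothing, because quotient
   and remainder of 32-bit operands always fit in 32 bits.  */
static uint32_t
narrow_udivmod (uint32_t n, uint32_t d, uint32_t *rem)
{
  uint64_t wide_rem;
  uint64_t wide_quot = wide_udivmod ((uint64_t) n, (uint64_t) d, &wide_rem);
  *rem = (uint32_t) wide_rem;
  return (uint32_t) wide_quot;
}
```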
>
> +  /* Disable the transform if either is a constant, since division-by-constant
> +     may have specialized expansion.  */
> +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
> +    return false;
>
> please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
>
> +  if (TYPE_OVERFLOW_TRAPS (type))
> +    return false;
>
> why's that?  Generally please first test cheap things (trapping, constant-ness)
> before checking expensive stuff (target_supports_divmod_p).
I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
"When looking at TRUNC_DIV_EXPR you should also exclude
the case where TYPE_OVERFLOW_TRAPS (type) as that should
expand using the [su]divv optabs (no trapping overflow
divmod optab exists)."
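
For context, the only signed truncating division that overflows in
two's-complement arithmetic is the most negative value divided by -1. The
sketch below shows that case in plain C; sdiv_overflows_p() and checked_div()
are hypothetical helpers written for this illustration and are not part of
the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Return true if A / B would overflow in type int.  With a
   two's-complement int the only such case is INT_MIN / -1, whose
   mathematical result INT_MAX + 1 is not representable.  */
static bool
sdiv_overflows_p (int a, int b)
{
  return a == INT_MIN && b == -1;
}

/* Division that refuses the undefined cases instead of executing them.  */
static int
checked_div (int a, int b)
{
  assert (b != 0 && !sdiv_overflows_p (a, b));
  return a / b;
}
```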
>
> +static bool
> +convert_to_divmod (gassign *stmt)
> +{
> +  if (!divmod_candidate_p (stmt))
> +    return false;
> +
> +  tree op1 = gimple_assign_rhs1 (stmt);
> +  tree op2 = gimple_assign_rhs2 (stmt);
> +
> +  vec<gimple *> stmts = vNULL;
>
> use an auto_vec <gimple *> - you currently leak it in at least one place.
>
> +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> +       cfg_changed = true;
>
> note that this suggests you should check whether any of the stmts may throw
> internally as you don't update / transfer EH info correctly.  So for 'stmt' and
> all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
> the list of stmts to modify.
>
> Btw, I think you should not add 'stmt' immediately but when iterating over
> all uses also gather uses in TRUNC_MOD_EXPR.
>
> Otherwise looks ok.
I have made the changes in this version. I am gathering mod uses at the same
time as div uses, so this imposes a constraint that mod dominates mod. I am
not sure if that's desirable.

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
>> Thanks,
>> Prathamesh

[-- Attachment #2: divmod-part1_2.diff --]
[-- Type: text/plain, Size: 14014 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8c7f2a1..111f19f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6963,6 +6963,12 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool @var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx *@var{quot}, rtx *@var{rem})
+Define this hook if the port does not have hardware div and divmod insn for
+the given mode but has divmod libfunc, which is incompatible
+with libgcc2.c:__udivmoddi4
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f963a58..2c9a800 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4848,6 +4848,8 @@ them: try the first ones in this list first.
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index c867ddc..0cb59f7 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2276,6 +2276,48 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 
+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, Generate call to
+    optab_libfunc for udivmod/sdivmod.  */
+
+static void 
+expand_DIVMOD (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  bool unsignedp = TYPE_UNSIGNED (type);
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx quotient, remainder;
+
+  /* Check if optab handler exists for [u]divmod.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+    {
+      quotient = gen_reg_rtx (mode);
+      remainder = gen_reg_rtx (mode);
+      expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
+    }
+  else
+    targetm.expand_divmod_libfunc (unsignedp, mode, op0, op1,
+				   &quotient, &remainder);
+
+  /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR.  */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+		       make_tree (TREE_TYPE (arg0), quotient),
+		       make_tree (TREE_TYPE (arg1), remainder)),
+	       target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Return true if FN is supported for the types in TYPES when the
    optimization type is OPT_TYPE.  The types are those associated with
    the "type0" and "type1" fields of FN's direct_internal_fn_info
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index e729d85..56a80f1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -194,6 +194,9 @@ DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* Divmod function.  */
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_OPTAB_FN
diff --git a/gcc/target.def b/gcc/target.def
index 6392e73..4496f9a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4948,6 +4948,16 @@ Normally, this is not needed.",
  bool, (const_tree field, machine_mode mode),
  default_member_type_forces_blk)
 
+/* See tree-ssa-math-opts.c:divmod_candidate_p for conditions that gate
+   the divmod transform.  */
+DEFHOOK
+(expand_divmod_libfunc,
+ "Define this hook if the port does not have hardware div and divmod insn for\n\
+the given mode but has divmod libfunc, which is incompatible\n\
+with libgcc2.c:__udivmoddi4",
+ void, (bool unsignedp, machine_mode mode, rtx, rtx, rtx *quot, rtx *rem),
+ default_expand_divmod_libfunc)
+
 /* Return the class for a secondary reload, and fill in extra information.  */
 DEFHOOK
 (secondary_reload,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 6b4601b..20327a6 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }
 
+/* Generate call to
+   DImode __udivmoddi4 (DImode op0, DImode op1, DImode *rem).  */
+
+void
+default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
+			       rtx op0, rtx op1,
+			       rtx *quot_p, rtx *rem_p)
+{
+  gcc_assert (mode == DImode);
+  gcc_assert (unsignedp);
+
+  rtx libfunc = optab_libfunc (udivmod_optab, DImode);
+  gcc_assert (libfunc);
+
+  rtx remainder = assign_stack_temp (DImode, GET_MODE_SIZE (DImode));
+  rtx address = XEXP (remainder, 0);
+
+  rtx quotient = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+					  DImode, 3,
+					  op0, GET_MODE (op0),
+					  op1, GET_MODE (op1),
+					  address, GET_MODE (address));
+
+  *quot_p = quotient;
+  *rem_p = remainder;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 7687c39..dc5e8e7 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -254,4 +254,7 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
 				       optimization_type);
 
+extern void default_expand_divmod_libfunc (bool, machine_mode,
+					   rtx, rtx, rtx *, rtx *);
+
 #endif /* GCC_TARGHOOKS_H */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 81688cd..b15da6a 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -112,6 +112,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "optabs-libfuncs.h"
+#include "tree-eh.h"
+#include "targhooks.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -184,6 +187,9 @@ static struct
 
   /* Number of fp fused multiply-add ops inserted.  */
   int fmas_inserted;
+
+  /* Number of divmod calls inserted.  */
+  int divmod_calls_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -3784,6 +3790,202 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
   return true;
 }
 
+/* Return true if target has support for divmod.  */
+
+static bool
+target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode, bool unsignedp)
+{
+  /* If target supports hardware divmod insn, use it for divmod.  */
+  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing)
+    return true;
+
+  /* Check if libfunc for divmod is available.  */
+  if (optab_libfunc (divmod_optab, mode) != NULL_RTX) 
+    {
+      /* If target supports hardware insn, then we don't
+	 want to use divmod libfunc.  */
+      if (optab_handler (div_optab, mode) != CODE_FOR_nothing)
+	return false;
+
+      /* If target overrides expand_divmod_libfunc hook
+	 then perform divmod by generating call to the target-specific divmod libfunc.  */
+      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
+	return true;
+
+      /* Since targetm.expand_divmod_libfunc == default_expand_divmod_libfunc 
+	 mode is guaranteed to be DImode.
+	 Fall back to using libgcc2.c:__udivmoddi4 if unsignedp is true.  */
+      return unsignedp;
+    }
+
+  return false;
+}
+
+/* Check if stmt is candidate for divmod transform.  */
+
+static bool
+divmod_candidate_p (gassign *stmt)
+{
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  enum machine_mode mode = TYPE_MODE (type);
+  optab divmod_optab, div_optab;
+
+  if (TYPE_UNSIGNED (type))
+    {
+      divmod_optab = udivmod_optab;
+      div_optab = udiv_optab;
+    }
+  else
+    {
+      divmod_optab = sdivmod_optab;
+      div_optab = sdiv_optab;
+    }
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  /* Disable the transform if either is a constant, since division-by-constant
+     may have specialized expansion.  */
+  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
+    return false;
+
+  /* Exclude the case where TYPE_OVERFLOW_TRAPS (type) as that should
+     expand using the [su]divv optabs.  */
+  if (TYPE_OVERFLOW_TRAPS (type))
+    return false;
+  
+  if (!target_supports_divmod_p (divmod_optab, div_optab,
+				 mode, TYPE_UNSIGNED (type)))
+    return false;
+
+  return true;
+}
+
+/* This function looks for:
+   t1 = a TRUNC_DIV_EXPR b;
+   t2 = a TRUNC_MOD_EXPR b;
+   and transforms it to the following sequence:
+   complex_tmp = DIVMOD (a, b);
+   t1 = REALPART_EXPR (complex_tmp);
+   t2 = IMAGPART_EXPR (complex_tmp);
+   For conditions enabling the transform see divmod_candidate_p().
+
+   The pass works in two phases:
+   1) Walk through all immediate uses of stmt's operand and find a
+      TRUNC_DIV_EXPR or TRUNC_MOD_EXPR with matching operands; if such a
+      stmt is found, add it to the stmts vector.
+   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
+      dominates other div/mod stmts with same operands) and update entries in
+      stmts vector to use return value of DIVMOD (REALPART_EXPR for div,
+      IMAGPART_EXPR for mod).  */
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+  if (!divmod_candidate_p (stmt))
+    return false;
+
+  auto_vec<gimple *> stmts; 
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+  gimple *top_stmt = NULL;
+  bool div_seen = false;
+  
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+    {
+      /* Check if use_stmt is a div or mod with the same operands as stmt.  */
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  basic_block bb = gimple_bb (use_stmt);
+	  basic_block top_bb = top_stmt ? gimple_bb (top_stmt) : NULL;
+
+	  if (top_bb == NULL)
+	    {
+	      top_stmt = use_stmt;
+	      stmts.safe_push (top_stmt);
+	    }
+	  else if (top_bb == bb)
+	    {
+	      stmts.safe_push (use_stmt);
+	      if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
+		top_stmt = use_stmt;
+	    }
+	  else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
+	    {
+	      stmts.safe_push (use_stmt);
+	      top_stmt = use_stmt;
+	    }
+	  else if (dominated_by_p (CDI_DOMINATORS, bb, top_bb))
+	    stmts.safe_push (use_stmt);
+	  else
+	    continue;
+
+	  if (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR)
+	    div_seen = true;
+	}
+    }
+
+  if (!div_seen) 
+    return false;
+
+  /* Create call to internal fn DIVMOD:
+     divmod_tmp = DIVMOD (op1, op2).  */
+
+  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+  tree res = make_temp_ssa_name (
+		build_complex_type (TREE_TYPE (op1)),
+		call_stmt, "divmod_tmp");
+  gimple_call_set_lhs (call_stmt, res);
+
+  /* Insert the call before top_stmt.  */
+  gimple_stmt_iterator top_stmt_gsi = gsi_for_stmt (top_stmt);
+  gsi_insert_before (&top_stmt_gsi, call_stmt, GSI_SAME_STMT);
+
+  widen_mul_stats.divmod_calls_inserted++;		
+
+  /* Update all statements in stmts.
+     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs = REALPART_EXPR<divmod_tmp>
+     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs = IMAGPART_EXPR<divmod_tmp>.  */
+
+  bool cfg_changed = false;
+  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
+    {
+      tree new_rhs;
+
+      switch (gimple_assign_rhs_code (use_stmt))
+	{
+	  case TRUNC_DIV_EXPR:
+	    new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+	    break;
+
+	  case TRUNC_MOD_EXPR:
+	    new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	}
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
+      update_stmt (use_stmt);
+
+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+	cfg_changed = true;
+    }
+
+  stmts.release ();
+  return cfg_changed;
+}    
 
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
@@ -3828,6 +4030,8 @@ pass_optimize_widening_mul::execute (function *fun)
   bool cfg_changed = false;
 
   memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
+  renumber_gimple_stmt_uids ();
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -3861,6 +4065,10 @@ pass_optimize_widening_mul::execute (function *fun)
 		    match_uaddsub_overflow (&gsi, stmt, code);
 		  break;
 
+		case TRUNC_MOD_EXPR:
+		  cfg_changed |= convert_to_divmod (as_a<gassign *> (stmt));
+		  break;
+
 		default:;
 		}
 	    }
@@ -3907,6 +4115,8 @@ pass_optimize_widening_mul::execute (function *fun)
 			    widen_mul_stats.maccs_inserted);
   statistics_counter_event (fun, "fused multiply-adds inserted",
 			    widen_mul_stats.fmas_inserted);
+  statistics_counter_event (fun, "divmod calls inserted",
+			    widen_mul_stats.divmod_calls_inserted);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }

* Re: RFC [1/2] divmod transform
  2016-05-24 12:08   ` Prathamesh Kulkarni
@ 2016-05-24 12:19     ` Richard Biener
  2016-05-24 14:52       ` Prathamesh Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2016-05-24 12:19 UTC (permalink / raw)
  To: Prathamesh Kulkarni
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

On Tue, 24 May 2016, Prathamesh Kulkarni wrote:

> On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:
> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> >> Hi,
> >> I have updated my patch for divmod (attached), which was originally
> >> based on Kugan's patch.
> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> >> having same operands to divmod representation, so we can cse computation of mod.
> >>
> >> t1 = a TRUNC_DIV_EXPR b;
> >> t2 = a TRUNC_MOD_EXPR b
> >> is transformed to:
> >> complex_tmp = DIVMOD (a, b);
> >> t1 = REALPART_EXPR (complex_tmp);
> >> t2 = IMAGPART_EXPR (complex_tmp);
> >>
> >> * New hook expand_divmod_libfunc
> >> The rationale for introducing the hook is that different targets have
> >> incompatible calling conventions for divmod libfunc.
> >> Currently three ports define divmod libfunc: c6x, spu and arm.
> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
> >> return quotient and store remainder in argument passed as pointer,
> >> while the arm version takes two arguments and returns both
> >> quotient and remainder having mode double the size of the operand mode.
> >> The port should hence override the hook expand_divmod_libfunc
> >> to generate call to target-specific divmod.
> >> Ports should define this hook if:
> >> a) The port does not have divmod or div insn for the given mode.
> >> b) The port defines divmod libfunc for the given mode.
> >> The default hook default_expand_divmod_libfunc() generates call
> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
> >> are of DImode.
> >>
> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
> >> cross-tested on arm*-*-*.
> >> Bootstrap+test in progress on arm-linux-gnueabihf.
> >> Does this patch look OK ?
> >
> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > index 6b4601b..e4a021a 100644
> > --- a/gcc/targhooks.c
> > +++ b/gcc/targhooks.c
> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
> > machine_mode, optimization_type)
> >    return true;
> >  }
> >
> > +void
> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
> > +                              rtx op0, rtx op1,
> > +                              rtx *quot_p, rtx *rem_p)
> >
> > functions need a comment.
> >
> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In that
> > case we could avoid the target hook.
> Well, I would prefer adding the hook because that's easier ;-)
> Would it be OK to go with the hook for now?
> >
> > +      /* If target overrides expand_divmod_libfunc hook
> > +        then perform divmod by generating call to the target-specifc divmod
> > libfunc.  */
> > +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
> > +       return true;
> > +
> > +      /* Fall back to using libgcc2.c:__udivmoddi4.  */
> > +      return (mode == DImode && unsignedp);
> >
> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
> > but still restrict this to DImode && unsigned?  Also if
> > targetm.expand_divmod_libfunc
> > is not the default we expect the target to handle all modes?
> Ah indeed, the check for DImode is unnecessary.
> However I suppose the check for unsignedp should be there,
> since we want to generate call to __udivmoddi4 only if operand is unsigned ?

The optab libfunc for sdivmod should be NULL in that case.

> >
> > That said - I expected the above piece to be simply a 'return true;' ;)
> >
> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
> > supports a specific operation (for example SImode divmod would use DImode
> > divmod by means of widening operands - for the unsigned case of course).
> Thanks for pointing that out. So if a target does not support a divmod
> libfunc for a given mode but does for a wider mode, we could zero-extend
> the operands to the wider mode, perform divmod there, and then cast the
> result back to the original mode.
> I haven't done that in this patch; would it be OK to do that as a follow-up?

I think that you should conservatively handle the div_optab query, thus if
the target has a HW division in a wider mode don't use the divmod IFN.
You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
out if that is available.
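
As a rough illustration of the check described above, here is a
self-contained C sketch.  It does not use GCC's real machine_mode,
optab_handler or GET_MODE_WIDER_MODE; the toy_* names and the per-mode
table are hypothetical stand-ins assumed only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for machine modes, ordered from narrow to wide.
   Hypothetical: GCC's real machine_mode enum is different.  */
enum toy_mode { TOY_QI, TOY_HI, TOY_SI, TOY_DI, TOY_VOID };

/* Hypothetical analogue of GET_MODE_WIDER_MODE: the next wider mode,
   or TOY_VOID when there is none.  */
static enum toy_mode
toy_wider_mode (enum toy_mode m)
{
  assert (m != TOY_VOID);
  return (enum toy_mode) (m + 1);
}

/* Return true if HAS_HW_DIV records a hardware division for MODE
   or for any wider mode.  */
static bool
toy_hw_div_in_mode_or_wider (const bool has_hw_div[TOY_VOID],
			     enum toy_mode mode)
{
  for (enum toy_mode m = mode; m != TOY_VOID; m = toy_wider_mode (m))
    if (has_hw_div[m])
      return true;
  return false;
}

/* Use the divmod libfunc only if it exists and no hardware division
   is available in MODE or a wider mode.  */
static bool
toy_use_divmod_libfunc (bool has_libfunc,
			const bool has_hw_div[TOY_VOID],
			enum toy_mode mode)
{
  return has_libfunc && !toy_hw_div_in_mode_or_wider (has_hw_div, mode);
}
```

With a table that has a hardware division only for the widest mode, a
query for a narrower mode bails out, which is the conservative behaviour
asked for here.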

> > +  /* Disable the transform if either is a constant, since
> > division-by-constant
> > +     may have specialized expansion.  */
> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
> > +    return false;
> >
> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
> >
> > +  if (TYPE_OVERFLOW_TRAPS (type))
> > +    return false;
> >
> > why's that?  Generally please first test cheap things (trapping, constant-ness)
> > before checking expensive stuff (target_supports_divmod_p).
> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
> https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
> "When looking at TRUNC_DIV_EXPR you should also exclude
> the case where TYPE_OVERFLOW_TRAPS (type) as that should
> expand using the [su]divv optabs (no trapping overflow
> divmod optab exists)."

Ok, didn't remember that.

> >
> > +static bool
> > +convert_to_divmod (gassign *stmt)
> > +{
> > +  if (!divmod_candidate_p (stmt))
> > +    return false;
> > +
> > +  tree op1 = gimple_assign_rhs1 (stmt);
> > +  tree op2 = gimple_assign_rhs2 (stmt);
> > +
> > +  vec<gimple *> stmts = vNULL;
> >
> > use an auto_vec <gimple *> - you currently leak it in at least one place.
> >
> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> > +       cfg_changed = true;
> >
> > note that this suggests you should check whether any of the stmts may throw
> > internally as you don't update / transfer EH info correctly.  So for 'stmt' and
> > all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
> > the list of stmts to modify.
> >
> > Btw, I think you should not add 'stmt' immediately but when iterating over
> > all uses also gather uses in TRUNC_MOD_EXPR.
> >
> > Otherwise looks ok.
> Done changes in this version. I am gathering mod uses same time as div uses,
> so this imposes a constraint that mod dominates mod. I am not sure if
> that's desirable.

I think you also need a mod_seen variable now that you don't necessarily
end up with 'stmt' in the vector of stmts.  I don't see how there is a
constraint that mod dominates mod - it's just that the top_stmt needs
to dominate all other uses that can be replaced with replacing top_stmt
with a divmod.  It's just that the actual stmt set we choose may now
depend on immediate uses order which on a second thought is bad
as immediate uses order could be affected by debug stmts ... hmm.

To avoid this please re-add the code adding 'stmt' to stmts immediately
and add a use_stmt != stmt check to the immediate use processing loop
so that we don't end up adding it twice.
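
A minimal sketch of that collection order, in plain C rather than GIMPLE:
statements are modelled as integer ids, and collect_with_seed is a
hypothetical helper, not something in the patch.  The point is only that
the seed statement always comes first and is never added twice, whatever
order the uses are visited in.

```c
#include <assert.h>
#include <stddef.h>

/* Store SEED in OUT first, then every element of USES that is not SEED.
   OUT must have room for N_USES + 1 entries.  Returns the number of
   entries written.  Integer ids stand in for statements here.  */
static size_t
collect_with_seed (int seed, const int *uses, size_t n_uses, int *out)
{
  assert (uses != NULL && out != NULL);

  size_t n = 0;
  out[n++] = seed;			/* Add 'stmt' immediately.  */
  for (size_t i = 0; i < n_uses; i++)
    if (uses[i] != seed)		/* The use_stmt != stmt check.  */
      out[n++] = uses[i];
  return n;
}
```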

Richard.

* Re: RFC [1/2] divmod transform
  2016-05-24 12:19     ` Richard Biener
@ 2016-05-24 14:52       ` Prathamesh Kulkarni
  2016-05-24 14:59         ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2016-05-24 14:52 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 7855 bytes --]

On 24 May 2016 at 17:42, Richard Biener <rguenther@suse.de> wrote:
> On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>
>> On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:
>> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
>> > <prathamesh.kulkarni@linaro.org> wrote:
>> >> Hi,
>> >> I have updated my patch for divmod (attached), which was originally
>> >> based on Kugan's patch.
>> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> >> having same operands to divmod representation, so we can cse computation of mod.
>> >>
>> >> t1 = a TRUNC_DIV_EXPR b;
>> >> t2 = a TRUNC_MOD_EXPR b
>> >> is transformed to:
>> >> complex_tmp = DIVMOD (a, b);
>> >> t1 = REALPART_EXPR (complex_tmp);
>> >> t2 = IMAGPART_EXPR (complex_tmp);
>> >>
>> >> * New hook expand_divmod_libfunc
>> >> The rationale for introducing the hook is that different targets have
>> >> incompatible calling conventions for divmod libfunc.
>> >> Currently three ports define divmod libfunc: c6x, spu and arm.
>> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> >> return quotient and store remainder in argument passed as pointer,
>> >> while the arm version takes two arguments and returns both
>> >> quotient and remainder having mode double the size of the operand mode.
>> >> The port should hence override the hook expand_divmod_libfunc
>> >> to generate call to target-specific divmod.
>> >> Ports should define this hook if:
>> >> a) The port does not have divmod or div insn for the given mode.
>> >> b) The port defines divmod libfunc for the given mode.
>> >> The default hook default_expand_divmod_libfunc() generates call
>> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
>> >> are of DImode.
>> >>
>> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> >> cross-tested on arm*-*-*.
>> >> Bootstrap+test in progress on arm-linux-gnueabihf.
>> >> Does this patch look OK ?
>> >
>> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> > index 6b4601b..e4a021a 100644
>> > --- a/gcc/targhooks.c
>> > +++ b/gcc/targhooks.c
>> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
>> > machine_mode, optimization_type)
>> >    return true;
>> >  }
>> >
>> > +void
>> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
>> > +                              rtx op0, rtx op1,
>> > +                              rtx *quot_p, rtx *rem_p)
>> >
>> > functions need a comment.
>> >
>> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In that
>> > case we could avoid the target hook.
>> Well, I would prefer adding the hook because that's easier ;-)
>> Would it be OK to go with the hook for now?
>> >
>> > +      /* If target overrides expand_divmod_libfunc hook
>> > +        then perform divmod by generating call to the target-specifc divmod
>> > libfunc.  */
>> > +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
>> > +       return true;
>> > +
>> > +      /* Fall back to using libgcc2.c:__udivmoddi4.  */
>> > +      return (mode == DImode && unsignedp);
>> >
>> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
>> > but still restrict this to DImode && unsigned?  Also if
>> > targetm.expand_divmod_libfunc
>> > is not the default we expect the target to handle all modes?
>> Ah indeed, the check for DImode is unnecessary.
>> However I suppose the check for unsignedp should be there,
>> since we want to generate call to __udivmoddi4 only if operand is unsigned ?
>
> The optab libfunc for sdivmod should be NULL in that case.
Ah indeed, thanks.
>
>> >
>> > That said - I expected the above piece to be simply a 'return true;' ;)
>> >
>> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
>> > supports a specific operation (for example SImode divmod would use DImode
>> > divmod by means of widening operands - for the unsigned case of course).
>> Thanks for pointing that out. So if a target does not support a divmod
>> libfunc for a given mode but does for a wider mode, we could zero-extend
>> the operands to the wider mode, perform divmod there, and then cast the
>> result back to the original mode.
>> I haven't done that in this patch; would it be OK to do that as a follow-up?
>
> I think that you should conservatively handle the div_optab query, thus if
> the target has a HW division in a wider mode don't use the divmod IFN.
> You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
> if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
> out if that is available.
Done.
>
>> > +  /* Disable the transform if either is a constant, since
>> > division-by-constant
>> > +     may have specialized expansion.  */
>> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
>> > +    return false;
>> >
>> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
>> >
>> > +  if (TYPE_OVERFLOW_TRAPS (type))
>> > +    return false;
>> >
>> > why's that?  Generally please first test cheap things (trapping, constant-ness)
>> > before checking expensive stuff (target_supports_divmod_p).
>> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
>> https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
>> "When looking at TRUNC_DIV_EXPR you should also exclude
>> the case where TYPE_OVERFLOW_TRAPS (type) as that should
>> expand using the [su]divv optabs (no trapping overflow
>> divmod optab exists)."
>
> Ok, didn't remember that.
>
>> >
>> > +static bool
>> > +convert_to_divmod (gassign *stmt)
>> > +{
>> > +  if (!divmod_candidate_p (stmt))
>> > +    return false;
>> > +
>> > +  tree op1 = gimple_assign_rhs1 (stmt);
>> > +  tree op2 = gimple_assign_rhs2 (stmt);
>> > +
>> > +  vec<gimple *> stmts = vNULL;
>> >
>> > use an auto_vec <gimple *> - you currently leak it in at least one place.
>> >
>> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> > +       cfg_changed = true;
>> >
>> > note that this suggests you should check whether any of the stmts may throw
>> > internally as you don't update / transfer EH info correctly.  So for 'stmt' and
>> > all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
>> > the list of stmts to modify.
>> >
>> > Btw, I think you should not add 'stmt' immediately but when iterating over
>> > all uses also gather uses in TRUNC_MOD_EXPR.
>> >
>> > Otherwise looks ok.
>> Done changes in this version. I am gathering mod uses same time as div uses,
>> so this imposes a constraint that mod dominates mod. I am not sure if
>> that's desirable.
>
> I think you also need a mod_seen variable now that you don't necessarily
> end up with 'stmt' in the vector of stmts.  I don't see how there is a
> constraint that mod dominates mod - it's just that the top_stmt needs
> to dominate all other uses that can be replaced with replacing top_stmt
> with a divmod.  It's just that the actual stmt set we choose may now
> depend on immediate uses order which on a second thought is bad
> as immediate uses order could be affected by debug stmts ... hmm.
>
> To avoid this please re-add the code adding 'stmt' to stmts immediately
> and add a use_stmt != stmt check to the immediate use processing loop
> so that we don't end up adding it twice.
Well I wonder what will happen for the following case:
t1 = x / y;
if (cond)
  t2 = x % y;
else
  t3 = x % y;

Assuming stmt == top_stmt is "t2 = x % y" and use_stmt is "t3 = x % y",
use_stmt will not get added to stmts vector, since top_stmt and
use_stmt are not in same bb,
and bb's containing top_stmt and use_stmt don't dominate each other.
Not sure if this is a practical case (I assume fre will hoist the mod
out of the if-else?)
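
For reference, C source of roughly this shape could look like the sketch
below.  The function and struct names are hypothetical, and the "+ 1" in
the else arm is only there so the two arms are not identical at source
level.  Whether earlier passes merge the two mods before this transform
runs is not something the sketch shows.

```c
#include <assert.h>

struct divmod_result { unsigned quot; unsigned rem; };

/* The division sits in the dominating block; the two mods sit in
   sibling blocks, neither of which dominates the other.  */
static struct divmod_result
branchy_divmod (unsigned x, unsigned y, int cond)
{
  assert (y != 0);

  struct divmod_result r;
  r.quot = x / y;		/* t1 = x / y  */
  if (cond)
    r.rem = x % y;		/* t2 = x % y  */
  else
    r.rem = (x % y) + 1;	/* t3 = x % y  */
  return r;
}
```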

Now that we immediately add stmt to the stmts vector, I suppose mod_seen
is no longer required?

Thanks,
Prathamesh
>
> Richard.

[-- Attachment #2: divmod-part1_3.diff --]
[-- Type: text/plain, Size: 13699 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8c7f2a1..111f19f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6963,6 +6963,12 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool @var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx *@var{quot}, rtx *@var{rem})
+Define this hook if the port does not have hardware div and divmod insn for
+the given mode but has divmod libfunc, which is incompatible
+with libgcc2.c:__udivmoddi4
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f963a58..2c9a800 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4848,6 +4848,8 @@ them: try the first ones in this list first.
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index c867ddc..0cb59f7 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2276,6 +2276,48 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 
+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, generate call to
+    optab_libfunc for udivmod/sdivmod.  */
+
+static void 
+expand_DIVMOD (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  bool unsignedp = TYPE_UNSIGNED (type);
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx quotient, remainder;
+
+  /* Check if optab handler exists for [u]divmod.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+    {
+      quotient = gen_reg_rtx (mode);
+      remainder = gen_reg_rtx (mode);
+      expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
+    }
+  else
+    targetm.expand_divmod_libfunc (unsignedp, mode, op0, op1,
+				   &quotient, &remainder);
+
+  /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR.  */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+		       make_tree (TREE_TYPE (arg0), quotient),
+		       make_tree (TREE_TYPE (arg1), remainder)),
+	       target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Return true if FN is supported for the types in TYPES when the
    optimization type is OPT_TYPE.  The types are those associated with
    the "type0" and "type1" fields of FN's direct_internal_fn_info
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index e729d85..56a80f1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -194,6 +194,9 @@ DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* Divmod function.  */
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_OPTAB_FN
diff --git a/gcc/target.def b/gcc/target.def
index 6392e73..4496f9a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4948,6 +4948,16 @@ Normally, this is not needed.",
  bool, (const_tree field, machine_mode mode),
  default_member_type_forces_blk)
 
+/* See tree-ssa-math-opts.c:divmod_candidate_p for conditions that gate
+   the divmod transform.  */
+DEFHOOK
+(expand_divmod_libfunc,
+ "Define this hook if the port does not have hardware div and divmod insn for\n\
+the given mode but has divmod libfunc, which is incompatible\n\
+with libgcc2.c:__udivmoddi4",
+ void, (bool unsignedp, machine_mode mode, rtx, rtx, rtx *quot, rtx *rem),
+ default_expand_divmod_libfunc)
+
 /* Return the class for a secondary reload, and fill in extra information.  */
 DEFHOOK
 (secondary_reload,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 6b4601b..20327a6 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }
 
+/* Generate call to
+   DImode __udivmoddi4 (DImode op0, DImode op1, DImode *rem).  */
+
+void
+default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
+			       rtx op0, rtx op1,
+			       rtx *quot_p, rtx *rem_p)
+{
+  gcc_assert (mode == DImode);
+  gcc_assert (unsignedp);
+
+  rtx libfunc = optab_libfunc (udivmod_optab, DImode);
+  gcc_assert (libfunc);
+
+  rtx remainder = assign_stack_temp (DImode, GET_MODE_SIZE (DImode));
+  rtx address = XEXP (remainder, 0);
+
+  rtx quotient = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+					  DImode, 3,
+					  op0, GET_MODE (op0),
+					  op1, GET_MODE (op1),
+					  address, GET_MODE (address));
+
+  *quot_p = quotient;
+  *rem_p = remainder;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 7687c39..dc5e8e7 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -254,4 +254,7 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
 				       optimization_type);
 
+extern void default_expand_divmod_libfunc (bool, machine_mode,
+					   rtx, rtx, rtx *, rtx *);
+
 #endif /* GCC_TARGHOOKS_H */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 81688cd..242b676 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -112,6 +112,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "optabs-libfuncs.h"
+#include "tree-eh.h"
+#include "targhooks.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -184,6 +187,9 @@ static struct
 
   /* Number of fp fused multiply-add ops inserted.  */
   int fmas_inserted;
+
+  /* Number of divmod calls inserted.  */
+  int divmod_calls_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -3784,6 +3790,196 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
   return true;
 }
 
+/* Return true if target has support for divmod.  */
+
+static bool
+target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode)
+{
+  /* If target supports hardware divmod insn, use it for divmod.  */
+  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing)
+    return true;
+
+  /* Check if libfunc for divmod is available.  */
+  if (optab_libfunc (divmod_optab, mode) != NULL_RTX)
+    {
+      /* If optab_handler exists for div_optab, perhaps in a wider mode,
+	 we don't want to use the libfunc even if it exists for given mode.  */ 
+      for (machine_mode div_mode = mode;
+	   div_mode != VOIDmode;
+	   div_mode = GET_MODE_WIDER_MODE (div_mode))
+	if (optab_handler (div_optab, div_mode) != CODE_FOR_nothing)
+	  return false;
+ 
+      return true;
+    }
+
+  return false;
+}
+
+/* Check if stmt is candidate for divmod transform.  */
+
+static bool
+divmod_candidate_p (gassign *stmt)
+{
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  enum machine_mode mode = TYPE_MODE (type);
+  optab divmod_optab, div_optab;
+
+  if (TYPE_UNSIGNED (type))
+    {
+      divmod_optab = udivmod_optab;
+      div_optab = udiv_optab;
+    }
+  else
+    {
+      divmod_optab = sdivmod_optab;
+      div_optab = sdiv_optab;
+    }
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  /* Disable the transform if either is a constant, since division-by-constant
+     may have specialized expansion.  */
+  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
+    return false;
+
+  /* Exclude the case where TYPE_OVERFLOW_TRAPS (type) as that should
+     expand using the [su]divv optabs.  */
+  if (TYPE_OVERFLOW_TRAPS (type))
+    return false;
+  
+  if (!target_supports_divmod_p (divmod_optab, div_optab, mode))
+    return false;
+
+  return true;
+}
+
+/* This function looks for:
+   t1 = a TRUNC_DIV_EXPR b;
+   t2 = a TRUNC_MOD_EXPR b;
+   and transforms it to the following sequence:
+   complex_tmp = DIVMOD (a, b);
+   t1 = REALPART_EXPR (complex_tmp);
+   t2 = IMAGPART_EXPR (complex_tmp);
+   For conditions enabling the transform see divmod_candidate_p().
+
+   The pass works in two phases:
+   1) Walk through all immediate uses of stmt's first operand and collect
+      each TRUNC_DIV_EXPR or TRUNC_MOD_EXPR stmt with matching operands
+      into the stmts vector.
+   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
+      dominates other div/mod stmts with same operands) and update entries in
+      stmts vector to use return value of DIVMOD (REALPART_EXPR for div,
+      IMAGPART_EXPR for mod).  */
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+  if (!divmod_candidate_p (stmt))
+    return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+  gimple *top_stmt = stmt;
+  basic_block top_bb = gimple_bb (top_stmt); 
+  bool div_seen = false;
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+  auto_vec<gimple *> stmts; 
+  
+  stmts.safe_push (stmt);  
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+    {
+      /* Check if use_stmt is a div or mod with the same operands as stmt.  */
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  if (use_stmt == stmt)
+	    continue;
+
+	  basic_block bb = gimple_bb (use_stmt);
+	  top_bb = gimple_bb (top_stmt);
+
+	  if (top_bb == bb)
+	    {
+	      stmts.safe_push (use_stmt);
+	      if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
+		top_stmt = use_stmt;
+	    }
+	  else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
+	    {
+	      stmts.safe_push (use_stmt);
+	      top_stmt = use_stmt;
+	    }
+	  else if (dominated_by_p (CDI_DOMINATORS, bb, top_bb))
+	    stmts.safe_push (use_stmt);
+	  else
+	    continue;
+
+	  if (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR)
+	    div_seen = true;
+	}
+    }
+
+  if (!div_seen) 
+    return false;
+
+  /* Create call to internal fn DIVMOD:
+     divmod_tmp = DIVMOD (op1, op2).  */
+
+  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+  tree res = make_temp_ssa_name (
+		build_complex_type (TREE_TYPE (op1)),
+		call_stmt, "divmod_tmp");
+  gimple_call_set_lhs (call_stmt, res);
+
+  /* Insert the call before top_stmt.  */
+  gimple_stmt_iterator top_stmt_gsi = gsi_for_stmt (top_stmt);
+  gsi_insert_before (&top_stmt_gsi, call_stmt, GSI_SAME_STMT);
+
+  widen_mul_stats.divmod_calls_inserted++;		
+
+  /* Update all statements in stmts.
+     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs = REALPART_EXPR<divmod_tmp>
+     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs = IMAGPART_EXPR<divmod_tmp>.  */
+
+  bool cfg_changed = false;
+  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
+    {
+      tree new_rhs;
+
+      switch (gimple_assign_rhs_code (use_stmt))
+	{
+	  case TRUNC_DIV_EXPR:
+	    new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+	    break;
+
+	  case TRUNC_MOD_EXPR:
+	    new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	}
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
+      update_stmt (use_stmt);
+
+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+	cfg_changed = true;
+    }
+
+  stmts.release ();
+  return cfg_changed;
+}    
 
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
@@ -3828,6 +4024,8 @@ pass_optimize_widening_mul::execute (function *fun)
   bool cfg_changed = false;
 
   memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
+  renumber_gimple_stmt_uids ();
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -3861,6 +4059,10 @@ pass_optimize_widening_mul::execute (function *fun)
 		    match_uaddsub_overflow (&gsi, stmt, code);
 		  break;
 
+		case TRUNC_MOD_EXPR:
+		  cfg_changed |= convert_to_divmod (as_a<gassign *> (stmt));
+		  break;
+
 		default:;
 		}
 	    }
@@ -3907,6 +4109,8 @@ pass_optimize_widening_mul::execute (function *fun)
 			    widen_mul_stats.maccs_inserted);
   statistics_counter_event (fun, "fused multiply-adds inserted",
 			    widen_mul_stats.fmas_inserted);
+  statistics_counter_event (fun, "divmod calls inserted",
+			    widen_mul_stats.divmod_calls_inserted);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
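
To make the two libfunc calling conventions behind
default_expand_divmod_libfunc concrete, here is a minimal C sketch.  It is
an illustration only: toy_udivmod_ptr and toy_udivmod_pair are hypothetical
names, not libgcc or ARM entry points, and the struct layout is an
assumption.  The first mirrors the __udivmoddi4 style (return the quotient,
store the remainder through a pointer); the second returns both values
together, as the ARM-style routine is described as doing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a __udivmoddi4-style routine: returns the
   quotient and stores the remainder through REM_P.  */
uint64_t
toy_udivmod_ptr (uint64_t a, uint64_t b, uint64_t *rem_p)
{
  assert (b != 0);
  uint64_t q = a / b;
  *rem_p = a - q * b;   /* Remainder derived from the quotient.  */
  return q;
}

/* Hypothetical stand-in for a routine that returns quotient and
   remainder together (the pair layout is assumed for illustration).  */
struct toy_divmod_pair
{
  uint64_t quot;
  uint64_t rem;
};

struct toy_divmod_pair
toy_udivmod_pair (uint64_t a, uint64_t b)
{
  assert (b != 0);
  struct toy_divmod_pair r;
  r.quot = a / b;
  r.rem = a - r.quot * b;
  return r;
}
```

Either way a single call yields both results, which is what lets the
transform replace a separate division and modulo with one DIVMOD.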


* Re: RFC [1/2] divmod transform
  2016-05-24 14:52       ` Prathamesh Kulkarni
@ 2016-05-24 14:59         ` Richard Biener
  2016-05-24 16:50           ` Prathamesh Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2016-05-24 14:59 UTC (permalink / raw)
  To: Prathamesh Kulkarni
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

On Tue, 24 May 2016, Prathamesh Kulkarni wrote:

> On 24 May 2016 at 17:42, Richard Biener <rguenther@suse.de> wrote:
> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >
> >> On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:
> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
> >> > <prathamesh.kulkarni@linaro.org> wrote:
> >> >> Hi,
> >> >> I have updated my patch for divmod (attached), which was originally
> >> >> based on Kugan's patch.
> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> >> >> having same operands to divmod representation, so we can cse computation of mod.
> >> >>
> >> >> t1 = a TRUNC_DIV_EXPR b;
> >> >> t2 = a TRUNC_MOD_EXPR b
> >> >> is transformed to:
> >> >> complex_tmp = DIVMOD (a, b);
> >> >> t1 = REALPART_EXPR (complex_tmp);
> >> >> t2 = IMAGPART_EXPR (complex_tmp);
> >> >>
> >> >> * New hook divmod_expand_libfunc
> >> >> The rationale for introducing the hook is that different targets have
> >> >> incompatible calling conventions for divmod libfunc.
> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
> >> >> return quotient and store remainder in argument passed as pointer,
> >> >> while the arm version takes two arguments and returns both
> >> >> quotient and remainder having mode double the size of the operand mode.
> >> >> The port should hence override the hook expand_divmod_libfunc
> >> >> to generate call to target-specific divmod.
> >> >> Ports should define this hook if:
> >> >> a) The port does not have divmod or div insn for the given mode.
> >> >> b) The port defines divmod libfunc for the given mode.
> >> >> The default hook default_expand_divmod_libfunc() generates call
> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
> >> >> are of DImode.
> >> >>
> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
> >> >> cross-tested on arm*-*-*.
> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
> >> >> Does this patch look OK ?
> >> >
> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> >> > index 6b4601b..e4a021a 100644
> >> > --- a/gcc/targhooks.c
> >> > +++ b/gcc/targhooks.c
> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
> >> > machine_mode, optimization_type)
> >> >    return true;
> >> >  }
> >> >
> >> > +void
> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
> >> > +                              rtx op0, rtx op1,
> >> > +                              rtx *quot_p, rtx *rem_p)
> >> >
> >> > functions need a comment.
> >> >
> >> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In that
> >> > case we could avoid the target hook.
> >> Well I would prefer adding the hook because that's more easier -;)
> >> Would it be ok for now to go with the hook ?
> >> >
> >> > +      /* If target overrides expand_divmod_libfunc hook
> >> > +        then perform divmod by generating call to the target-specifc divmod
> >> > libfunc.  */
> >> > +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
> >> > +       return true;
> >> > +
> >> > +      /* Fall back to using libgcc2.c:__udivmoddi4.  */
> >> > +      return (mode == DImode && unsignedp);
> >> >
> >> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
> >> > but still restrict this to DImode && unsigned?  Also if
> >> > targetm.expand_divmod_libfunc
> >> > is not the default we expect the target to handle all modes?
> >> Ah indeed, the check for DImode is unnecessary.
> >> However I suppose the check for unsignedp should be there,
> >> since we want to generate call to __udivmoddi4 only if operand is unsigned ?
> >
> > The optab libfunc for sdivmod should be NULL in that case.
> Ah indeed, thanks.
> >
> >> >
> >> > That said - I expected the above piece to be simply a 'return true;' ;)
> >> >
> >> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
> >> > supports a specific operation (for example SImode divmod would use DImode
> >> > divmod by means of widening operands - for the unsigned case of course).
> >> Thanks for pointing out. So if a target does not support divmod
> >> libfunc for a mode
> >> but for a wider mode, then we could zero-extend operands to the wider-mode,
> >> perform divmod on the wider-mode, and then cast result back to the
> >> original mode.
> >> I haven't done that in this patch, would it be OK to do that as a follow up ?
> >
> > I think that you should conservatively handle the div_optab query, thus if
> > the target has a HW division in a wider mode don't use the divmod IFN.
> > You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
> > if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
> > out if that is available.
> Done.
> >
> >> > +  /* Disable the transform if either is a constant, since
> >> > division-by-constant
> >> > +     may have specialized expansion.  */
> >> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
> >> > +    return false;
> >> >
> >> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
> >> >
> >> > +  if (TYPE_OVERFLOW_TRAPS (type))
> >> > +    return false;
> >> >
> >> > why's that?  Generally please first test cheap things (trapping, constant-ness)
> >> > before checking expensive stuff (target_supports_divmod_p).
> >> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
> >> https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
> >> "When looking at TRUNC_DIV_EXPR you should also exclude
> >> the case where TYPE_OVERFLOW_TRAPS (type) as that should
> >> expand using the [su]divv optabs (no trapping overflow
> >> divmod optab exists)."
> >
> > Ok, didn't remember that.
> >
> >> >
> >> > +static bool
> >> > +convert_to_divmod (gassign *stmt)
> >> > +{
> >> > +  if (!divmod_candidate_p (stmt))
> >> > +    return false;
> >> > +
> >> > +  tree op1 = gimple_assign_rhs1 (stmt);
> >> > +  tree op2 = gimple_assign_rhs2 (stmt);
> >> > +
> >> > +  vec<gimple *> stmts = vNULL;
> >> >
> >> > use an auto_vec <gimple *> - you currently leak it in at least one place.
> >> >
> >> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> >> > +       cfg_changed = true;
> >> >
> >> > note that this suggests you should check whether any of the stmts may throw
> >> > internally as you don't update / transfer EH info correctly.  So for 'stmt' and
> >> > all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
> >> > the list of stmts to modify.
> >> >
> >> > Btw, I think you should not add 'stmt' immediately but when iterating over
> >> > all uses also gather uses in TRUNC_MOD_EXPR.
> >> >
> >> > Otherwise looks ok.
> >> Done changes in this version. I am gathering mod uses same time as div uses,
> >> so this imposes a constraint that mod dominates mod. I am not sure if
> >> that's desirable.
> >
> > I think you also need a mod_seen variable now that you don't necessarily
> > end up with 'stmt' in the vector of stmts.  I don't see how there is a
> > constraint that mod dominates mod - it's just that the top_stmt needs
> > to dominate all other uses that can be replaced with replacing top_stmt
> > with a divmod.  It's just that the actual stmt set we choose may now
> > depend on immediate uses order which on a second thought is bad
> > as immediate uses order could be affected by debug stmts ... hmm.
> >
> > To avoid this please re-add the code adding 'stmt' to stmts immediately
> > and add a use_stmt != stmt check to the immediate use processing loop
> > so that we don't end up adding it twice.
> Well I wonder what will happen for the following case:
> t1 = x / y;
> if (cond)
>   t2 = x % y;
> else
>   t3 = x % y;
> 
> Assuming stmt == top_stmt is "t2 = x % y" and use_stmt is "t3 = x % y",
> use_stmt will not get added to stmts vector, since top_stmt and
> use_stmt are not in same bb,
> and bb's containing top_stmt and use_stmt don't dominate each other.
> Not sure if this is practical case (I assume fre will hoist mod
> outside if-else?)
> 
> Now that we immediately add stmt to stmts vector, I suppose mod_seen
> shall not be required ?

In that case mod_seen is not needed.  But the situation you describe will
still happen, so I wonder if we need a better way of iterating over
immediate uses, like first pushing all candidates into a worklist
vector and then iterating over that until we find no more candidates.

You can then also handle the case of more than one group of stmts
(the pass currently doesn't iterate over BBs in any particularly
useful order).
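
One possible reading of the worklist idea, as a self-contained C sketch
rather than GCC code: all candidates go into a pending set, and we
repeatedly pick a candidate that no other pending candidate dominates and
pull everything it dominates into its group.  The names (toy_cand,
toy_group_candidates) and the precomputed dom_mask table are assumptions
made for illustration; a real pass would query dominated_by_p instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_STMTS 32

/* Toy candidate: a div/mod statement with the basic block it lives in
   and its position (uid) in the function.  */
struct toy_cand
{
  int bb;
  int uid;
};

/* DOM_MASK[b] has bit A set when block A dominates block B (every block
   dominates itself).  Within one block the lower uid comes first.  */
static bool
cand_dominates (const unsigned *dom_mask, const struct toy_cand *a,
                const struct toy_cand *b)
{
  if (a->bb == b->bb)
    return a->uid <= b->uid;
  return (dom_mask[b->bb] >> a->bb) & 1u;
}

/* Partition the N candidates into groups.  On return GROUP[i] is the
   index of the candidate heading i's group.  Returns the group count.  */
size_t
toy_group_candidates (const unsigned *dom_mask, const struct toy_cand *cands,
                      size_t n, size_t *group)
{
  bool pending[TOY_MAX_STMTS];
  size_t n_groups = 0, n_pending = n;

  assert (n <= TOY_MAX_STMTS);
  for (size_t i = 0; i < n; i++)
    pending[i] = true;

  while (n_pending > 0)
    {
      /* Pick a pending candidate that no other pending one dominates.  */
      size_t top = n;
      for (size_t i = 0; i < n; i++)
        if (pending[i]
            && (top == n || cand_dominates (dom_mask, &cands[i], &cands[top])))
          top = i;

      /* Everything TOP dominates (itself included) joins its group.  */
      for (size_t i = 0; i < n; i++)
        if (pending[i] && cand_dominates (dom_mask, &cands[top], &cands[i]))
          {
            group[i] = top;
            pending[i] = false;
            n_pending--;
          }
      n_groups++;
    }
  return n_groups;
}
```

A group would still only be worth transforming if it contains at least
one division, as the div_seen check in the patch requires; the sketch
leaves that filter out.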

Richard.


* Re: RFC [1/2] divmod transform
  2016-05-24 14:59         ` Richard Biener
@ 2016-05-24 16:50           ` Prathamesh Kulkarni
  2016-05-25  9:20             ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2016-05-24 16:50 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

On 24 May 2016 at 19:39, Richard Biener <rguenther@suse.de> wrote:
> On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>
>> On 24 May 2016 at 17:42, Richard Biener <rguenther@suse.de> wrote:
>> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:
>> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
>> >> > <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> Hi,
>> >> >> I have updated my patch for divmod (attached), which was originally
>> >> >> based on Kugan's patch.
>> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> >> >> having same operands to divmod representation, so we can cse computation of mod.
>> >> >>
>> >> >> t1 = a TRUNC_DIV_EXPR b;
>> >> >> t2 = a TRUNC_MOD_EXPR b
>> >> >> is transformed to:
>> >> >> complex_tmp = DIVMOD (a, b);
>> >> >> t1 = REALPART_EXPR (complex_tmp);
>> >> >> t2 = IMAGPART_EXPR (complex_tmp);
>> >> >>
>> >> >> * New hook divmod_expand_libfunc
>> >> >> The rationale for introducing the hook is that different targets have
>> >> >> incompatible calling conventions for divmod libfunc.
>> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
>> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> >> >> return quotient and store remainder in argument passed as pointer,
>> >> >> while the arm version takes two arguments and returns both
>> >> >> quotient and remainder having mode double the size of the operand mode.
>> >> >> The port should hence override the hook expand_divmod_libfunc
>> >> >> to generate call to target-specific divmod.
>> >> >> Ports should define this hook if:
>> >> >> a) The port does not have divmod or div insn for the given mode.
>> >> >> b) The port defines divmod libfunc for the given mode.
>> >> >> The default hook default_expand_divmod_libfunc() generates call
>> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
>> >> >> are of DImode.
>> >> >>
>> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> >> >> cross-tested on arm*-*-*.
>> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
>> >> >> Does this patch look OK ?
>> >> >
>> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> >> > index 6b4601b..e4a021a 100644
>> >> > --- a/gcc/targhooks.c
>> >> > +++ b/gcc/targhooks.c
>> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
>> >> > machine_mode, optimization_type)
>> >> >    return true;
>> >> >  }
>> >> >
>> >> > +void
>> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
>> >> > +                              rtx op0, rtx op1,
>> >> > +                              rtx *quot_p, rtx *rem_p)
>> >> >
>> >> > functions need a comment.
>> >> >
>> >> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In that
>> >> > case we could avoid the target hook.
>> >> Well I would prefer adding the hook because that's more easier -;)
>> >> Would it be ok for now to go with the hook ?
>> >> >
>> >> > +      /* If target overrides expand_divmod_libfunc hook
>> >> > +        then perform divmod by generating call to the target-specifc divmod
>> >> > libfunc.  */
>> >> > +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
>> >> > +       return true;
>> >> > +
>> >> > +      /* Fall back to using libgcc2.c:__udivmoddi4.  */
>> >> > +      return (mode == DImode && unsignedp);
>> >> >
>> >> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
>> >> > but still restrict this to DImode && unsigned?  Also if
>> >> > targetm.expand_divmod_libfunc
>> >> > is not the default we expect the target to handle all modes?
>> >> Ah indeed, the check for DImode is unnecessary.
>> >> However I suppose the check for unsignedp should be there,
>> >> since we want to generate call to __udivmoddi4 only if operand is unsigned ?
>> >
>> > The optab libfunc for sdivmod should be NULL in that case.
>> Ah indeed, thanks.
>> >
>> >> >
>> >> > That said - I expected the above piece to be simply a 'return true;' ;)
>> >> >
>> >> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
>> >> > supports a specific operation (for example SImode divmod would use DImode
>> >> > divmod by means of widening operands - for the unsigned case of course).
>> >> Thanks for pointing out. So if a target does not support divmod
>> >> libfunc for a mode
>> >> but for a wider mode, then we could zero-extend operands to the wider-mode,
>> >> perform divmod on the wider-mode, and then cast result back to the
>> >> original mode.
>> >> I haven't done that in this patch, would it be OK to do that as a follow up ?
>> >
>> > I think that you should conservatively handle the div_optab query, thus if
>> > the target has a HW division in a wider mode don't use the divmod IFN.
>> > You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
>> > if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
>> > out if that is available.
>> Done.
>> >
>> >> > +  /* Disable the transform if either is a constant, since
>> >> > division-by-constant
>> >> > +     may have specialized expansion.  */
>> >> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
>> >> > +    return false;
>> >> >
>> >> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
>> >> >
>> >> > +  if (TYPE_OVERFLOW_TRAPS (type))
>> >> > +    return false;
>> >> >
>> >> > why's that?  Generally please first test cheap things (trapping, constant-ness)
>> >> > before checking expensive stuff (target_supports_divmod_p).
>> >> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
>> >> https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
>> >> "When looking at TRUNC_DIV_EXPR you should also exclude
>> >> the case where TYPE_OVERFLOW_TRAPS (type) as that should
>> >> expand using the [su]divv optabs (no trapping overflow
>> >> divmod optab exists)."
>> >
>> > Ok, didn't remember that.
>> >
>> >> >
>> >> > +static bool
>> >> > +convert_to_divmod (gassign *stmt)
>> >> > +{
>> >> > +  if (!divmod_candidate_p (stmt))
>> >> > +    return false;
>> >> > +
>> >> > +  tree op1 = gimple_assign_rhs1 (stmt);
>> >> > +  tree op2 = gimple_assign_rhs2 (stmt);
>> >> > +
>> >> > +  vec<gimple *> stmts = vNULL;
>> >> >
>> >> > use an auto_vec <gimple *> - you currently leak it in at least one place.
>> >> >
>> >> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> >> > +       cfg_changed = true;
>> >> >
>> >> > note that this suggests you should check whether any of the stmts may throw
>> >> > internally as you don't update / transfer EH info correctly.  So for 'stmt' and
>> >> > all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
>> >> > the list of stmts to modify.
>> >> >
>> >> > Btw, I think you should not add 'stmt' immediately but when iterating over
>> >> > all uses also gather uses in TRUNC_MOD_EXPR.
>> >> >
>> >> > Otherwise looks ok.
>> >> Done changes in this version. I am gathering mod uses same time as div uses,
>> >> so this imposes a constraint that mod dominates mod. I am not sure if
>> >> that's desirable.
>> >
>> > I think you also need a mod_seen variable now that you don't necessarily
>> > end up with 'stmt' in the vector of stmts.  I don't see how there is a
>> > constraint that mod dominates mod - it's just that the top_stmt needs
>> > to dominate all other uses that can be replaced with replacing top_stmt
>> > with a divmod.  It's just that the actual stmt set we choose may now
>> > depend on immediate uses order which on a second thought is bad
>> > as immediate uses order could be affected by debug stmts ... hmm.
>> >
>> > To avoid this please re-add the code adding 'stmt' to stmts immediately
>> > and add a use_stmt != stmt check to the immediate use processing loop
>> > so that we don't end up adding it twice.
>> Well I wonder what will happen for the following case:
>> t1 = x / y;
>> if (cond)
>>   t2 = x % y;
>> else
>>   t3 = x % y;
>>
>> Assuming stmt == top_stmt is "t2 = x % y" and use_stmt is "t3 = x % y",
>> use_stmt will not get added to stmts vector, since top_stmt and
>> use_stmt are not in same bb,
>> and bb's containing top_stmt and use_stmt don't dominate each other.
>> Not sure if this is practical case (I assume fre will hoist mod
>> outside if-else?)
>>
>> Now that we immediately add stmt to stmts vector, I suppose mod_seen
>> shall not be required ?
>
> In that case mod_seen is not needed.  But the situation you say will
> still happen so I wonder if we'd need a better way of iterating over
> immediate uses, like first pushing all candidates into a worklist
> vector and then iterating over that until we find no more candidates.
>
> You can then also handle the case of more than one group of stmts
> (the pass currently doesn't iterate in any particular useful order
> over BBs).
IIUC, we want to perform the transform if:
i) there exists a top_stmt with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR
having the same operands as stmt, and
ii) top_stmt dominates all other stmts with code
TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having the same operands as top_stmt.

First, we try to find top_stmt, if it exists, by iterating over the uses
of stmt's operand.  Then we iterate over the uses of top_stmt's operand
and add each stmt with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR and the same
operands to the stmts vector, bailing out if top_stmt does not dominate it.

/* Get to top_stmt.  */
top_stmt = stmt;
top_bb = gimple_bb (stmt);

FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
{
  if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
      && use_stmt has same operands as stmt)
    {
      /* Test the same-bb case first, since a bb dominates itself.  */
      if (gimple_bb (use_stmt) == top_bb)
        {
          if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
            top_stmt = use_stmt;
        }
      else if (gimple_bb (use_stmt) dominates top_bb)
        {
          top_bb = gimple_bb (use_stmt);
          top_stmt = use_stmt;
        }
    }
}

/* Speculatively consider top_stmt as dominating all other
div_expr/mod_expr stmts with same operands as stmt.  */
stmts.safe_push (top_stmt);

/* Walk uses of top_stmt to ensure that all stmts are dominated by top_stmt.  */
top_op1 = gimple_assign_rhs1 (top_stmt);
FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
{
  if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
      && use_stmt has same operands as top_stmt)
    {
      if (use_stmt == top_stmt)
        continue;

      /* No top_stmt exists, do not proceed with the transform.  */
      if (top_bb does not dominate gimple_bb (use_stmt))
        return false;

      stmts.safe_push (use_stmt);
    }
}

For the case:
t1 = x / y;
if (cond)
  t2 = x % y;
else
  t3 = x % y;

Assuming stmt is "t2 = x % y", the first walk over the uses will set
top_stmt to "t1 = x / y".
The second walk then checks the remaining stmts:
"t2 = x % y" -> dominated by top_stmt
"t3 = x % y" -> dominated by top_stmt
Since all stmts are dominated by top_stmt, we add all three stmts to
the vector of stmts and proceed with the transform.

For the case where top_stmt dominates the original stmt but not all stmts:

if (cond)
  t1 = x / y;
else
{
  t2 = x % y;
  return;
}

t3 = x % y;

Assuming stmt is "t3 = x % y", the first walk will set top_stmt to
"t1 = x / y".

In the second walk we find that "t2 = x % y" is not dominated by
top_stmt, and hence we don't do the transform.
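
The two cases above can be checked with a small stand-alone C model of
the two-phase selection.  This is only a sketch under stated assumptions:
toy_stmt, toy_dominates and toy_find_top_stmt are hypothetical names,
every statement in the array is taken to be a div/mod with the same
operands, and dominance is modelled with a plain immediate-dominator
array instead of GCC's dominance info.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy statement: the basic block it lives in and its position (uid).  */
struct toy_stmt
{
  int bb;
  int uid;
};

/* Return true if block A dominates block B.  IDOM[i] is the immediate
   dominator of block i; the entry block is its own immediate dominator.  */
bool
toy_dominates (const int *idom, int a, int b)
{
  for (;;)
    {
      if (a == b)
        return true;
      if (idom[b] == b)
        return false;
      b = idom[b];
    }
}

/* Phase 1 walks all candidates, starting from START, to find the top
   statement.  Phase 2 checks that the top statement's block dominates
   the block of every candidate.  Returns the index of the top
   statement, or -1 if the transform should not be done.  */
int
toy_find_top_stmt (const int *idom, const struct toy_stmt *stmts,
                   size_t n, size_t start)
{
  size_t top = start;

  for (size_t i = 0; i < n; i++)
    {
      if (i == top)
        continue;
      if (stmts[i].bb == stmts[top].bb)
        {
          /* Same block: the earlier statement wins.  */
          if (stmts[i].uid < stmts[top].uid)
            top = i;
        }
      else if (toy_dominates (idom, stmts[i].bb, stmts[top].bb))
        top = i;
    }

  for (size_t i = 0; i < n; i++)
    if (!toy_dominates (idom, stmts[top].bb, stmts[i].bb))
      return -1;

  return (int) top;
}
```

For the first case (t1 in the entry block, t2 and t3 in the two arms) the
model picks t1 as top_stmt; for the second case (t1 and t2 in the two
arms, t3 reachable only through t1's block) it rejects the transform.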

Does this sound reasonable ?

Thanks,
Prathamesh
>
> Richard.
>


* Re: RFC [1/2] divmod transform
  2016-05-24 16:50           ` Prathamesh Kulkarni
@ 2016-05-25  9:20             ` Richard Biener
  2016-05-25 13:33               ` Prathamesh Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2016-05-25  9:20 UTC (permalink / raw)
  To: Prathamesh Kulkarni
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

On Tue, 24 May 2016, Prathamesh Kulkarni wrote:

> On 24 May 2016 at 19:39, Richard Biener <rguenther@suse.de> wrote:
> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >
> >> On 24 May 2016 at 17:42, Richard Biener <rguenther@suse.de> wrote:
> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:
> >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
> >> >> > <prathamesh.kulkarni@linaro.org> wrote:
> >> >> >> Hi,
> >> >> >> I have updated my patch for divmod (attached), which was originally
> >> >> >> based on Kugan's patch.
> >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> >> >> >> having same operands to divmod representation, so we can cse computation of mod.
> >> >> >>
> >> >> >> t1 = a TRUNC_DIV_EXPR b;
> >> >> >> t2 = a TRUNC_MOD_EXPR b
> >> >> >> is transformed to:
> >> >> >> complex_tmp = DIVMOD (a, b);
> >> >> >> t1 = REALPART_EXPR (complex_tmp);
> >> >> >> t2 = IMAGPART_EXPR (complex_tmp);
> >> >> >>
> >> >> >> * New hook divmod_expand_libfunc
> >> >> >> The rationale for introducing the hook is that different targets have
> >> >> >> incompatible calling conventions for divmod libfunc.
> >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
> >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
> >> >> >> return quotient and store remainder in argument passed as pointer,
> >> >> >> while the arm version takes two arguments and returns both
> >> >> >> quotient and remainder having mode double the size of the operand mode.
> >> >> >> The port should hence override the hook expand_divmod_libfunc
> >> >> >> to generate call to target-specific divmod.
> >> >> >> Ports should define this hook if:
> >> >> >> a) The port does not have divmod or div insn for the given mode.
> >> >> >> b) The port defines divmod libfunc for the given mode.
> >> >> >> The default hook default_expand_divmod_libfunc() generates call
> >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
> >> >> >> are of DImode.
> >> >> >>
> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
> >> >> >> cross-tested on arm*-*-*.
> >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
> >> >> >> Does this patch look OK ?
> >> >> >
> >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> >> >> > index 6b4601b..e4a021a 100644
> >> >> > --- a/gcc/targhooks.c
> >> >> > +++ b/gcc/targhooks.c
> >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
> >> >> > machine_mode, optimization_type)
> >> >> >    return true;
> >> >> >  }
> >> >> >
> >> >> > +void
> >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
> >> >> > +                              rtx op0, rtx op1,
> >> >> > +                              rtx *quot_p, rtx *rem_p)
> >> >> >
> >> >> > functions need a comment.
> >> >> >
> >> >> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In that
> >> >> > case we could avoid the target hook.
> >> >> Well I would prefer adding the hook because that's more easier -;)
> >> >> Would it be ok for now to go with the hook ?
> >> >> >
> >> >> > +      /* If target overrides expand_divmod_libfunc hook
> >> >> > +        then perform divmod by generating call to the target-specifc divmod
> >> >> > libfunc.  */
> >> >> > +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
> >> >> > +       return true;
> >> >> > +
> >> >> > +      /* Fall back to using libgcc2.c:__udivmoddi4.  */
> >> >> > +      return (mode == DImode && unsignedp);
> >> >> >
> >> >> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
> >> >> > but still restrict this to DImode && unsigned?  Also if
> >> >> > targetm.expand_divmod_libfunc
> >> >> > is not the default we expect the target to handle all modes?
> >> >> Ah indeed, the check for DImode is unnecessary.
> >> >> However I suppose the check for unsignedp should be there,
> >> >> since we want to generate call to __udivmoddi4 only if operand is unsigned ?
> >> >
> >> > The optab libfunc for sdivmod should be NULL in that case.
> >> Ah indeed, thanks.
> >> >
> >> >> >
> >> >> > That said - I expected the above piece to be simply a 'return true;' ;)
> >> >> >
> >> >> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
> >> >> > supports a specific operation (for example SImode divmod would use DImode
> >> >> > divmod by means of widening operands - for the unsigned case of course).
> >> >> Thanks for pointing out. So if a target does not support divmod
> >> >> libfunc for a mode
> >> >> but for a wider mode, then we could zero-extend operands to the wider-mode,
> >> >> perform divmod on the wider-mode, and then cast result back to the
> >> >> original mode.
> >> >> I haven't done that in this patch, would it be OK to do that as a follow up ?
> >> >
> >> > I think that you should conservatively handle the div_optab query, thus if
> >> > the target has a HW division in a wider mode don't use the divmod IFN.
> >> > You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
> >> > if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
> >> > out if that is available.
> >> Done.
> >> >
> >> >> > +  /* Disable the transform if either is a constant, since
> >> >> > division-by-constant
> >> >> > +     may have specialized expansion.  */
> >> >> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
> >> >> > +    return false;
> >> >> >
> >> >> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
> >> >> >
> >> >> > +  if (TYPE_OVERFLOW_TRAPS (type))
> >> >> > +    return false;
> >> >> >
> >> >> > why's that?  Generally please first test cheap things (trapping, constant-ness)
> >> >> > before checking expensive stuff (target_supports_divmod_p).
> >> >> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
> >> >> https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
> >> >> "When looking at TRUNC_DIV_EXPR you should also exclude
> >> >> the case where TYPE_OVERFLOW_TRAPS (type) as that should
> >> >> expand using the [su]divv optabs (no trapping overflow
> >> >> divmod optab exists)."
> >> >
> >> > Ok, didn't remember that.
> >> >
> >> >> >
> >> >> > +static bool
> >> >> > +convert_to_divmod (gassign *stmt)
> >> >> > +{
> >> >> > +  if (!divmod_candidate_p (stmt))
> >> >> > +    return false;
> >> >> > +
> >> >> > +  tree op1 = gimple_assign_rhs1 (stmt);
> >> >> > +  tree op2 = gimple_assign_rhs2 (stmt);
> >> >> > +
> >> >> > +  vec<gimple *> stmts = vNULL;
> >> >> >
> >> >> > use an auto_vec <gimple *> - you currently leak it in at least one place.
> >> >> >
> >> >> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> >> >> > +       cfg_changed = true;
> >> >> >
> >> >> > note that this suggests you should check whether any of the stmts may throw
> >> >> > internally as you don't update / transfer EH info correctly.  So for 'stmt' and
> >> >> > all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
> >> >> > the list of stmts to modify.
> >> >> >
> >> >> > Btw, I think you should not add 'stmt' immediately but when iterating over
> >> >> > all uses also gather uses in TRUNC_MOD_EXPR.
> >> >> >
> >> >> > Otherwise looks ok.
> >> >> Done changes in this version. I am gathering mod uses same time as div uses,
> >> >> so this imposes a constraint that mod dominates mod. I am not sure if
> >> >> that's desirable.
> >> >
> >> > I think you also need a mod_seen variable now that you don't necessarily
> >> > end up with 'stmt' in the vector of stmts.  I don't see how there is a
> >> > constraint that mod dominates mod - it's just that the top_stmt needs
> >> > to dominate all other uses that can be replaced with replacing top_stmt
> >> > with a divmod.  It's just that the actual stmt set we choose may now
> >> > depend on immediate uses order which on a second thought is bad
> >> > as immediate uses order could be affected by debug stmts ... hmm.
> >> >
> >> > To avoid this please re-add the code adding 'stmt' to stmts immediately
> >> > and add a use_stmt != stmt check to the immediate use processing loop
> >> > so that we don't end up adding it twice.
> >> Well I wonder what will happen for the following case:
> >> t1 = x / y;
> >> if (cond)
> >>   t2 = x % y;
> >> else
> >>   t3 = x % y;
> >>
> >> Assuming stmt == top_stmt is "t2 = x % y" and use_stmt is "t3 = x % y",
> >> use_stmt will not get added to stmts vector, since top_stmt and
> >> use_stmt are not in same bb,
> >> and bb's containing top_stmt and use_stmt don't dominate each other.
> >> Not sure if this is practical case (I assume fre will hoist mod
> >> outside if-else?)
> >>
> >> Now that we immediately add stmt to stmts vector, I suppose mod_seen
> >> shall not be required ?
> >
> > In that case mod_seen is not needed.  But the situation you say will
> > still happen so I wonder if we'd need a better way of iterating over
> > immediate uses, like first pushing all candidates into a worklist
> > vector and then iterating over that until we find no more candidates.
> >
> > You can then also handle the case of more than one group of stmts
> > (the pass currently doesn't iterate in any particular useful order
> > over BBs).
> IIUC, we want to perform the transform if:
> i) there exists top_stmt with code trunc_div_expr/trunc_mod_expr and
> having same operands as stmt.
> ii) top_stmt dominates all other stmts with code
> trunc_div_expr/trunc_mod_expr and having same operands as top_stmt.
> 
> Firstly, we try to get to top_stmt if it exists, by iterating over uses of stmt,
> and then iterate over all uses of top_stmt and add them to stmts vector
> only if top_stmt dominates all the stmts with same operands as top_stmt
> and have code trunc_div_expr/trunc_mod_expr.
> 
> /* Get to top_stmt.  */
> top_stmt = stmt;
> top_bb = gimple_bb (stmt);
> 
> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
> {
>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
>       && use_stmt has same operands as stmt)
>     {
>       if (gimple_bb (use_stmt) dominates top_bb)
>         {
>           top_bb = gimple_bb (use_stmt);
>           top_stmt = use_stmt;
>         }
>       else if (gimple_bb (use_stmt) == top_bb
>                && gimple_uid (use_stmt) < gimple_uid (top_stmt))
>         top_stmt = use_stmt;
>     }
> }
> 
> /* Speculatively consider top_stmt as dominating all other
> div_expr/mod_expr stmts with same operands as stmt.  */
> stmts.safe_push (top_stmt);
> 
> /* Walk uses of top_stmt to ensure that all stmts are dominated by top_stmt.  */
> top_op1 = gimple_assign_rhs1 (top_stmt);
> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
> {
>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
>       && use_stmt has same operands as top_stmt)
>     {
>       if (use_stmt == top_stmt)
>         continue;
> 
>       /* No top_stmt exists, do not proceed with transform.  */
>       if (top_bb does not dominate gimple_bb (use_stmt))
>         return false;
> 
>       stmts.safe_push (use_stmt);
>     }
> }
> 
> For the case:
> t1 = x / y;
> if (cond)
>   t2 = x % y;
> else
>   t3 = x % y;
> 
> Assuming stmt is "t2 = x % y", it will walk uses of stmt and set
> top_stmt to "t1 = x / y"
> Then it will walk all uses of top_stmt:
> "t2 = x % y" -> dominated by top_stmt
> "t3 = x % y" -> dominated by top_stmt
> Since all stmts are dominated by top_stmt, we add all three stmts to
> vector of stmts and proceed with transform.
> 
> For the case where, top_stmt dominates original stmt but not all stmts:
> 
> if (cond)
>   t1 = x / y;
> else
> {
>   t2 = x % y;
>   return;
> }
> 
> t3 = x % y;
> 
> Assuming stmt is "t3 = x % y",
> Walking stmt uses will set top_stmt to "t1 = x / y";
> 
> Walking immediate uses of top_stmt, we find that "t2 = x % y" is not
> dominated by top_stmt,
> and hence don't do the transform.
> 
> Does this sound reasonable ?

Yes, that's reasonable.

Richard.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


* Re: RFC [1/2] divmod transform
  2016-05-25  9:20             ` Richard Biener
@ 2016-05-25 13:33               ` Prathamesh Kulkarni
  2016-05-27 12:05                 ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2016-05-25 13:33 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 13728 bytes --]

On 25 May 2016 at 12:52, Richard Biener <rguenther@suse.de> wrote:
> On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>
>> On 24 May 2016 at 19:39, Richard Biener <rguenther@suse.de> wrote:
>> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 24 May 2016 at 17:42, Richard Biener <rguenther@suse.de> wrote:
>> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:
>> >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
>> >> >> > <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >> Hi,
>> >> >> >> I have updated my patch for divmod (attached), which was originally
>> >> >> >> based on Kugan's patch.
>> >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> >> >> >> having same operands to divmod representation, so we can cse computation of mod.
>> >> >> >>
>> >> >> >> t1 = a TRUNC_DIV_EXPR b;
>> >> >> >> t2 = a TRUNC_MOD_EXPR b
>> >> >> >> is transformed to:
>> >> >> >> complex_tmp = DIVMOD (a, b);
>> >> >> >> t1 = REALPART_EXPR (complex_tmp);
>> >> >> >> t2 = IMAGPART_EXPR (complex_tmp);
>> >> >> >>
>> >> >> >> * New hook divmod_expand_libfunc
>> >> >> >> The rationale for introducing the hook is that different targets have
>> >> >> >> incompatible calling conventions for divmod libfunc.
>> >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
>> >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> >> >> >> return quotient and store remainder in argument passed as pointer,
>> >> >> >> while the arm version takes two arguments and returns both
>> >> >> >> quotient and remainder having mode double the size of the operand mode.
>> >> >> >> The port should hence override the hook expand_divmod_libfunc
>> >> >> >> to generate call to target-specific divmod.
>> >> >> >> Ports should define this hook if:
>> >> >> >> a) The port does not have divmod or div insn for the given mode.
>> >> >> >> b) The port defines divmod libfunc for the given mode.
>> >> >> >> The default hook default_expand_divmod_libfunc() generates call
>> >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
>> >> >> >> are of DImode.
>> >> >> >>
>> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> >> >> >> cross-tested on arm*-*-*.
>> >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
>> >> >> >> Does this patch look OK ?
>> >> >> >
>> >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> >> >> > index 6b4601b..e4a021a 100644
>> >> >> > --- a/gcc/targhooks.c
>> >> >> > +++ b/gcc/targhooks.c
>> >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
>> >> >> > machine_mode, optimization_type)
>> >> >> >    return true;
>> >> >> >  }
>> >> >> >
>> >> >> > +void
>> >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
>> >> >> > +                              rtx op0, rtx op1,
>> >> >> > +                              rtx *quot_p, rtx *rem_p)
>> >> >> >
>> >> >> > functions need a comment.
>> >> >> >
>> >> >> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In that
>> >> >> > case we could avoid the target hook.
>> >> >> Well I would prefer adding the hook because that's more easier -;)
>> >> >> Would it be ok for now to go with the hook ?
>> >> >> >
>> >> >> > +      /* If target overrides expand_divmod_libfunc hook
>> >> >> > +        then perform divmod by generating call to the target-specific divmod
>> >> >> > libfunc.  */
>> >> >> > +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
>> >> >> > +       return true;
>> >> >> > +
>> >> >> > +      /* Fall back to using libgcc2.c:__udivmoddi4.  */
>> >> >> > +      return (mode == DImode && unsignedp);
>> >> >> >
>> >> >> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
>> >> >> > but still restrict this to DImode && unsigned?  Also if
>> >> >> > targetm.expand_divmod_libfunc
>> >> >> > is not the default we expect the target to handle all modes?
>> >> >> Ah indeed, the check for DImode is unnecessary.
>> >> >> However I suppose the check for unsignedp should be there,
>> >> >> since we want to generate call to __udivmoddi4 only if operand is unsigned ?
>> >> >
>> >> > The optab libfunc for sdivmod should be NULL in that case.
>> >> Ah indeed, thanks.
>> >> >
>> >> >> >
>> >> >> > That said - I expected the above piece to be simply a 'return true;' ;)
>> >> >> >
>> >> >> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
>> >> >> > supports a specific operation (for example SImode divmod would use DImode
>> >> >> > divmod by means of widening operands - for the unsigned case of course).
>> >> >> Thanks for pointing out. So if a target does not support divmod
>> >> >> libfunc for a mode
>> >> >> but for a wider mode, then we could zero-extend operands to the wider-mode,
>> >> >> perform divmod on the wider-mode, and then cast result back to the
>> >> >> original mode.
>> >> >> I haven't done that in this patch, would it be OK to do that as a follow up ?
>> >> >
>> >> > I think that you should conservatively handle the div_optab query, thus if
>> >> > the target has a HW division in a wider mode don't use the divmod IFN.
>> >> > You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
>> >> > if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
>> >> > out if that is available.
>> >> Done.
>> >> >
>> >> >> > +  /* Disable the transform if either is a constant, since
>> >> >> > division-by-constant
>> >> >> > +     may have specialized expansion.  */
>> >> >> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
>> >> >> > +    return false;
>> >> >> >
>> >> >> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
>> >> >> >
>> >> >> > +  if (TYPE_OVERFLOW_TRAPS (type))
>> >> >> > +    return false;
>> >> >> >
>> >> >> > why's that?  Generally please first test cheap things (trapping, constant-ness)
>> >> >> > before checking expensive stuff (target_supports_divmod_p).
>> >> >> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
>> >> >> https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
>> >> >> "When looking at TRUNC_DIV_EXPR you should also exclude
>> >> >> the case where TYPE_OVERFLOW_TRAPS (type) as that should
>> >> >> expand using the [su]divv optabs (no trapping overflow
>> >> >> divmod optab exists)."
>> >> >
>> >> > Ok, didn't remember that.
>> >> >
>> >> >> >
>> >> >> > +static bool
>> >> >> > +convert_to_divmod (gassign *stmt)
>> >> >> > +{
>> >> >> > +  if (!divmod_candidate_p (stmt))
>> >> >> > +    return false;
>> >> >> > +
>> >> >> > +  tree op1 = gimple_assign_rhs1 (stmt);
>> >> >> > +  tree op2 = gimple_assign_rhs2 (stmt);
>> >> >> > +
>> >> >> > +  vec<gimple *> stmts = vNULL;
>> >> >> >
>> >> >> > use an auto_vec <gimple *> - you currently leak it in at least one place.
>> >> >> >
>> >> >> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> >> >> > +       cfg_changed = true;
>> >> >> >
>> >> >> > note that this suggests you should check whether any of the stmts may throw
>> >> >> > internally as you don't update / transfer EH info correctly.  So for 'stmt' and
>> >> >> > all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
>> >> >> > the list of stmts to modify.
>> >> >> >
>> >> >> > Btw, I think you should not add 'stmt' immediately but when iterating over
>> >> >> > all uses also gather uses in TRUNC_MOD_EXPR.
>> >> >> >
>> >> >> > Otherwise looks ok.
>> >> >> Done changes in this version. I am gathering mod uses same time as div uses,
>> >> >> so this imposes a constraint that mod dominates mod. I am not sure if
>> >> >> that's desirable.
>> >> >
>> >> > I think you also need a mod_seen variable now that you don't necessarily
>> >> > end up with 'stmt' in the vector of stmts.  I don't see how there is a
>> >> > constraint that mod dominates mod - it's just that the top_stmt needs
>> >> > to dominate all other uses that can be replaced with replacing top_stmt
>> >> > with a divmod.  It's just that the actual stmt set we choose may now
>> >> > depend on immediate uses order which on a second thought is bad
>> >> > as immediate uses order could be affected by debug stmts ... hmm.
>> >> >
>> >> > To avoid this please re-add the code adding 'stmt' to stmts immediately
>> >> > and add a use_stmt != stmt check to the immediate use processing loop
>> >> > so that we don't end up adding it twice.
>> >> Well I wonder what will happen for the following case:
>> >> t1 = x / y;
>> >> if (cond)
>> >>   t2 = x % y;
>> >> else
>> >>   t3 = x % y;
>> >>
>> >> Assuming stmt == top_stmt is "t2 = x % y" and use_stmt is "t3 = x % y",
>> >> use_stmt will not get added to stmts vector, since top_stmt and
>> >> use_stmt are not in same bb,
>> >> and bb's containing top_stmt and use_stmt don't dominate each other.
>> >> Not sure if this is practical case (I assume fre will hoist mod
>> >> outside if-else?)
>> >>
>> >> Now that we immediately add stmt to stmts vector, I suppose mod_seen
>> >> shall not be required ?
>> >
>> > In that case mod_seen is not needed.  But the situation you say will
>> > still happen so I wonder if we'd need a better way of iterating over
>> > immediate uses, like first pushing all candidates into a worklist
>> > vector and then iterating over that until we find no more candidates.
>> >
>> > You can then also handle the case of more than one group of stmts
>> > (the pass currently doesn't iterate in any particular useful order
>> > over BBs).
>> IIUC, we want to perform the transform if:
>> i) there exists top_stmt with code trunc_div_expr/trunc_mod_expr and
>> having same operands as stmt.
>> ii) top_stmt dominates all other stmts with code
>> trunc_div_expr/trunc_mod_expr and having same operands as top_stmt.
>>
>> Firstly, we try to get to top_stmt if it exists, by iterating over uses of stmt,
>> and then iterate over all uses of top_stmt and add them to stmts vector
>> only if top_stmt dominates all the stmts with same operands as top_stmt
>> and have code trunc_div_expr/trunc_mod_expr.
>>
>> /* Get to top_stmt.  */
>> top_stmt = stmt;
>> top_bb = gimple_bb (stmt);
>>
>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
>> {
>>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
>>       && use_stmt has same operands as stmt)
>>     {
>>       if (gimple_bb (use_stmt) dominates top_bb)
>>         {
>>           top_bb = gimple_bb (use_stmt);
>>           top_stmt = use_stmt;
>>         }
>>       else if (gimple_bb (use_stmt) == top_bb
>>                && gimple_uid (use_stmt) < gimple_uid (top_stmt))
>>         top_stmt = use_stmt;
>>     }
>> }
>>
>> /* Speculatively consider top_stmt as dominating all other
>> div_expr/mod_expr stmts with same operands as stmt.  */
>> stmts.safe_push (top_stmt);
>>
>> /* Walk uses of top_stmt to ensure that all stmts are dominated by top_stmt.  */
>> top_op1 = gimple_assign_rhs1 (top_stmt);
>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
>> {
>>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
>>       && use_stmt has same operands as top_stmt)
>>     {
>>       if (use_stmt == top_stmt)
>>         continue;
>>
>>       /* No top_stmt exists, do not proceed with transform.  */
>>       if (top_bb does not dominate gimple_bb (use_stmt))
>>         return false;
>>
>>       stmts.safe_push (use_stmt);
>>     }
>> }
>>
>> For the case:
>> t1 = x / y;
>> if (cond)
>>   t2 = x % y;
>> else
>>   t3 = x % y;
>>
>> Assuming stmt is "t2 = x % y", it will walk uses of stmt and set
>> top_stmt to "t1 = x / y"
>> Then it will walk all uses of top_stmt:
>> "t2 = x % y" -> dominated by top_stmt
>> "t3 = x % y" -> dominated by top_stmt
>> Since all stmts are dominated by top_stmt, we add all three stmts to
>> vector of stmts and proceed with transform.
>>
>> For the case where, top_stmt dominates original stmt but not all stmts:
>>
>> if (cond)
>>   t1 = x / y;
>> else
>> {
>>   t2 = x % y;
>>   return;
>> }
>>
>> t3 = x % y;
>>
>> Assuming stmt is "t3 = x % y",
>> Walking stmt uses will set top_stmt to "t1 = x / y";
>>
>> Walking immediate uses of top_stmt, we find that "t2 = x % y" is not
>> dominated by top_stmt,
>> and hence don't do the transform.
>>
>> Does this sound reasonable ?
>
> Yes, that's reasonable.
Thanks, I have attached a patch that implements the above approach.
Does it look OK ?
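
To make the selection rule discussed above easier to check by hand, here
is a minimal stand-alone model of it. It is not GCC code: toy_stmt,
toy_dominates and toy_find_top are hypothetical names, basic blocks are
plain integers, and dominance is read from an immediate-dominator array
in which the entry block is its own immediate dominator. The first loop
mirrors the "get to top_stmt" walk, the second mirrors the check that
top_stmt dominates every other candidate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy statement: the block it lives in and a uid that grows with its
   position inside that block.  Both fields are illustrative only.  */
struct toy_stmt
{
  int bb;
  int uid;
};

/* Return true if block A dominates block B, walking up the
   immediate-dominator array IDOM from B.  The entry block is the one
   with idom[b] == b.  */
static bool
toy_dominates (const int *idom, int a, int b)
{
  for (;;)
    {
      if (a == b)
        return true;
      if (idom[b] == b)
        return false;
      b = idom[b];
    }
}

/* Starting from STMTS[START], pick the topmost candidate among the N
   div/mod statements with the same operands.  Return its index, or -1
   if the chosen statement's block does not dominate all the others.  */
static int
toy_find_top (const int *idom, const struct toy_stmt *stmts, size_t n,
              size_t start)
{
  assert (start < n);
  size_t top = start;

  /* Phase 1: move TOP upwards whenever another candidate sits earlier
     in the same block or in a dominating block.  */
  for (size_t i = 0; i < n; i++)
    {
      if (i == top)
        continue;
      if (stmts[i].bb == stmts[top].bb)
        {
          if (stmts[i].uid < stmts[top].uid)
            top = i;
        }
      else if (toy_dominates (idom, stmts[i].bb, stmts[top].bb))
        top = i;
    }

  /* Phase 2: give up unless TOP's block dominates every candidate.  */
  for (size_t i = 0; i < n; i++)
    if (!toy_dominates (idom, stmts[top].bb, stmts[i].bb))
      return -1;

  return (int) top;
}
```

With the two examples from the earlier mail, the first one (div in the
entry block, one mod in each arm) yields the div as top statement, and
the second one (div in the then-arm, mod plus return in the else-arm,
mod after the join) is rejected.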

The patch still does not handle the following case:
int f(int x, int y)
{
  extern int cond;
  int q, r;

  if (cond)
    q = x / y;
  else
    q = x / y;

  r = x % y;
  return q + r;
}

In the above case the mod stmt is not dominated by either div stmt,
but I suppose the transform is still possible by inserting
DIVMOD (x, y) before the if-else ?
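
As a source-level illustration of what such hoisting would amount to,
here is a small sketch. It assumes div () from <stdlib.h> as a stand-in
for the DIVMOD internal function; f_branchy, f_hoisted and check_same
are hypothetical names, and this is hand-written C, not output of the
pass.

```c
#include <assert.h>
#include <stdlib.h>

int cond;

/* Source shape: a division in each arm, a modulo after the join.
   Neither division dominates the modulo.  */
static int
f_branchy (int x, int y)
{
  int q, r;

  if (cond)
    q = x / y;
  else
    q = x / y;

  r = x % y;
  return q + r;
}

/* Hand-hoisted shape: quotient and remainder are computed once before
   the branch, and each original statement just picks its half.  */
static int
f_hoisted (int x, int y)
{
  div_t qr = div (x, y);
  int q, r;

  if (cond)
    q = qr.quot;
  else
    q = qr.quot;

  r = qr.rem;
  return q + r;
}

/* Check that both shapes agree for a non-zero divisor.  */
static void
check_same (int x, int y)
{
  assert (y != 0);
  assert (f_branchy (x, y) == f_hoisted (x, y));
}
```

Both div () and the C operators truncate towards zero, so the two
functions should agree for negative operands as well.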

For the following test-case, I am surprised that CSE didn't take place
before the widening_mul pass:

int
f_1 (int x, int y)
{
  int q = x / y;
  int r1 = 0, r2 = 0;
  if (cond)
    r1 = x % y;
  else
    r2 = x % y;
  return q + r1 + r2;
}

The input to widening_mul pass is:
f_1 (int x, int y)
{
  int r2;
  int r1;
  int q;
  int cond.0_1;
  int _2;
  int _11;

  <bb 2>:
  q_7 = x_5(D) / y_6(D);
  cond.0_1 = cond;
  if (cond.0_1 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  r1_9 = x_5(D) % y_6(D);
  goto <bb 5>;

  <bb 4>:
  r2_10 = x_5(D) % y_6(D);

  <bb 5>:
  # r1_3 = PHI <r1_9(3), 0(4)>
  # r2_4 = PHI <0(3), r2_10(4)>
  _2 = r1_3 + q_7;
  _11 = _2 + r2_4;
  return _11;

}

Thanks,
Prathamesh
>
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > Richard.
>> >
>>
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

[-- Attachment #2: divmod-part1_4.diff --]
[-- Type: text/plain, Size: 14758 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8c7f2a1..111f19f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6963,6 +6963,12 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool @var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx *@var{quot}, rtx *@var{rem})
+Define this hook if the port does not have hardware div and divmod insn for
+the given mode but has divmod libfunc, which is incompatible
+with libgcc2.c:__udivmoddi4
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f963a58..2c9a800 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4848,6 +4848,8 @@ them: try the first ones in this list first.
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index c867ddc..0cb59f7 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2276,6 +2276,48 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 
+/* Expand DIVMOD() using:
+ a) the optab handler for udivmod/sdivmod, if it is available;
+ b) otherwise, a call to the divmod libfunc generated by
+    targetm.expand_divmod_libfunc.  */
+
+static void
+expand_DIVMOD (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  bool unsignedp = TYPE_UNSIGNED (type);
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx quotient, remainder;
+
+  /* Check if optab handler exists for [u]divmod.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+    {
+      quotient = gen_reg_rtx (mode);
+      remainder = gen_reg_rtx (mode);
+      expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
+    }
+  else
+    targetm.expand_divmod_libfunc (unsignedp, mode, op0, op1,
+				   &quotient, &remainder);
+
+  /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR.  */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+		       make_tree (TREE_TYPE (arg0), quotient),
+		       make_tree (TREE_TYPE (arg1), remainder)),
+	       target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Return true if FN is supported for the types in TYPES when the
    optimization type is OPT_TYPE.  The types are those associated with
    the "type0" and "type1" fields of FN's direct_internal_fn_info
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index e729d85..56a80f1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -194,6 +194,9 @@ DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* Divmod function.  */
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_OPTAB_FN
diff --git a/gcc/target.def b/gcc/target.def
index 6392e73..4496f9a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4948,6 +4948,16 @@ Normally, this is not needed.",
  bool, (const_tree field, machine_mode mode),
  default_member_type_forces_blk)
 
+/* See tree-ssa-math-opts.c:divmod_candidate_p for conditions that gate
+   the divmod transform.  */
+DEFHOOK
+(expand_divmod_libfunc,
+ "Define this hook if the port does not have hardware div and divmod insn for\n\
+the given mode but has divmod libfunc, which is incompatible\n\
+with libgcc2.c:__udivmoddi4",
+ void, (bool unsignedp, machine_mode mode, rtx, rtx, rtx *quot, rtx *rem),
+ default_expand_divmod_libfunc)
+
 /* Return the class for a secondary reload, and fill in extra information.  */
 DEFHOOK
 (secondary_reload,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 6b4601b..20327a6 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }
 
+/* Generate call to
+   DImode __udivmoddi4 (DImode op0, DImode op1, DImode *rem).  */
+
+void
+default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
+			       rtx op0, rtx op1,
+			       rtx *quot_p, rtx *rem_p)
+{
+  gcc_assert (mode == DImode);
+  gcc_assert (unsignedp);
+
+  rtx libfunc = optab_libfunc (udivmod_optab, DImode);
+  gcc_assert (libfunc);
+
+  rtx remainder = assign_stack_temp (DImode, GET_MODE_SIZE (DImode));
+  rtx address = XEXP (remainder, 0);
+
+  rtx quotient = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+					  DImode, 3,
+					  op0, GET_MODE (op0),
+					  op1, GET_MODE (op1),
+					  address, GET_MODE (address));
+
+  *quot_p = quotient;
+  *rem_p = remainder;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 7687c39..dc5e8e7 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -254,4 +254,7 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
 				       optimization_type);
 
+extern void default_expand_divmod_libfunc (bool, machine_mode,
+					   rtx, rtx, rtx *, rtx *);
+
 #endif /* GCC_TARGHOOKS_H */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 81688cd..7b6d983 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -112,6 +112,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "optabs-libfuncs.h"
+#include "tree-eh.h"
+#include "targhooks.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -184,6 +187,9 @@ static struct
 
   /* Number of fp fused multiply-add ops inserted.  */
   int fmas_inserted;
+
+  /* Number of divmod calls inserted.  */
+  int divmod_calls_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -3784,6 +3790,233 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
   return true;
 }
 
+/* Return true if target has support for divmod.  */
+
+static bool
+target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode)
+{
+  /* If target supports hardware divmod insn, use it for divmod.  */
+  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing)
+    return true;
+
+  /* Check if libfunc for divmod is available.  */
+  if (optab_libfunc (divmod_optab, mode) != NULL_RTX)
+    {
+      /* If optab_handler exists for div_optab, perhaps in a wider mode,
+	 we don't want to use the libfunc even if it exists for given mode.  */ 
+      for (machine_mode div_mode = mode;
+	   div_mode != VOIDmode;
+	   div_mode = GET_MODE_WIDER_MODE (div_mode))
+	if (optab_handler (div_optab, div_mode) != CODE_FOR_nothing)
+	  return false;
+ 
+      return true;
+    }
+
+  return false;
+}
+
+/* Check if stmt is candidate for divmod transform.  */
+
+static bool
+divmod_candidate_p (gassign *stmt)
+{
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  enum machine_mode mode = TYPE_MODE (type);
+  optab divmod_optab, div_optab;
+
+  if (TYPE_UNSIGNED (type))
+    {
+      divmod_optab = udivmod_optab;
+      div_optab = udiv_optab;
+    }
+  else
+    {
+      divmod_optab = sdivmod_optab;
+      div_optab = sdiv_optab;
+    }
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  /* Disable the transform if either is a constant, since division-by-constant
+     may have specialized expansion.  */
+  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
+    return false;
+
+  /* Exclude the case where TYPE_OVERFLOW_TRAPS (type) as that should
+     expand using the [su]divv optabs.  */
+  if (TYPE_OVERFLOW_TRAPS (type))
+    return false;
+  
+  if (!target_supports_divmod_p (divmod_optab, div_optab, mode))
+    return false;
+
+  return true;
+}
+
+/* This function looks for:
+   t1 = a TRUNC_DIV_EXPR b;
+   t2 = a TRUNC_MOD_EXPR b;
+   and transforms it to the following sequence:
+   complex_tmp = DIVMOD (a, b);
+   t1 = REALPART_EXPR (complex_tmp);
+   t2 = IMAGPART_EXPR (complex_tmp);
+   For conditions enabling the transform see divmod_candidate_p().
+
+   The pass works in two phases:
+   1) Walk through all immediate uses of stmt's operand and find
+      TRUNC_DIV_EXPR/TRUNC_MOD_EXPR stmts with matching operands; add each
+      such stmt to the stmts vector.
+   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
+      dominates other div/mod stmts with same operands) and update entries in
+      stmts vector to use return value of DIVMOD (REALPART_EXPR for div,
+      IMAGPART_EXPR for mod).  */
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+  if (!divmod_candidate_p (stmt))
+    return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+  
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+  auto_vec<gimple *> stmts; 
+
+  gimple *top_stmt = NULL;
+  basic_block top_bb = NULL;
+
+  /* Try to set top_stmt to "topmost" stmt
+     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.  */
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+    {
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  basic_block bb = gimple_bb (use_stmt);
+
+	  if (top_bb == NULL)
+	    {
+	      top_stmt = use_stmt;
+	      top_bb = bb;
+	    }
+	  else if (bb == top_bb && gimple_uid (use_stmt) < gimple_uid (top_stmt))
+	    top_stmt = use_stmt;
+	  else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
+	    {
+	      top_bb = bb;
+	      top_stmt = use_stmt;
+	    }
+	}
+    }
+
+  if (top_stmt == NULL)
+    return false; 
+
+  bool div_seen = false;
+  bool mod_seen = false;
+
+  tree top_op1 = gimple_assign_rhs1 (top_stmt);
+  tree top_op2 = gimple_assign_rhs2 (top_stmt);
+
+  stmts.safe_push (top_stmt);
+  if (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR)
+    div_seen = true;
+  else if (gimple_assign_rhs_code (top_stmt) == TRUNC_MOD_EXPR)
+    mod_seen = true;
+
+  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */    
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
+    {
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (top_op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (top_op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (use_stmt == top_stmt)
+	    continue;	  
+
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
+	    {
+	      end_imm_use_stmt_traverse (&use_iter);
+	      return false;
+	    }
+
+	  stmts.safe_push (use_stmt);
+	  if (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR)
+	    div_seen = true;
+	  else if (gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	    mod_seen = true;
+	}
+    }
+
+  if (!(div_seen && mod_seen)) 
+    return false;
+
+  /* Create libcall to internal fn DIVMOD:
+     divmod_tmp = DIVMOD (op1, op2).  */
+
+  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+  tree res = make_temp_ssa_name (
+		build_complex_type (TREE_TYPE (op1)),
+		call_stmt, "divmod_tmp");
+  gimple_call_set_lhs (call_stmt, res);
+
+  /* Insert the call before top_stmt.  */
+  gimple_stmt_iterator top_stmt_gsi = gsi_for_stmt (top_stmt);
+  gsi_insert_before (&top_stmt_gsi, call_stmt, GSI_SAME_STMT);
+
+  widen_mul_stats.divmod_calls_inserted++;		
+
+  /* Update all statements in stmts.
+     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs = REALPART_EXPR<divmod_tmp>
+     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs = IMAGPART_EXPR<divmod_tmp>.  */
+
+  bool cfg_changed = false;
+  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
+    {
+      tree new_rhs;
+
+      switch (gimple_assign_rhs_code (use_stmt))
+	{
+	  case TRUNC_DIV_EXPR:
+	    new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+	    break;
+
+	  case TRUNC_MOD_EXPR:
+	    new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	}
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
+      update_stmt (use_stmt);
+
+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+	cfg_changed = true;
+    }
+
+  stmts.release ();
+  return cfg_changed;
+}    
 
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
@@ -3828,6 +4061,8 @@ pass_optimize_widening_mul::execute (function *fun)
   bool cfg_changed = false;
 
   memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
+  renumber_gimple_stmt_uids ();
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -3861,6 +4096,10 @@ pass_optimize_widening_mul::execute (function *fun)
 		    match_uaddsub_overflow (&gsi, stmt, code);
 		  break;
 
+		case TRUNC_MOD_EXPR:
+		  cfg_changed |= convert_to_divmod (as_a<gassign *> (stmt));
+		  break;
+
 		default:;
 		}
 	    }
@@ -3907,6 +4146,8 @@ pass_optimize_widening_mul::execute (function *fun)
 			    widen_mul_stats.maccs_inserted);
   statistics_counter_event (fun, "fused multiply-adds inserted",
 			    widen_mul_stats.fmas_inserted);
+  statistics_counter_event (fun, "divmod calls inserted",
+			    widen_mul_stats.divmod_calls_inserted);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }


* Re: RFC [1/2] divmod transform
  2016-05-25 13:33               ` Prathamesh Kulkarni
@ 2016-05-27 12:05                 ` Richard Biener
  2016-05-27 12:41                   ` Prathamesh Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2016-05-27 12:05 UTC (permalink / raw)
  To: Prathamesh Kulkarni
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

On Wed, 25 May 2016, Prathamesh Kulkarni wrote:

> On 25 May 2016 at 12:52, Richard Biener <rguenther@suse.de> wrote:
> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >
> >> On 24 May 2016 at 19:39, Richard Biener <rguenther@suse.de> wrote:
> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 24 May 2016 at 17:42, Richard Biener <rguenther@suse.de> wrote:
> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:
> >> >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
> >> >> >> > <prathamesh.kulkarni@linaro.org> wrote:
> >> >> >> >> Hi,
> >> >> >> >> I have updated my patch for divmod (attached), which was originally
> >> >> >> >> based on Kugan's patch.
> >> >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> >> >> >> >> having same operands to divmod representation, so we can cse computation of mod.
> >> >> >> >>
> >> >> >> >> t1 = a TRUNC_DIV_EXPR b;
> >> >> >> >> t2 = a TRUNC_MOD_EXPR b
> >> >> >> >> is transformed to:
> >> >> >> >> complex_tmp = DIVMOD (a, b);
> >> >> >> >> t1 = REALPART_EXPR (complex_tmp);
> >> >> >> >> t2 = IMAGPART_EXPR (complex_tmp);
> >> >> >> >>
> >> >> >> >> * New hook divmod_expand_libfunc
> >> >> >> >> The rationale for introducing the hook is that different targets have
> >> >> >> >> incompatible calling conventions for divmod libfunc.
> >> >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
> >> >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
> >> >> >> >> return quotient and store remainder in argument passed as pointer,
> >> >> >> >> while the arm version takes two arguments and returns both
> >> >> >> >> quotient and remainder having mode double the size of the operand mode.
> >> >> >> >> The port should hence override the hook expand_divmod_libfunc
> >> >> >> >> to generate call to target-specific divmod.
> >> >> >> >> Ports should define this hook if:
> >> >> >> >> a) The port does not have divmod or div insn for the given mode.
> >> >> >> >> b) The port defines divmod libfunc for the given mode.
> >> >> >> >> The default hook default_expand_divmod_libfunc() generates call
> >> >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
> >> >> >> >> are of DImode.
> >> >> >> >>
> >> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
> >> >> >> >> cross-tested on arm*-*-*.
> >> >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
> >> >> >> >> Does this patch look OK ?
> >> >> >> >
> >> >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> >> >> >> > index 6b4601b..e4a021a 100644
> >> >> >> > --- a/gcc/targhooks.c
> >> >> >> > +++ b/gcc/targhooks.c
> >> >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
> >> >> >> > machine_mode, optimization_type)
> >> >> >> >    return true;
> >> >> >> >  }
> >> >> >> >
> >> >> >> > +void
> >> >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
> >> >> >> > +                              rtx op0, rtx op1,
> >> >> >> > +                              rtx *quot_p, rtx *rem_p)
> >> >> >> >
> >> >> >> > functions need a comment.
> >> >> >> >
> >> >> >> > ISTR it was suggested that ARM change to libgcc2.c:__udivmoddi4 style?  In that
> >> >> >> > case we could avoid the target hook.
> >> >> >> Well I would prefer adding the hook because that's easier -;)
> >> >> >> Would it be ok for now to go with the hook ?
> >> >> >> >
> >> >> >> > +      /* If target overrides expand_divmod_libfunc hook
> >> >> >> > +        then perform divmod by generating call to the target-specific divmod
> >> >> >> > libfunc.  */
> >> >> >> > +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
> >> >> >> > +       return true;
> >> >> >> > +
> >> >> >> > +      /* Fall back to using libgcc2.c:__udivmoddi4.  */
> >> >> >> > +      return (mode == DImode && unsignedp);
> >> >> >> >
> >> >> >> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
> >> >> >> > but still restrict this to DImode && unsigned?  Also if
> >> >> >> > targetm.expand_divmod_libfunc
> >> >> >> > is not the default we expect the target to handle all modes?
> >> >> >> Ah indeed, the check for DImode is unnecessary.
> >> >> >> However I suppose the check for unsignedp should be there,
> >> >> >> since we want to generate call to __udivmoddi4 only if operand is unsigned ?
> >> >> >
> >> >> > The optab libfunc for sdivmod should be NULL in that case.
> >> >> Ah indeed, thanks.
> >> >> >
> >> >> >> >
> >> >> >> > That said - I expected the above piece to be simply a 'return true;' ;)
> >> >> >> >
> >> >> >> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
> >> >> >> > supports a specific operation (for example SImode divmod would use DImode
> >> >> >> > divmod by means of widening operands - for the unsigned case of course).
> >> >> >> Thanks for pointing out. So if a target does not support divmod
> >> >> >> libfunc for a mode
> >> >> >> but for a wider mode, then we could zero-extend operands to the wider-mode,
> >> >> >> perform divmod on the wider-mode, and then cast result back to the
> >> >> >> original mode.
> >> >> >> I haven't done that in this patch, would it be OK to do that as a follow up ?
> >> >> >
> >> >> > I think that you should conservatively handle the div_optab query, thus if
> >> >> > the target has a HW division in a wider mode don't use the divmod IFN.
> >> >> > You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
> >> >> > if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
> >> >> > out if that is available.
> >> >> Done.
> >> >> >
> >> >> >> > +  /* Disable the transform if either is a constant, since
> >> >> >> > division-by-constant
> >> >> >> > +     may have specialized expansion.  */
> >> >> >> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
> >> >> >> > +    return false;
> >> >> >> >
> >> >> >> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
> >> >> >> >
> >> >> >> > +  if (TYPE_OVERFLOW_TRAPS (type))
> >> >> >> > +    return false;
> >> >> >> >
> >> >> >> > why's that?  Generally please first test cheap things (trapping, constant-ness)
> >> >> >> > before checking expensive stuff (target_supports_divmod_p).
> >> >> >> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
> >> >> >> https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
> >> >> >> "When looking at TRUNC_DIV_EXPR you should also exclude
> >> >> >> the case where TYPE_OVERFLOW_TRAPS (type) as that should
> >> >> >> expand using the [su]divv optabs (no trapping overflow
> >> >> >> divmod optab exists)."
> >> >> >
> >> >> > Ok, didn't remember that.
> >> >> >
> >> >> >> >
> >> >> >> > +static bool
> >> >> >> > +convert_to_divmod (gassign *stmt)
> >> >> >> > +{
> >> >> >> > +  if (!divmod_candidate_p (stmt))
> >> >> >> > +    return false;
> >> >> >> > +
> >> >> >> > +  tree op1 = gimple_assign_rhs1 (stmt);
> >> >> >> > +  tree op2 = gimple_assign_rhs2 (stmt);
> >> >> >> > +
> >> >> >> > +  vec<gimple *> stmts = vNULL;
> >> >> >> >
> >> >> >> > use an auto_vec <gimple *> - you currently leak it in at least one place.
> >> >> >> >
> >> >> >> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> >> >> >> > +       cfg_changed = true;
> >> >> >> >
> >> >> >> > note that this suggests you should check whether any of the stmts may throw
> >> >> >> > internally as you don't update / transfer EH info correctly.  So for 'stmt' and
> >> >> >> > all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
> >> >> >> > the list of stmts to modify.
> >> >> >> >
> >> >> >> > Btw, I think you should not add 'stmt' immediately but when iterating over
> >> >> >> > all uses also gather uses in TRUNC_MOD_EXPR.
> >> >> >> >
> >> >> >> > Otherwise looks ok.
> >> >> >> Done changes in this version. I am gathering mod uses same time as div uses,
> >> >> >> so this imposes a constraint that mod dominates mod. I am not sure if
> >> >> >> that's desirable.
> >> >> >
> >> >> > I think you also need a mod_seen variable now that you don't necessarily
> >> >> > end up with 'stmt' in the vector of stmts.  I don't see how there is a
> >> >> > constraint that mod dominates mod - it's just that the top_stmt needs
> >> >> > to dominate all other uses that can be replaced with replacing top_stmt
> >> >> > with a divmod.  It's just that the actual stmt set we choose may now
> >> >> > depend on immediate uses order which on a second thought is bad
> >> >> > as immediate uses order could be affected by debug stmts ... hmm.
> >> >> >
> >> >> > To avoid this please re-add the code adding 'stmt' to stmts immediately
> >> >> > and add a use_stmt != stmt check to the immediate use processing loop
> >> >> > so that we don't end up adding it twice.
> >> >> Well I wonder what will happen for the following case:
> >> >> t1 = x / y;
> >> >> if (cond)
> >> >>   t2 = x % y;
> >> >> else
> >> >>   t3 = x % y;
> >> >>
> >> >> Assuming stmt == top_stmt is "t2 = x % y" and use_stmt is "t3 = x % y",
> >> >> use_stmt will not get added to stmts vector, since top_stmt and
> >> >> use_stmt are not in same bb,
> >> >> and bb's containing top_stmt and use_stmt don't dominate each other.
> >> >> Not sure if this is practical case (I assume fre will hoist mod
> >> >> outside if-else?)
> >> >>
> >> >> Now that we immediately add stmt to stmts vector, I suppose mod_seen
> >> >> shall not be required ?
> >> >
> >> > In that case mod_seen is not needed.  But the situation you say will
> >> > still happen so I wonder if we'd need a better way of iterating over
> >> > immediate uses, like first pushing all candidates into a worklist
> >> > vector and then iterating over that until we find no more candidates.
> >> >
> >> > You can then also handle the case of more than one group of stmts
> >> > (the pass currently doesn't iterate in any particular useful order
> >> > over BBs).
> >> IIUC, we want to perform the transform if:
> >> i) there exists top_stmt with code trunc_div_expr/trunc_mod_expr and
> >> having same operands as stmt.
> >> ii) top_stmt dominates all other stmts with code
> >> trunc_div_expr/trunc_mod_expr and having same operands as top_stmt.
> >>
> >> Firstly, we try to get to top_stmt if it exists, by iterating over uses of stmt,
> >> and then iterate over all uses of top_stmt and add them to stmts vector
> >> only if top_stmt dominates all the stmts with same operands as top_stmt
> >> and have code trunc_div_expr/trunc_mod_expr.
> >>
> >> /* Get to top_stmt.  */
> >> top_stmt = stmt;
> >> top_bb = gimple_bb (stmt);
> >>
> >> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
> >> {
> >>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
> >>       && use_stmt has same operands as stmt)
> >>     {
> >>       if (gimple_bb (use_stmt) dominates top_bb)
> >>         {
> >>           top_bb = gimple_bb (use_stmt);
> >>           top_stmt = use_stmt;
> >>         }
> >>       else if (gimple_bb (use_stmt) == top_bb
> >>                && gimple_uid (use_stmt) < gimple_uid (top_stmt))
> >>         top_stmt = use_stmt;
> >>     }
> >> }
> >>
> >> /* Speculatively consider top_stmt as dominating all other
> >> div_expr/mod_expr stmts with same operands as stmt.  */
> >> stmts.safe_push (top_stmt);
> >>
> >> /* Walk uses of top_stmt to ensure that all stmts are dominated by top_stmt.  */
> >> top_op1 = gimple_assign_rhs1 (top_stmt);
> >> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
> >> {
> >>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
> >>       && use_stmt has same operands as top_stmt)
> >>     {
> >>       if (use_stmt == top_stmt)
> >>         continue;
> >>
> >>       /* No top_stmt exists, do not proceed with transform.  */
> >>       if (top_bb does not dominate gimple_bb (use_stmt))
> >>         return false;
> >>
> >>       stmts.safe_push (use_stmt);
> >>     }
> >> }
> >>
> >> For the case:
> >> t1 = x / y;
> >> if (cond)
> >>   t2 = x % y;
> >> else
> >>   t3 = x % y;
> >>
> >> Assuming stmt is "t2 = x % y", it will walk uses of stmt and set
> >> top_stmt to "t1 = x / y"
> >> Then it will walk all uses of top_stmt:
> >> "t2 = x % y" -> dominated by top_stmt
> >> "t3 = x % y" -> dominated by top_stmt
> >> Since all stmts are dominated by top_stmt, we add all three stmts to
> >> vector of stmts and proceed with transform.
> >>
> >> For the case where, top_stmt dominates original stmt but not all stmts:
> >>
> >> if (cond)
> >>   t1 = x / y;
> >> else
> >> {
> >>   t2 = x % y;
> >>   return;
> >> }
> >>
> >> t3 = x % y;
> >>
> >> Assuming stmt is "t3 = x % y",
> >> Walking stmt uses will set top_stmt to "t1 = x / y";
> >>
> >> Walking immediate uses of top_stmt, we find that "t2 = x % y" is not
> >> dominated by top_stmt,
> >> and hence don't do the transform.
> >>
> >> Does this sound reasonable ?
> >
> > Yes, that's reasonable.
> Thanks, I have attached patch that implements above approach.
> Does it look OK ?

Please start the top-stmt search with

  top_stmt = stmt;
  top_bb = gimple_bb (stmt);

this makes sure to process all stmts via the IL walk in case
the uses have multiple independent "dominated" trees.  This also
simplifies the loop body (no need to check for NULL).  This also
makes mod_seen always true and you can compute div_seen in that
loop as well.

Otherwise looks ok now.
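
A minimal sketch of that search on a toy model, not the GCC code itself:
toy_bb, toy_stmt, toy_dominates and toy_find_top_stmt are hypothetical
stand-ins for basic_block, gimple, dominated_by_p and the first loop in
convert_to_divmod, and dominance is approximated by walking an
immediate-dominator chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's basic_block and gimple statement.  */
struct toy_bb { const struct toy_bb *idom; };	/* NULL for the entry block.  */
struct toy_stmt { const struct toy_bb *bb; unsigned uid; };

/* Return true if block A dominates block B (a block dominates itself).  */
static bool
toy_dominates (const struct toy_bb *a, const struct toy_bb *b)
{
  for (; b != NULL; b = b->idom)
    if (a == b)
      return true;
  return false;
}

/* Start with STMT as the candidate and move it "up" whenever one of the
   N statements in USES sits earlier in the same block or in a block
   that dominates the current top block.  */
static const struct toy_stmt *
toy_find_top_stmt (const struct toy_stmt *stmt,
		   const struct toy_stmt *const *uses, size_t n)
{
  assert (stmt != NULL);
  const struct toy_stmt *top_stmt = stmt;
  const struct toy_bb *top_bb = stmt->bb;

  for (size_t i = 0; i < n; i++)
    {
      const struct toy_stmt *use = uses[i];
      if (use->bb == top_bb)
	{
	  /* Same block: the smaller uid comes first.  */
	  if (use->uid < top_stmt->uid)
	    top_stmt = use;
	}
      else if (toy_dominates (use->bb, top_bb))
	{
	  top_bb = use->bb;
	  top_stmt = use;
	}
    }
  return top_stmt;
}
```

Since the candidate starts out as stmt, the loop body needs no NULL check,
and a use in an unrelated block simply leaves the candidate unchanged.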

> The patch still does not handle the following case:
> int f(int x, int y)
> {
>   extern int cond;
>   int q, r;
> 
>   if (cond)
>     q = x / y;
>   else
>     q = x / y;
> 
>   r = x % y;
>   return q + r;
> }
> 
> In above case although the mod stmt is not dominated by either div
> stmt, I suppose the transform
> is still possible by inserting DIVMOD (x, y) before if-else ?

Yeah, same for sincos where doing this requires some LCM algorithm.
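
To illustrate what such a placement would buy, here is a hand-written
source-level sketch, assuming a division in each arm and a modulo after
the join.  f_branchy and f_hoisted are hypothetical names, and div () from
<stdlib.h> merely stands in for a single DIVMOD placed before the
if-else; this is not what the pass emits.

```c
#include <assert.h>
#include <stdlib.h>

/* Division in both arms, modulo after the join: neither division
   dominates the modulo.  */
int
f_branchy (int x, int y, int cond)
{
  assert (y != 0);
  int q, r;

  if (cond)
    q = x / y;
  else
    q = x / y;

  r = x % y;
  return q + r;
}

/* Same result with one combined computation placed before the branch,
   so that it dominates every use of the quotient and remainder.  */
int
f_hoisted (int x, int y, int cond)
{
  assert (y != 0);
  div_t qr = div (x, y);	/* Quotient and remainder in one call.  */
  int q;

  if (cond)
    q = qr.quot;
  else
    q = qr.quot;

  return q + qr.rem;
}
```

Finding that insertion point automatically is the part that needs LCM;
the sketch only shows that the hoisted form computes the same values.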

> For the following test-case, I am surprised that CSE didn't take place before
> the widening_mul pass ?
> 
> int
> f_1 (int x, int y)
> {
>   int q = x / y;
>   int r1 = 0, r2 = 0;
>   if (cond)
>     r1 = x % y;
>   else
>     r2 = x % y;
>   return q + r1 + r2;
> }

This is not CSE but code hoisting which is not implemented on GIMPLE
(see PR23286)

> The input to widening_mul pass is:
> f_1 (int x, int y)
> {
>   int r2;
>   int r1;
>   int q;
>   int cond.0_1;
>   int _2;
>   int _11;
> 
>   <bb 2>:
>   q_7 = x_5(D) / y_6(D);
>   cond.0_1 = cond;
>   if (cond.0_1 != 0)
>     goto <bb 3>;
>   else
>     goto <bb 4>;
> 
>   <bb 3>:
>   r1_9 = x_5(D) % y_6(D);
>   goto <bb 5>;
> 
>   <bb 4>:
>   r2_10 = x_5(D) % y_6(D);
> 
>   <bb 5>:
>   # r1_3 = PHI <r1_9(3), 0(4)>
>   # r2_4 = PHI <0(3), r2_10(4)>
>   _2 = r1_3 + q_7;
>   _11 = _2 + r2_4;
>   return _11;
> 
> }
> 
> Thanks,
> Prathamesh
> >
> > Richard.
> >
> >> Thanks,
> >> Prathamesh
> >> >
> >> > Richard.
> >> >
> >>
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


* Re: RFC [1/2] divmod transform
  2016-05-27 12:05                 ` Richard Biener
@ 2016-05-27 12:41                   ` Prathamesh Kulkarni
  2016-05-27 13:04                     ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2016-05-27 12:41 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 16636 bytes --]

On 27 May 2016 at 15:45, Richard Biener <rguenther@suse.de> wrote:
> On Wed, 25 May 2016, Prathamesh Kulkarni wrote:
>
>> On 25 May 2016 at 12:52, Richard Biener <rguenther@suse.de> wrote:
>> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 24 May 2016 at 19:39, Richard Biener <rguenther@suse.de> wrote:
>> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 24 May 2016 at 17:42, Richard Biener <rguenther@suse.de> wrote:
>> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >> >
>> >> >> >> On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:
>> >> >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
>> >> >> >> > <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >> >> Hi,
>> >> >> >> >> I have updated my patch for divmod (attached), which was originally
>> >> >> >> >> based on Kugan's patch.
>> >> >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> >> >> >> >> having same operands to divmod representation, so we can cse computation of mod.
>> >> >> >> >>
>> >> >> >> >> t1 = a TRUNC_DIV_EXPR b;
>> >> >> >> >> t2 = a TRUNC_MOD_EXPR b
>> >> >> >> >> is transformed to:
>> >> >> >> >> complex_tmp = DIVMOD (a, b);
>> >> >> >> >> t1 = REALPART_EXPR (complex_tmp);
>> >> >> >> >> t2 = IMAGPART_EXPR (complex_tmp);
>> >> >> >> >>
>> >> >> >> >> * New hook divmod_expand_libfunc
>> >> >> >> >> The rationale for introducing the hook is that different targets have
>> >> >> >> >> incompatible calling conventions for divmod libfunc.
>> >> >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
>> >> >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> >> >> >> >> return quotient and store remainder in argument passed as pointer,
>> >> >> >> >> while the arm version takes two arguments and returns both
>> >> >> >> >> quotient and remainder having mode double the size of the operand mode.
>> >> >> >> >> The port should hence override the hook expand_divmod_libfunc
>> >> >> >> >> to generate call to target-specific divmod.
>> >> >> >> >> Ports should define this hook if:
>> >> >> >> >> a) The port does not have divmod or div insn for the given mode.
>> >> >> >> >> b) The port defines divmod libfunc for the given mode.
>> >> >> >> >> The default hook default_expand_divmod_libfunc() generates call
>> >> >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
>> >> >> >> >> are of DImode.
>> >> >> >> >>
>> >> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> >> >> >> >> cross-tested on arm*-*-*.
>> >> >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
>> >> >> >> >> Does this patch look OK ?
>> >> >> >> >
>> >> >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> >> >> >> > index 6b4601b..e4a021a 100644
>> >> >> >> > --- a/gcc/targhooks.c
>> >> >> >> > +++ b/gcc/targhooks.c
>> >> >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
>> >> >> >> > machine_mode, optimization_type)
>> >> >> >> >    return true;
>> >> >> >> >  }
>> >> >> >> >
>> >> >> >> > +void
>> >> >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
>> >> >> >> > +                              rtx op0, rtx op1,
>> >> >> >> > +                              rtx *quot_p, rtx *rem_p)
>> >> >> >> >
>> >> >> >> > functions need a comment.
>> >> >> >> >
>> >> >> >> > ISTR it was suggested that ARM change to libgcc2.c:__udivmoddi4 style?  In that
>> >> >> >> > case we could avoid the target hook.
>> >> >> >> Well I would prefer adding the hook because that's easier -;)
>> >> >> >> Would it be ok for now to go with the hook ?
>> >> >> >> >
>> >> >> >> > +      /* If target overrides expand_divmod_libfunc hook
>> >> >> >> > +        then perform divmod by generating call to the target-specific divmod
>> >> >> >> > libfunc.  */
>> >> >> >> > +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
>> >> >> >> > +       return true;
>> >> >> >> > +
>> >> >> >> > +      /* Fall back to using libgcc2.c:__udivmoddi4.  */
>> >> >> >> > +      return (mode == DImode && unsignedp);
>> >> >> >> >
>> >> >> >> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
>> >> >> >> > but still restrict this to DImode && unsigned?  Also if
>> >> >> >> > targetm.expand_divmod_libfunc
>> >> >> >> > is not the default we expect the target to handle all modes?
>> >> >> >> Ah indeed, the check for DImode is unnecessary.
>> >> >> >> However I suppose the check for unsignedp should be there,
>> >> >> >> since we want to generate call to __udivmoddi4 only if operand is unsigned ?
>> >> >> >
>> >> >> > The optab libfunc for sdivmod should be NULL in that case.
>> >> >> Ah indeed, thanks.
>> >> >> >
>> >> >> >> >
>> >> >> >> > That said - I expected the above piece to be simply a 'return true;' ;)
>> >> >> >> >
>> >> >> >> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
>> >> >> >> > supports a specific operation (for example SImode divmod would use DImode
>> >> >> >> > divmod by means of widening operands - for the unsigned case of course).
>> >> >> >> Thanks for pointing out. So if a target does not support divmod
>> >> >> >> libfunc for a mode
>> >> >> >> but for a wider mode, then we could zero-extend operands to the wider-mode,
>> >> >> >> perform divmod on the wider-mode, and then cast result back to the
>> >> >> >> original mode.
>> >> >> >> I haven't done that in this patch, would it be OK to do that as a follow up ?
>> >> >> >
>> >> >> > I think that you should conservatively handle the div_optab query, thus if
>> >> >> > the target has a HW division in a wider mode don't use the divmod IFN.
>> >> >> > You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
>> >> >> > if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
>> >> >> > out if that is available.
>> >> >> Done.
>> >> >> >
>> >> >> >> > +  /* Disable the transform if either is a constant, since
>> >> >> >> > division-by-constant
>> >> >> >> > +     may have specialized expansion.  */
>> >> >> >> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
>> >> >> >> > +    return false;
>> >> >> >> >
>> >> >> >> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
>> >> >> >> >
>> >> >> >> > +  if (TYPE_OVERFLOW_TRAPS (type))
>> >> >> >> > +    return false;
>> >> >> >> >
>> >> >> >> > why's that?  Generally please first test cheap things (trapping, constant-ness)
>> >> >> >> > before checking expensive stuff (target_supports_divmod_p).
>> >> >> >> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
>> >> >> >> https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
>> >> >> >> "When looking at TRUNC_DIV_EXPR you should also exclude
>> >> >> >> the case where TYPE_OVERFLOW_TRAPS (type) as that should
>> >> >> >> expand using the [su]divv optabs (no trapping overflow
>> >> >> >> divmod optab exists)."
>> >> >> >
>> >> >> > Ok, didn't remember that.
>> >> >> >
>> >> >> >> >
>> >> >> >> > +static bool
>> >> >> >> > +convert_to_divmod (gassign *stmt)
>> >> >> >> > +{
>> >> >> >> > +  if (!divmod_candidate_p (stmt))
>> >> >> >> > +    return false;
>> >> >> >> > +
>> >> >> >> > +  tree op1 = gimple_assign_rhs1 (stmt);
>> >> >> >> > +  tree op2 = gimple_assign_rhs2 (stmt);
>> >> >> >> > +
>> >> >> >> > +  vec<gimple *> stmts = vNULL;
>> >> >> >> >
>> >> >> >> > use an auto_vec <gimple *> - you currently leak it in at least one place.
>> >> >> >> >
>> >> >> >> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> >> >> >> > +       cfg_changed = true;
>> >> >> >> >
>> >> >> >> > note that this suggests you should check whether any of the stmts may throw
>> >> >> >> > internally as you don't update / transfer EH info correctly.  So for 'stmt' and
>> >> >> >> > all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
>> >> >> >> > the list of stmts to modify.
>> >> >> >> >
>> >> >> >> > Btw, I think you should not add 'stmt' immediately but when iterating over
>> >> >> >> > all uses also gather uses in TRUNC_MOD_EXPR.
>> >> >> >> >
>> >> >> >> > Otherwise looks ok.
>> >> >> >> Done changes in this version. I am gathering mod uses same time as div uses,
>> >> >> >> so this imposes a constraint that mod dominates mod. I am not sure if
>> >> >> >> that's desirable.
>> >> >> >
>> >> >> > I think you also need a mod_seen variable now that you don't necessarily
>> >> >> > end up with 'stmt' in the vector of stmts.  I don't see how there is a
>> >> >> > constraint that mod dominates mod - it's just that the top_stmt needs
>> >> >> > to dominate all other uses that can be replaced with replacing top_stmt
>> >> >> > with a divmod.  It's just that the actual stmt set we choose may now
>> >> >> > depend on immediate uses order which on a second thought is bad
>> >> >> > as immediate uses order could be affected by debug stmts ... hmm.
>> >> >> >
>> >> >> > To avoid this please re-add the code adding 'stmt' to stmts immediately
>> >> >> > and add a use_stmt != stmt check to the immediate use processing loop
>> >> >> > so that we don't end up adding it twice.
>> >> >> Well I wonder what will happen for the following case:
>> >> >> t1 = x / y;
>> >> >> if (cond)
>> >> >>   t2 = x % y;
>> >> >> else
>> >> >>   t3 = x % y;
>> >> >>
>> >> >> Assuming stmt == top_stmt is "t2 = x % y" and use_stmt is "t3 = x % y",
>> >> >> use_stmt will not get added to stmts vector, since top_stmt and
>> >> >> use_stmt are not in same bb,
>> >> >> and bb's containing top_stmt and use_stmt don't dominate each other.
>> >> >> Not sure if this is practical case (I assume fre will hoist mod
>> >> >> outside if-else?)
>> >> >>
>> >> >> Now that we immediately add stmt to stmts vector, I suppose mod_seen
>> >> >> shall not be required ?
>> >> >
>> >> > In that case mod_seen is not needed.  But the situation you say will
>> >> > still happen so I wonder if we'd need a better way of iterating over
>> >> > immediate uses, like first pushing all candidates into a worklist
>> >> > vector and then iterating over that until we find no more candidates.
>> >> >
>> >> > You can then also handle the case of more than one group of stmts
>> >> > (the pass currently doesn't iterate in any particular useful order
>> >> > over BBs).
>> >> IIUC, we want to perform the transform if:
>> >> i) there exists top_stmt with code trunc_div_expr/trunc_mod_expr and
>> >> having same operands as stmt.
>> >> ii) top_stmt dominates all other stmts with code
>> >> trunc_div_expr/trunc_mod_expr and having same operands as top_stmt.
>> >>
>> >> Firstly, we try to get to top_stmt if it exists, by iterating over uses of stmt,
>> >> and then iterate over all uses of top_stmt and add them to stmts vector
>> >> only if top_stmt dominates all the stmts with same operands as top_stmt
>> >> and have code trunc_div_expr/trunc_mod_expr.
>> >>
>> >> /* Get to top_stmt.  */
>> >> top_stmt = stmt;
>> >> top_bb = gimple_bb (stmt);
>> >>
>> >> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
>> >> {
>> >>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
>> >>       && use_stmt has same operands as stmt)
>> >>     {
>> >>       if (gimple_bb (use_stmt) dominates top_bb)
>> >>         {
>> >>           top_bb = gimple_bb (use_stmt);
>> >>           top_stmt = use_stmt;
>> >>         }
>> >> >>       else if (gimple_bb (use_stmt) == top_bb
>> >> >>                && gimple_uid (use_stmt) < gimple_uid (top_stmt))
>> >>         top_stmt = use_stmt;
>> >>     }
>> >> }
>> >>
>> >> /* Speculatively consider top_stmt as dominating all other
>> >> div_expr/mod_expr stmts with same operands as stmt.  */
>> >> stmts.safe_push (top_stmt);
>> >>
>> >> /* Walk uses of top_stmt to ensure that all stmts are dominated by top_stmt.  */
>> >> top_op1 = gimple_assign_rhs1 (top_stmt);
>> >> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
>> >> {
>> >>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
>> >>       && use_stmt has same operands as top_stmt)
>> >>     {
>> >>       if (use_stmt == top_stmt)
>> >>         continue;
>> >>
>> >> >>       /* No top_stmt exists, do not proceed with transform  */
>> >>       if (top_bb does not dominate gimple_bb (use_stmt))
>> >>         return false;
>> >>
>> >>       stmts.safe_push (use_stmt);
>> >>     }
>> >> }
>> >>
>> >> For the case:
>> >> t1 = x / y;
>> >> if (cond)
>> >>   t2 = x % y;
>> >> else
>> >>   t3 = x % y;
>> >>
>> >> Assuming stmt is "t2 = x % y", it will walk uses of stmt and set
>> >> top_stmt to "t1 = x / y"
>> >> Then it will walk all uses of top_stmt:
>> >> "t2 = x % y" -> dominated by top_stmt
>> >> "t3 = x % y" -> dominated by top_stmt
>> >> Since all stmts are dominated by top_stmt, we add all three stmts to
>> >> vector of stmts and proceed with transform.
>> >>
>> >> For the case where, top_stmt dominates original stmt but not all stmts:
>> >>
>> >> if (cond)
>> >>   t1 = x / y;
>> >> else
>> >> {
>> >>   t2 = x % y;
>> >>   return;
>> >> }
>> >>
>> >> t3 = x % y;
>> >>
>> >> Assuming stmt is "t3 = x % y",
>> >> Walking stmt uses will set top_stmt to "t1 = x / y";
>> >>
>> >> Walking immediate uses of top_stmt, we find that "t2 = x % y" is not
>> >> dominated by top_stmt,
>> >> and hence don't do the transform.
>> >>
>> >> Does this sound reasonable ?
>> >
>> > Yes, that's reasonable.
>> Thanks, I have attached patch that implements above approach.
>> Does it look OK ?
>
> Please start the top-stmt search with
>
>   top_stmt = stmt;
>   top_bb = gimple_bb (stmt);
>
> this makes sure to process all stmts via the IL walk in case
> the uses have multiple independent "dominated" trees.  This also
> simplifies the loop body (no need to check for NULL).  This also
> makes mod_seen always true and you can compute div_seen in that
> loop as well.
Done.
Um I don't understand why setting top_stmt to NULL won't process all stmts ?
AFAIU it will have one extra iteration compared to initializing top_stmt to stmt
(since first iteration would initialize top_stmt to stmt assuming stmt
does not throw) ?
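
To make the top_stmt search concrete, here is a minimal toy model in plain C,
not GCC code. It assumes each statement is just a (basic block, uid) pair and
that dominance is given by an immediate-dominator array; toy_stmt,
toy_dominates and toy_find_top are hypothetical names used only for
illustration. The search starts from the statement being visited, moves up to
any candidate that comes earlier in the same block or lives in a dominating
block, and then gives up if the chosen top block does not dominate every
candidate.

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement: the basic block it lives in and its position (uid).  */
struct toy_stmt { int bb; int uid; };

/* idom[b] is the immediate dominator of block b; the entry block is its
   own immediate dominator.  Return 1 if block A dominates block B
   (a block dominates itself), else 0.  */
int toy_dominates(const int *idom, int a, int b)
{
    for (;;) {
        if (a == b)
            return 1;
        if (idom[b] == b)       /* reached the entry block */
            return 0;
        b = idom[b];
    }
}

/* S[0..N-1] are all div/mod statements with the same operands and START
   is the one the walk is currently visiting.  Return the index of the
   topmost statement if its block dominates every candidate, else -1.  */
int toy_find_top(const int *idom, const struct toy_stmt *s, size_t n,
                 size_t start)
{
    size_t top = start;
    assert(start < n);

    /* Phase 1: climb to the topmost candidate.  */
    for (size_t i = 0; i < n; i++) {
        if (s[i].bb == s[top].bb) {
            if (s[i].uid < s[top].uid)
                top = i;
        } else if (toy_dominates(idom, s[i].bb, s[top].bb)) {
            top = i;
        }
    }

    /* Phase 2: every candidate must be dominated by the top block.  */
    for (size_t i = 0; i < n; i++)
        if (!toy_dominates(idom, s[top].bb, s[i].bb))
            return -1;

    return (int)top;
}
```

With blocks {0, 1, 2} where block 0 dominates both arms, a division in block 0
and a mod in each arm, the search returns the division whichever mod it starts
from. With the "early return" shape from earlier in the thread it returns -1.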
>
> Otherwise looks ok now.
>
>> The patch still does not handle the following case:
>> int f(int x, int y)
>> {
>>   extern int cond;
>>   int q, r;
>>
>>   if (cond)
>>     q = x / y;
>>   else
>>     q = x / y;
>>
>>   r = x % y;
>>   return q + r;
>> }
>>
>> In above case although the mod stmt is not dominated by either div
>> stmt, I suppose the transform
>> is still possible by inserting DIVMOD (x, y) before if-else ?
>
> Yeah, same for sincos where doing this requires some LCM algorithm.
Well I don't have a good approach for this.
I was thinking that, before doing the divmod transform, we could walk each
GIMPLE_COND of "diamond" shape (having both arms), check whether the
"then" bb and the "else" bb contain the same div or mod stmt, and in that
case put an artificial copy of that stmt above the GIMPLE_COND.

So the above case would be transformed to:

int tmp = x / y;  // artificial top_stmt
if (cond)
  q = x / y;
else
  q = x / y;

r = x % y;
return q + r;

and then the divmod transform will see "tmp = x / y" as the topmost stmt.
Since top_stmt is artificially introduced, we will replace that with DIVMOD ifn
rather than inserting DIVMOD ifn above top_stmt as in other cases.
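
As a source-level illustration only (the real transform would operate on
GIMPLE, and f_before / f_after are hypothetical names), the effect of the idea
above can be sketched in plain C. The quotient and remainder are computed once
above the condition, and both arms and the trailing mod reuse them. The two
locals stand in for the real and imaginary parts of the DIVMOD result.

```c
#include <assert.h>

/* Shape from the thread: the same division in both arms, a mod after
   the join.  Neither division dominates the mod.  */
int f_before(int cond, int x, int y)
{
    int q, r;
    assert(y != 0);
    if (cond)
        q = x / y;
    else
        q = x / y;
    r = x % y;
    return q + r;
}

/* After hoisting: one quotient/remainder pair is computed above the
   condition and every later use reads from it.  */
int f_after(int cond, int x, int y)
{
    int q;
    assert(y != 0);
    int tmp_q = x / y;   /* stands in for REALPART_EXPR of DIVMOD (x, y) */
    int tmp_r = x % y;   /* stands in for IMAGPART_EXPR of DIVMOD (x, y) */
    if (cond)
        q = tmp_q;
    else
        q = tmp_q;
    return q + tmp_r;
}
```

Both functions return the same value for any inputs where x / y is defined,
which is the property the artificial top_stmt would rely on.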
>
>> For the following test-case, I am surprised why CSE didn't take place before
>> widening_mul pass ?
>>
>> int
>> f_1 (int x, int y)
>> {
>>   int q = x / y;
>>   int r1 = 0, r2 = 0;
>>   if (cond)
>>     r1 = x % y;
>>   else
>>     r2 = x % y;
>>   return q + r1 + r2;
>> }
>
> This is not CSE but code hoisting which is not implemented on GIMPLE
> (see PR23286)
Ah right, thanks for pointing out the PR.

Thanks,
Prathamesh
>
>> The input to widening_mul pass is:
>> f_1 (int x, int y)
>> {
>>   int r2;
>>   int r1;
>>   int q;
>>   int cond.0_1;
>>   int _2;
>>   int _11;
>>
>>   <bb 2>:
>>   q_7 = x_5(D) / y_6(D);
>>   cond.0_1 = cond;
>>   if (cond.0_1 != 0)
>>     goto <bb 3>;
>>   else
>>     goto <bb 4>;
>>
>>   <bb 3>:
>>   r1_9 = x_5(D) % y_6(D);
>>   goto <bb 5>;
>>
>>   <bb 4>:
>>   r2_10 = x_5(D) % y_6(D);
>>
>>   <bb 5>:
>>   # r1_3 = PHI <r1_9(3), 0(4)>
>>   # r2_4 = PHI <0(3), r2_10(4)>
>>   _2 = r1_3 + q_7;
>>   _11 = _2 + r2_4;
>>   return _11;
>>
>> }
>>
>> Thanks,
>> Prathamesh
>> >
>> > Richard.
>> >
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> > Richard.
>> >> >
>> >>
>> >>
>> >
>> > --
>> > Richard Biener <rguenther@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

[-- Attachment #2: divmod-part1_5.diff --]
[-- Type: text/plain, Size: 14461 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8c7f2a1..111f19f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6963,6 +6963,12 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool @var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx *@var{quot}, rtx *@var{rem})
+Define this hook if the port does not have hardware div and divmod insn for
+the given mode but has divmod libfunc, which is incompatible
+with libgcc2.c:__udivmoddi4
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f963a58..2c9a800 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4848,6 +4848,8 @@ them: try the first ones in this list first.
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index c867ddc..0cb59f7 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2276,6 +2276,48 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 
+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, generate call to
+    target-specific divmod libfunc via expand_divmod_libfunc hook.  */
+
+static void 
+expand_DIVMOD (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  bool unsignedp = TYPE_UNSIGNED (type);
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx quotient, remainder;
+
+  /* Check if optab handler exists for [u]divmod.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+    {
+      quotient = gen_reg_rtx (mode);
+      remainder = gen_reg_rtx (mode);
+      expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
+    }
+  else
+    targetm.expand_divmod_libfunc (unsignedp, mode, op0, op1,
+				   &quotient, &remainder);
+
+  /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR.  */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+		       make_tree (TREE_TYPE (arg0), quotient),
+		       make_tree (TREE_TYPE (arg1), remainder)),
+	       target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Return true if FN is supported for the types in TYPES when the
    optimization type is OPT_TYPE.  The types are those associated with
    the "type0" and "type1" fields of FN's direct_internal_fn_info
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index e729d85..56a80f1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -194,6 +194,9 @@ DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* Divmod function.  */
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_OPTAB_FN
diff --git a/gcc/target.def b/gcc/target.def
index 6392e73..4496f9a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4948,6 +4948,16 @@ Normally, this is not needed.",
  bool, (const_tree field, machine_mode mode),
  default_member_type_forces_blk)
 
+/* See tree-ssa-math-opts.c:divmod_candidate_p for conditions that gate
+   the divmod transform.  */
+DEFHOOK
+(expand_divmod_libfunc,
+ "Define this hook if the port does not have hardware div and divmod insn for\n\
+the given mode but has divmod libfunc, which is incompatible\n\
+with libgcc2.c:__udivmoddi4",
+ void, (bool unsignedp, machine_mode mode, rtx, rtx, rtx *quot, rtx *rem),
+ default_expand_divmod_libfunc)
+
 /* Return the class for a secondary reload, and fill in extra information.  */
 DEFHOOK
 (secondary_reload,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 6b4601b..20327a6 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }
 
+/* Generate call to
+   DImode __udivmoddi4 (DImode op0, DImode op1, DImode *rem).  */
+
+void
+default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
+			       rtx op0, rtx op1,
+			       rtx *quot_p, rtx *rem_p)
+{
+  gcc_assert (mode == DImode);
+  gcc_assert (unsignedp);
+
+  rtx libfunc = optab_libfunc (udivmod_optab, DImode);
+  gcc_assert (libfunc);
+
+  rtx remainder = assign_stack_temp (DImode, GET_MODE_SIZE (DImode));
+  rtx address = XEXP (remainder, 0);
+
+  rtx quotient = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+					  DImode, 3,
+					  op0, GET_MODE (op0),
+					  op1, GET_MODE (op1),
+					  address, GET_MODE (address));
+
+  *quot_p = quotient;
+  *rem_p = remainder;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 7687c39..dc5e8e7 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -254,4 +254,7 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
 				       optimization_type);
 
+extern void default_expand_divmod_libfunc (bool, machine_mode,
+					   rtx, rtx, rtx *, rtx *);
+
 #endif /* GCC_TARGHOOKS_H */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 81688cd..aaa9173 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -112,6 +112,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "optabs-libfuncs.h"
+#include "tree-eh.h"
+#include "targhooks.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -184,6 +187,9 @@ static struct
 
   /* Number of fp fused multiply-add ops inserted.  */
   int fmas_inserted;
+
+  /* Number of divmod calls inserted.  */
+  int divmod_calls_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -3784,6 +3790,220 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
   return true;
 }
 
+/* Return true if target has support for divmod.  */
+
+static bool
+target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode)
+{
+  /* If target supports hardware divmod insn, use it for divmod.  */
+  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing)
+    return true;
+
+  /* Check if libfunc for divmod is available.  */
+  if (optab_libfunc (divmod_optab, mode) != NULL_RTX)
+    {
+      /* If optab_handler exists for div_optab, perhaps in a wider mode,
+	 we don't want to use the libfunc even if it exists for given mode.  */ 
+      for (machine_mode div_mode = mode;
+	   div_mode != VOIDmode;
+	   div_mode = GET_MODE_WIDER_MODE (div_mode))
+	if (optab_handler (div_optab, div_mode) != CODE_FOR_nothing)
+	  return false;
+ 
+      return true;
+    }
+
+  return false;
+}
+
+/* Check if stmt is candidate for divmod transform.  */
+
+static bool
+divmod_candidate_p (gassign *stmt)
+{
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  enum machine_mode mode = TYPE_MODE (type);
+  optab divmod_optab, div_optab;
+
+  if (TYPE_UNSIGNED (type))
+    {
+      divmod_optab = udivmod_optab;
+      div_optab = udiv_optab;
+    }
+  else
+    {
+      divmod_optab = sdivmod_optab;
+      div_optab = sdiv_optab;
+    }
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  /* Disable the transform if either is a constant, since division-by-constant
+     may have specialized expansion.  */
+  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
+    return false;
+
+  /* Exclude the case where TYPE_OVERFLOW_TRAPS (type) as that should
+     expand using the [su]divv optabs.  */
+  if (TYPE_OVERFLOW_TRAPS (type))
+    return false;
+  
+  if (!target_supports_divmod_p (divmod_optab, div_optab, mode))
+    return false;
+
+  return true;
+}
+
+/* This function looks for:
+   t1 = a TRUNC_DIV_EXPR b;
+   t2 = a TRUNC_MOD_EXPR b;
+   and transforms it to the following sequence:
+   complex_tmp = DIVMOD (a, b);
+   t1 = REALPART_EXPR (complex_tmp);
+   t2 = IMAGPART_EXPR (complex_tmp);
+   For conditions enabling the transform see divmod_candidate_p().
+
+   The pass works in two phases:
+   1) Walk through all immediate uses of stmt's operand and find a
+      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
+      it to stmts vector.
+   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
+      dominates other div/mod stmts with same operands) and update entries in
+      stmts vector to use return value of DIVMOD (REALPART_EXPR for div,
+      IMAGPART_EXPR for mod).  */
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+  if (!divmod_candidate_p (stmt))
+    return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+  
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+  auto_vec<gimple *> stmts; 
+
+  gimple *top_stmt = stmt; 
+  basic_block top_bb = gimple_bb (stmt);
+
+  /* Try to set top_stmt to "topmost" stmt
+     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.  */
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+    {
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  basic_block bb = gimple_bb (use_stmt);
+
+	  if (bb == top_bb && gimple_uid (use_stmt) < gimple_uid (top_stmt))
+	    top_stmt = use_stmt;
+	  else if (bb != top_bb && dominated_by_p (CDI_DOMINATORS, top_bb, bb))
+	    {
+	      top_bb = bb;
+	      top_stmt = use_stmt;
+	    }
+	}
+    }
+
+  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
+    return false;
+
+  tree top_op1 = gimple_assign_rhs1 (top_stmt);
+  tree top_op2 = gimple_assign_rhs2 (top_stmt);
+
+  stmts.safe_push (top_stmt);
+  bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR);
+
+  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */    
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
+    {
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (top_op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (top_op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (use_stmt == top_stmt)
+	    continue;
+
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
+	    {
+	      end_imm_use_stmt_traverse (&use_iter);
+	      return false;
+	    }
+
+	  stmts.safe_push (use_stmt);
+	  if (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR)
+	    div_seen = true;
+	}
+    }
+
+  if (!div_seen)
+    return false;
+
+  /* Create libcall to internal fn DIVMOD:
+     divmod_tmp = DIVMOD (op1, op2).  */
+
+  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+  tree res = make_temp_ssa_name (
+		build_complex_type (TREE_TYPE (op1)),
+		call_stmt, "divmod_tmp");
+  gimple_call_set_lhs (call_stmt, res);
+
+  /* Insert the call before top_stmt.  */
+  gimple_stmt_iterator top_stmt_gsi = gsi_for_stmt (top_stmt);
+  gsi_insert_before (&top_stmt_gsi, call_stmt, GSI_SAME_STMT);
+
+  widen_mul_stats.divmod_calls_inserted++;		
+
+  /* Update all statements in stmts.
+     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs = REALPART_EXPR<divmod_tmp>
+     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs = IMAGPART_EXPR<divmod_tmp>.  */
+
+  bool cfg_changed = false;
+  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
+    {
+      tree new_rhs;
+
+      switch (gimple_assign_rhs_code (use_stmt))
+	{
+	  case TRUNC_DIV_EXPR:
+	    new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+	    break;
+
+	  case TRUNC_MOD_EXPR:
+	    new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	}
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
+      update_stmt (use_stmt);
+
+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+	cfg_changed = true;
+    }
+
+  stmts.release ();
+  return cfg_changed;
+}    
 
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
@@ -3828,6 +4048,8 @@ pass_optimize_widening_mul::execute (function *fun)
   bool cfg_changed = false;
 
   memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
+  renumber_gimple_stmt_uids ();
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -3861,6 +4083,10 @@ pass_optimize_widening_mul::execute (function *fun)
 		    match_uaddsub_overflow (&gsi, stmt, code);
 		  break;
 
+		case TRUNC_MOD_EXPR:
+		  cfg_changed |= convert_to_divmod (as_a<gassign *> (stmt));
+		  break;
+
 		default:;
 		}
 	    }
@@ -3907,6 +4133,8 @@ pass_optimize_widening_mul::execute (function *fun)
 			    widen_mul_stats.maccs_inserted);
   statistics_counter_event (fun, "fused multiply-adds inserted",
 			    widen_mul_stats.fmas_inserted);
+  statistics_counter_event (fun, "divmod calls inserted",
+			    widen_mul_stats.divmod_calls_inserted);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
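
For readers unfamiliar with the two libfunc conventions that the
expand_divmod_libfunc hook in the patch above abstracts over, here is a rough
sketch in plain C. It is an illustration under assumptions, not the libgcc or
ARM runtime code: udivmod_ptr, udivmod_both and struct udivmod_result are
hypothetical names, and the struct return only models "both results come back
together", not any particular ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Pointer-style convention (the shape described for
   libgcc2.c:__udivmoddi4): return the quotient and store the remainder
   through REM.  Hypothetical stand-in, not the libgcc routine.  */
uint64_t udivmod_ptr(uint64_t n, uint64_t d, uint64_t *rem)
{
    assert(d != 0);
    if (rem)
        *rem = n % d;
    return n / d;
}

/* Pair-style convention: quotient and remainder are returned together.
   This mirrors how the DIVMOD internal function packs its result, with
   the quotient as the real part and the remainder as the imaginary part.  */
struct udivmod_result { uint64_t quot; uint64_t rem; };

struct udivmod_result udivmod_both(uint64_t n, uint64_t d)
{
    struct udivmod_result r;
    assert(d != 0);
    r.quot = n / d;
    r.rem = n % d;
    return r;
}
```

A caller expanding DIVMOD with the pointer-style routine needs a temporary for
the remainder, which is what the stack temp in default_expand_divmod_libfunc
provides. With the pair-style routine both values come straight from the
return value.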

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RFC [1/2] divmod transform
  2016-05-27 12:41                   ` Prathamesh Kulkarni
@ 2016-05-27 13:04                     ` Richard Biener
  2016-05-30  9:56                       ` Prathamesh Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2016-05-27 13:04 UTC (permalink / raw)
  To: Prathamesh Kulkarni
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

On Fri, 27 May 2016, Prathamesh Kulkarni wrote:

> On 27 May 2016 at 15:45, Richard Biener <rguenther@suse.de> wrote:
> > On Wed, 25 May 2016, Prathamesh Kulkarni wrote:
> >
> >> On 25 May 2016 at 12:52, Richard Biener <rguenther@suse.de> wrote:
> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 24 May 2016 at 19:39, Richard Biener <rguenther@suse.de> wrote:
> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> On 24 May 2016 at 17:42, Richard Biener <rguenther@suse.de> wrote:
> >> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >> >> >> >
> >> >> >> >> On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:
> >> >> >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
> >> >> >> >> > <prathamesh.kulkarni@linaro.org> wrote:
> >> >> >> >> >> Hi,
> >> >> >> >> >> I have updated my patch for divmod (attached), which was originally
> >> >> >> >> >> based on Kugan's patch.
> >> >> >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> >> >> >> >> >> having same operands to divmod representation, so we can cse computation of mod.
> >> >> >> >> >>
> >> >> >> >> >> t1 = a TRUNC_DIV_EXPR b;
> >> >> >> >> >> t2 = a TRUNC_MOD_EXPR b
> >> >> >> >> >> is transformed to:
> >> >> >> >> >> complex_tmp = DIVMOD (a, b);
> >> >> >> >> >> t1 = REALPART_EXPR (complex_tmp);
> >> >> >> >> >> t2 = IMAGPART_EXPR (complex_tmp);
> >> >> >> >> >>
> >> >> >> >> >> * New hook expand_divmod_libfunc
> >> >> >> >> >> The rationale for introducing the hook is that different targets have
> >> >> >> >> >> incompatible calling conventions for divmod libfunc.
> >> >> >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
> >> >> >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
> >> >> >> >> >> return quotient and store remainder in argument passed as pointer,
> >> >> >> >> >> while the arm version takes two arguments and returns both
> >> >> >> >> >> quotient and remainder having mode double the size of the operand mode.
> >> >> >> >> >> The port should hence override the hook expand_divmod_libfunc
> >> >> >> >> >> to generate call to target-specific divmod.
> >> >> >> >> >> Ports should define this hook if:
> >> >> >> >> >> a) The port does not have divmod or div insn for the given mode.
> >> >> >> >> >> b) The port defines divmod libfunc for the given mode.
> >> >> >> >> >> The default hook default_expand_divmod_libfunc() generates call
> >> >> >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
> >> >> >> >> >> are of DImode.
> >> >> >> >> >>
> >> >> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
> >> >> >> >> >> cross-tested on arm*-*-*.
> >> >> >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
> >> >> >> >> >> Does this patch look OK ?
> >> >> >> >> >
> >> >> >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> >> >> >> >> > index 6b4601b..e4a021a 100644
> >> >> >> >> > --- a/gcc/targhooks.c
> >> >> >> >> > +++ b/gcc/targhooks.c
> >> >> >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
> >> >> >> >> > machine_mode, optimization_type)
> >> >> >> >> >    return true;
> >> >> >> >> >  }
> >> >> >> >> >
> >> >> >> >> > +void
> >> >> >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
> >> >> >> >> > +                              rtx op0, rtx op1,
> >> >> >> >> > +                              rtx *quot_p, rtx *rem_p)
> >> >> >> >> >
> >> >> >> >> > functions need a comment.
> >> >> >> >> >
> >> >> >> >> > ISTR it was suggested that ARM change to libgcc2.c:__udivmoddi4 style?  In that
> >> >> >> >> > case we could avoid the target hook.
> >> >> >> >> Well I would prefer adding the hook because that's easier ;-)
> >> >> >> >> Would it be ok for now to go with the hook ?
> >> >> >> >> >
> >> >> >> >> > +      /* If target overrides expand_divmod_libfunc hook
> >> >> >> >> > +        then perform divmod by generating call to the target-specific divmod
> >> >> >> >> > libfunc.  */
> >> >> >> >> > +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
> >> >> >> >> > +       return true;
> >> >> >> >> > +
> >> >> >> >> > +      /* Fall back to using libgcc2.c:__udivmoddi4.  */
> >> >> >> >> > +      return (mode == DImode && unsignedp);
> >> >> >> >> >
> >> >> >> >> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
> >> >> >> >> > but still restrict this to DImode && unsigned?  Also if
> >> >> >> >> > targetm.expand_divmod_libfunc
> >> >> >> >> > is not the default we expect the target to handle all modes?
> >> >> >> >> Ah indeed, the check for DImode is unnecessary.
> >> >> >> >> However I suppose the check for unsignedp should be there,
> >> >> >> >> since we want to generate call to __udivmoddi4 only if operand is unsigned ?
> >> >> >> >
> >> >> >> > The optab libfunc for sdivmod should be NULL in that case.
> >> >> >> Ah indeed, thanks.
> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > That said - I expected the above piece to be simply a 'return true;' ;)
> >> >> >> >> >
> >> >> >> >> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
> >> >> >> >> > supports a specific operation (for example SImode divmod would use DImode
> >> >> >> >> > divmod by means of widening operands - for the unsigned case of course).
> >> >> >> >> Thanks for pointing out. So if a target does not support divmod
> >> >> >> >> libfunc for a mode
> >> >> >> >> but for a wider mode, then we could zero-extend operands to the wider-mode,
> >> >> >> >> perform divmod on the wider-mode, and then cast result back to the
> >> >> >> >> original mode.
> >> >> >> >> I haven't done that in this patch, would it be OK to do that as a follow up ?
> >> >> >> >
> >> >> >> > I think that you should conservatively handle the div_optab query, thus if
> >> >> >> > the target has a HW division in a wider mode don't use the divmod IFN.
> >> >> >> > You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
> >> >> >> > if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
> >> >> >> > out if that is available.
> >> >> >> Done.
> >> >> >> >
> >> >> >> >> > +  /* Disable the transform if either is a constant, since
> >> >> >> >> > division-by-constant
> >> >> >> >> > +     may have specialized expansion.  */
> >> >> >> >> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
> >> >> >> >> > +    return false;
> >> >> >> >> >
> >> >> >> >> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
> >> >> >> >> >
> >> >> >> >> > +  if (TYPE_OVERFLOW_TRAPS (type))
> >> >> >> >> > +    return false;
> >> >> >> >> >
> >> >> >> >> > why's that?  Generally please first test cheap things (trapping, constant-ness)
> >> >> >> >> > before checking expensive stuff (target_supports_divmod_p).
> >> >> >> >> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
> >> >> >> >> https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
> >> >> >> >> "When looking at TRUNC_DIV_EXPR you should also exclude
> >> >> >> >> the case where TYPE_OVERFLOW_TRAPS (type) as that should
> >> >> >> >> expand using the [su]divv optabs (no trapping overflow
> >> >> >> >> divmod optab exists)."
> >> >> >> >
> >> >> >> > Ok, didn't remember that.
> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > +static bool
> >> >> >> >> > +convert_to_divmod (gassign *stmt)
> >> >> >> >> > +{
> >> >> >> >> > +  if (!divmod_candidate_p (stmt))
> >> >> >> >> > +    return false;
> >> >> >> >> > +
> >> >> >> >> > +  tree op1 = gimple_assign_rhs1 (stmt);
> >> >> >> >> > +  tree op2 = gimple_assign_rhs2 (stmt);
> >> >> >> >> > +
> >> >> >> >> > +  vec<gimple *> stmts = vNULL;
> >> >> >> >> >
> >> >> >> >> > use an auto_vec <gimple *> - you currently leak it in at least one place.
> >> >> >> >> >
> >> >> >> >> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> >> >> >> >> > +       cfg_changed = true;
> >> >> >> >> >
> >> >> >> >> > note that this suggests you should check whether any of the stmts may throw
> >> >> >> >> > internally as you don't update / transfer EH info correctly.  So for 'stmt' and
> >> >> >> >> > all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
> >> >> >> >> > the list of stmts to modify.
> >> >> >> >> >
> >> >> >> >> > Btw, I think you should not add 'stmt' immediately but when iterating over
> >> >> >> >> > all uses also gather uses in TRUNC_MOD_EXPR.
> >> >> >> >> >
> >> >> >> >> > Otherwise looks ok.
> >> >> >> >> Done changes in this version. I am gathering mod uses same time as div uses,
> >> >> >> >> so this imposes a constraint that mod dominates mod. I am not sure if
> >> >> >> >> that's desirable.
> >> >> >> >
> >> >> >> > I think you also need a mod_seen variable now that you don't necessarily
> >> >> >> > end up with 'stmt' in the vector of stmts.  I don't see how there is a
> >> >> >> > constraint that mod dominates mod - it's just that the top_stmt needs
> >> >> >> > to dominate all other uses that can be replaced with replacing top_stmt
> >> >> >> > with a divmod.  It's just that the actual stmt set we choose may now
> >> >> >> > depend on immediate uses order which on a second thought is bad
> >> >> >> > as immediate uses order could be affected by debug stmts ... hmm.
> >> >> >> >
> >> >> >> > To avoid this please re-add the code adding 'stmt' to stmts immediately
> >> >> >> > and add a use_stmt != stmt check to the immediate use processing loop
> >> >> >> > so that we don't end up adding it twice.
> >> >> >> Well I wonder what will happen for the following case:
> >> >> >> t1 = x / y;
> >> >> >> if (cond)
> >> >> >>   t2 = x % y;
> >> >> >> else
> >> >> >>   t3 = x % y;
> >> >> >>
> >> >> >> Assuming stmt == top_stmt is "t2 = x % y" and use_stmt is "t3 = x % y",
> >> >> >> use_stmt will not get added to stmts vector, since top_stmt and
> >> >> >> use_stmt are not in same bb,
> >> >> >> and bb's containing top_stmt and use_stmt don't dominate each other.
> >> >> >> Not sure if this is practical case (I assume fre will hoist mod
> >> >> >> outside if-else?)
> >> >> >>
> >> >> >> Now that we immediately add stmt to stmts vector, I suppose mod_seen
> >> >> >> shall not be required ?
> >> >> >
> >> >> > In that case mod_seen is not needed.  But the situation you say will
> >> >> > still happen so I wonder if we'd need a better way of iterating over
> >> >> > immediate uses, like first pushing all candidates into a worklist
> >> >> > vector and then iterating over that until we find no more candidates.
> >> >> >
> >> >> > You can then also handle the case of more than one group of stmts
> >> >> > (the pass currently doesn't iterate in any particular useful order
> >> >> > over BBs).
> >> >> IIUC, we want to perform the transform if:
> >> >> i) there exists top_stmt with code trunc_div_expr/trunc_mod_expr and
> >> >> having same operands as stmt.
> >> >> ii) top_stmt dominates all other stmts with code
> >> >> trunc_div_expr/trunc_mod_expr and having same operands as top_stmt.
> >> >>
> >> >> Firstly, we try to get to top_stmt if it exists, by iterating over uses of stmt,
> >> >> and then iterate over all uses of top_stmt and add them to stmts vector
> >> >> only if top_stmt dominates all the stmts with same operands as top_stmt
> >> >> and have code trunc_div_expr/trunc_mod_expr.
> >> >>
> >> >> /* Get to top_stmt.  */
> >> >> top_stmt = stmt;
> >> >> top_bb = gimple_bb (stmt);
> >> >>
> >> >> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
> >> >> {
> >> >>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
> >> >>       && use_stmt has same operands as stmt)
> >> >>     {
> >> >>       if (gimple_bb (use_stmt) dominates top_bb)
> >> >>         {
> >> >>           top_bb = gimple_bb (use_stmt);
> >> >>           top_stmt = use_stmt;
> >> >>         }
> >> >>       else if (gimple_bb (use_stmt) == top_bb
> >> >>                && gimple_uid (use_stmt) < gimple_uid (top_stmt))
> >> >>         top_stmt = use_stmt;
> >> >>     }
> >> >> }
> >> >>
> >> >> /* Speculatively consider top_stmt as dominating all other
> >> >> div_expr/mod_expr stmts with same operands as stmt.  */
> >> >> stmts.safe_push (top_stmt);
> >> >>
> >> >> /* Walk uses of top_stmt to ensure that all stmts are dominated by top_stmt.  */
> >> >> top_op1 = gimple_assign_rhs1 (top_stmt);
> >> >> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
> >> >> {
> >> >>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
> >> >>       && use_stmt has same operands as top_stmt)
> >> >>     {
> >> >>       if (use_stmt == top_stmt)
> >> >>         continue;
> >> >>
> >> >>       /* No top_stmt exists, do not proceed with transform.  */
> >> >>       if (top_bb does not dominate gimple_bb (use_stmt))
> >> >>         return false;
> >> >>
> >> >>       stmts.safe_push (use_stmt);
> >> >>     }
> >> >> }
> >> >>
> >> >> For the case:
> >> >> t1 = x / y;
> >> >> if (cond)
> >> >>   t2 = x % y;
> >> >> else
> >> >>   t3 = x % y;
> >> >>
> >> >> Assuming stmt is "t2 = x % y", it will walk uses of stmt and set
> >> >> top_stmt to "t1 = x / y"
> >> >> Then it will walk all uses of top_stmt:
> >> >> "t2 = x % y" -> dominated by top_stmt
> >> >> "t3 = x % y" -> dominated by top_stmt
> >> >> Since all stmts are dominated by top_stmt, we add all three stmts to
> >> >> vector of stmts and proceed with transform.
> >> >>
> >> >> For the case where, top_stmt dominates original stmt but not all stmts:
> >> >>
> >> >> if (cond)
> >> >>   t1 = x / y;
> >> >> else
> >> >> {
> >> >>   t2 = x % y;
> >> >>   return;
> >> >> }
> >> >>
> >> >> t3 = x % y;
> >> >>
> >> >> Assuming stmt is "t3 = x % y",
> >> >> Walking stmt uses will set top_stmt to "t1 = x / y";
> >> >>
> >> >> Walking immediate uses of top_stmt, we find that "t2 = x % y" is not
> >> >> dominated by top_stmt,
> >> >> and hence don't do the transform.
> >> >>
> >> >> Does this sound reasonable ?
> >> >
> >> > Yes, that's reasonable.
> >> Thanks, I have attached patch that implements above approach.
> >> Does it look OK ?
> >
> > Please start the top-stmt search with
> >
> >   top_stmt = stmt;
> >   top_bb = gimple_bb (stmt);
> >
> > this makes sure to process all stmts via the IL walk in case
> > the uses have multiple independent "dominated" trees.  This also
> > simplifies the loop body (no need to check for NULL).  This also
> > makes mod_seen always true and you can compute div_seen in that
> > loop as well.
> Done. Um I don't understand why setting top_stmt to NULL won't process 
> all stmts ? AFAIU it will have one extra iteration compared to 
> initializing top_stmt to stmt (since first iteration would initialize 
> top_stmt to stmt assuming stmt does not throw) ?

If you have

  if (cond)
    {
      r = x % y;
      q = x / y;
    }
  else
    {
      r = x % y;
      q = x / y;
    }

then the loop over the function might end up transforming the else
block when visiting the modulo in the then block, and thus it will never
transform the then block.  That is because you walk immediate uses, which
does not guarantee that you end up with a top_stmt related to the
IL point you were coming from: the first iteration does _not_
necessarily have use_stmt == stmt.
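
To make the point about initialization concrete, here is a minimal,
self-contained C sketch of the top_stmt search on a toy model.  The names
toy_stmt, toy_find_top_stmt and the fixed-size dominance table are
hypothetical stand-ins for gimple statements and dominated_by_p; this is
an illustration under those assumptions, not the code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS 4

/* Toy statement: the basic block it lives in and its position (uid)
   within that block.  Hypothetical stand-in for a gimple stmt.  */
struct toy_stmt
{
  int bb;
  int uid;
};

/* Return the topmost candidate related to STMT.  USES models the
   immediate-use list, whose order is arbitrary.  DOM[a][b] is true
   when block a dominates block b (assumed to be precomputed).  */
static const struct toy_stmt *
toy_find_top_stmt (const struct toy_stmt *stmt,
                   const struct toy_stmt *uses, size_t n_uses,
                   bool dom[TOY_NBLOCKS][TOY_NBLOCKS])
{
  /* Start from STMT itself rather than from NULL, so the result is
     always related to the IL point we came from.  */
  const struct toy_stmt *top = stmt;

  for (size_t i = 0; i < n_uses; i++)
    {
      const struct toy_stmt *use = &uses[i];

      if (use->bb == top->bb)
        {
          /* Same block: the earlier statement wins.  */
          if (use->uid < top->uid)
            top = use;
        }
      else if (dom[use->bb][top->bb])
        /* USE sits in a block that dominates the current top block.  */
        top = use;
    }

  return top;
}
```

With a diamond whose then and else blocks each contain "r = x % y;
q = x / y;", starting from the division in the then block selects the
modulo in the then block, even when the else-block statements come first
in the use list.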

> >
> > Otherwise looks ok now.
> >
> >> The patch still does not handle the following case:
> >> int f(int x, int y)
> >> {
> >>   extern int cond;
> >>   int q, r;
> >>
> >>   if (cond)
> >>     q = x / y;
> >>   else
> >>     q = x / y;
> >>
> >>   r = x % y;
> >>   return q + r;
> >> }
> >>
> >> In above case although the mod stmt is not dominated by either div
> >> stmt, I suppose the transform
> >> is still possible by inserting DIVMOD (x, y) before if-else ?
> >
> > Yeah, same for sincos where doing this requires some LCM algorithm.
> Well I don't have a good approach for this.
> I was thinking, before doing the divmod transform, we could walk
> GIMPLE_COND of "diamond" shape
> (having both arms), and check "then" bb and "else" bb have same div or
> mod stmts and in that case put an artificial
> same stmt above GIMPLE_COND.
> 
> So the above case would be transformed to:
> 
> int tmp = x / y;  // artificial top_stmt
> if (cond)
>   q = x / y;
> else
>   q = x / y;
> 
> r = x % y;
> return q + r;
> 
> and then the divmod transform will see "tmp = x / y" as the topmost stmt.
> Since top_stmt is artificially introduced, we will replace that with DIVMOD ifn
> rather than inserting DIVMOD ifn above top_stmt as in other cases.

Yeah, but it is really a general missed optimization that should not be
required for this transform.

Richard.

> >
> >> For the following test-case, I am surprised why CSE didn't take place before
> >> widening_mul pass ?
> >>
> >> int
> >> f_1 (int x, int y)
> >> {
> >>   int q = x / y;
> >>   int r1 = 0, r2 = 0;
> >>   if (cond)
> >>     r1 = x % y;
> >>   else
> >>     r2 = x % y;
> >>   return q + r1 + r2;
> >> }
> >
> > This is not CSE but code hoisting which is not implemented on GIMPLE
> > (see PR23286)
> Ah right, thanks for pointing out the PR.
> 
> Thanks,
> Prathamesh
> >
> >> The input to widening_mul pass is:
> >> f_1 (int x, int y)
> >> {
> >>   int r2;
> >>   int r1;
> >>   int q;
> >>   int cond.0_1;
> >>   int _2;
> >>   int _11;
> >>
> >>   <bb 2>:
> >>   q_7 = x_5(D) / y_6(D);
> >>   cond.0_1 = cond;
> >>   if (cond.0_1 != 0)
> >>     goto <bb 3>;
> >>   else
> >>     goto <bb 4>;
> >>
> >>   <bb 3>:
> >>   r1_9 = x_5(D) % y_6(D);
> >>   goto <bb 5>;
> >>
> >>   <bb 4>:
> >>   r2_10 = x_5(D) % y_6(D);
> >>
> >>   <bb 5>:
> >>   # r1_3 = PHI <r1_9(3), 0(4)>
> >>   # r2_4 = PHI <0(3), r2_10(4)>
> >>   _2 = r1_3 + q_7;
> >>   _11 = _2 + r2_4;
> >>   return _11;
> >>
> >> }
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> > Richard.
> >> >
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> > Richard.
> >> >> >
> >> >>
> >> >>
> >> >

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


* Re: RFC [1/2] divmod transform
  2016-05-27 13:04                     ` Richard Biener
@ 2016-05-30  9:56                       ` Prathamesh Kulkarni
  2016-05-30 10:36                         ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2016-05-30  9:56 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 19903 bytes --]

On 27 May 2016 at 17:31, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 27 May 2016, Prathamesh Kulkarni wrote:
>
>> On 27 May 2016 at 15:45, Richard Biener <rguenther@suse.de> wrote:
>> > On Wed, 25 May 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 25 May 2016 at 12:52, Richard Biener <rguenther@suse.de> wrote:
>> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 24 May 2016 at 19:39, Richard Biener <rguenther@suse.de> wrote:
>> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >> >
>> >> >> >> On 24 May 2016 at 17:42, Richard Biener <rguenther@suse.de> wrote:
>> >> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >> >> >
>> >> >> >> >> On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:
>> >> >> >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
>> >> >> >> >> > <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >> >> >> Hi,
>> >> >> >> >> >> I have updated my patch for divmod (attached), which was originally
>> >> >> >> >> >> based on Kugan's patch.
>> >> >> >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> >> >> >> >> >> having same operands to divmod representation, so we can cse computation of mod.
>> >> >> >> >> >>
>> >> >> >> >> >> t1 = a TRUNC_DIV_EXPR b;
>> >> >> >> >> >> t2 = a TRUNC_MOD_EXPR b
>> >> >> >> >> >> is transformed to:
>> >> >> >> >> >> complex_tmp = DIVMOD (a, b);
>> >> >> >> >> >> t1 = REALPART_EXPR (complex_tmp);
>> >> >> >> >> >> t2 = IMAGPART_EXPR (complex_tmp);
>> >> >> >> >> >>
>> >> >> >> >> >> * New hook divmod_expand_libfunc
>> >> >> >> >> >> The rationale for introducing the hook is that different targets have
>> >> >> >> >> >> incompatible calling conventions for divmod libfunc.
>> >> >> >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
>> >> >> >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> >> >> >> >> >> return quotient and store remainder in argument passed as pointer,
>> >> >> >> >> >> while the arm version takes two arguments and returns both
>> >> >> >> >> >> quotient and remainder having mode double the size of the operand mode.
>> >> >> >> >> >> The port should hence override the hook expand_divmod_libfunc
>> >> >> >> >> >> to generate call to target-specific divmod.
>> >> >> >> >> >> Ports should define this hook if:
>> >> >> >> >> >> a) The port does not have divmod or div insn for the given mode.
>> >> >> >> >> >> b) The port defines divmod libfunc for the given mode.
>> >> >> >> >> >> The default hook default_expand_divmod_libfunc() generates call
>> >> >> >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
>> >> >> >> >> >> are of DImode.
>> >> >> >> >> >>
>> >> >> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> >> >> >> >> >> cross-tested on arm*-*-*.
>> >> >> >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
>> >> >> >> >> >> Does this patch look OK ?
>> >> >> >> >> >
>> >> >> >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> >> >> >> >> > index 6b4601b..e4a021a 100644
>> >> >> >> >> > --- a/gcc/targhooks.c
>> >> >> >> >> > +++ b/gcc/targhooks.c
>> >> >> >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
>> >> >> >> >> > machine_mode, optimization_type)
>> >> >> >> >> >    return true;
>> >> >> >> >> >  }
>> >> >> >> >> >
>> >> >> >> >> > +void
>> >> >> >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
>> >> >> >> >> > +                              rtx op0, rtx op1,
>> >> >> >> >> > +                              rtx *quot_p, rtx *rem_p)
>> >> >> >> >> >
>> >> >> >> >> > functions need a comment.
>> >> >> >> >> >
>> >> >> >> >> > ISTR it was suggested that ARM change to libgcc2.c:__udivmoddi4 style?  In that
>> >> >> >> >> > case we could avoid the target hook.
>> >> >> >> >> Well I would prefer adding the hook because that's easier -;)
>> >> >> >> >> Would it be ok for now to go with the hook ?
>> >> >> >> >> >
>> >> >> >> >> > +      /* If target overrides expand_divmod_libfunc hook
>> >> >> >> >> > +        then perform divmod by generating call to the target-specific divmod
>> >> >> >> >> > libfunc.  */
>> >> >> >> >> > +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
>> >> >> >> >> > +       return true;
>> >> >> >> >> > +
>> >> >> >> >> > +      /* Fall back to using libgcc2.c:__udivmoddi4.  */
>> >> >> >> >> > +      return (mode == DImode && unsignedp);
>> >> >> >> >> >
>> >> >> >> >> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
>> >> >> >> >> > but still restrict this to DImode && unsigned?  Also if
>> >> >> >> >> > targetm.expand_divmod_libfunc
>> >> >> >> >> > is not the default we expect the target to handle all modes?
>> >> >> >> >> Ah indeed, the check for DImode is unnecessary.
>> >> >> >> >> However I suppose the check for unsignedp should be there,
>> >> >> >> >> since we want to generate call to __udivmoddi4 only if operand is unsigned ?
>> >> >> >> >
>> >> >> >> > The optab libfunc for sdivmod should be NULL in that case.
>> >> >> >> Ah indeed, thanks.
>> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > That said - I expected the above piece to be simply a 'return true;' ;)
>> >> >> >> >> >
>> >> >> >> >> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
>> >> >> >> >> > supports a specific operation (for example SImode divmod would use DImode
>> >> >> >> >> > divmod by means of widening operands - for the unsigned case of course).
>> >> >> >> >> Thanks for pointing out. So if a target does not support divmod
>> >> >> >> >> libfunc for a mode
>> >> >> >> >> but for a wider mode, then we could zero-extend operands to the wider-mode,
>> >> >> >> >> perform divmod on the wider-mode, and then cast result back to the
>> >> >> >> >> original mode.
>> >> >> >> >> I haven't done that in this patch, would it be OK to do that as a follow up ?
>> >> >> >> >
>> >> >> >> > I think that you should conservatively handle the div_optab query, thus if
>> >> >> >> > the target has a HW division in a wider mode don't use the divmod IFN.
>> >> >> >> > You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
>> >> >> >> > if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
>> >> >> >> > out if that is available.
>> >> >> >> Done.
>> >> >> >> >
>> >> >> >> >> > +  /* Disable the transform if either is a constant, since
>> >> >> >> >> > division-by-constant
>> >> >> >> >> > +     may have specialized expansion.  */
>> >> >> >> >> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
>> >> >> >> >> > +    return false;
>> >> >> >> >> >
>> >> >> >> >> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
>> >> >> >> >> >
>> >> >> >> >> > +  if (TYPE_OVERFLOW_TRAPS (type))
>> >> >> >> >> > +    return false;
>> >> >> >> >> >
>> >> >> >> >> > why's that?  Generally please first test cheap things (trapping, constant-ness)
>> >> >> >> >> > before checking expensive stuff (target_supports_divmod_p).
>> >> >> >> >> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
>> >> >> >> >> https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
>> >> >> >> >> "When looking at TRUNC_DIV_EXPR you should also exclude
>> >> >> >> >> the case where TYPE_OVERFLOW_TRAPS (type) as that should
>> >> >> >> >> expand using the [su]divv optabs (no trapping overflow
>> >> >> >> >> divmod optab exists)."
>> >> >> >> >
>> >> >> >> > Ok, didn't remember that.
>> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > +static bool
>> >> >> >> >> > +convert_to_divmod (gassign *stmt)
>> >> >> >> >> > +{
>> >> >> >> >> > +  if (!divmod_candidate_p (stmt))
>> >> >> >> >> > +    return false;
>> >> >> >> >> > +
>> >> >> >> >> > +  tree op1 = gimple_assign_rhs1 (stmt);
>> >> >> >> >> > +  tree op2 = gimple_assign_rhs2 (stmt);
>> >> >> >> >> > +
>> >> >> >> >> > +  vec<gimple *> stmts = vNULL;
>> >> >> >> >> >
>> >> >> >> >> > use an auto_vec <gimple *> - you currently leak it in at least one place.
>> >> >> >> >> >
>> >> >> >> >> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> >> >> >> >> > +       cfg_changed = true;
>> >> >> >> >> >
>> >> >> >> >> > note that this suggests you should check whether any of the stmts may throw
>> >> >> >> >> > internally as you don't update / transfer EH info correctly.  So for 'stmt' and
>> >> >> >> >> > all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
>> >> >> >> >> > the list of stmts to modify.
>> >> >> >> >> >
>> >> >> >> >> > Btw, I think you should not add 'stmt' immediately but when iterating over
>> >> >> >> >> > all uses also gather uses in TRUNC_MOD_EXPR.
>> >> >> >> >> >
>> >> >> >> >> > Otherwise looks ok.
>> >> >> >> >> Done changes in this version. I am gathering mod uses same time as div uses,
>> >> >> >> >> so this imposes a constraint that mod dominates mod. I am not sure if
>> >> >> >> >> that's desirable.
>> >> >> >> >
>> >> >> >> > I think you also need a mod_seen variable now that you don't necessarily
>> >> >> >> > end up with 'stmt' in the vector of stmts.  I don't see how there is a
>> >> >> >> > constraint that mod dominates mod - it's just that the top_stmt needs
>> >> >> >> > to dominate all other uses that can be replaced with replacing top_stmt
>> >> >> >> > with a divmod.  It's just that the actual stmt set we choose may now
>> >> >> >> > depend on immediate uses order which on a second thought is bad
>> >> >> >> > as immediate uses order could be affected by debug stmts ... hmm.
>> >> >> >> >
>> >> >> >> > To avoid this please re-add the code adding 'stmt' to stmts immediately
>> >> >> >> > and add a use_stmt != stmt check to the immediate use processing loop
>> >> >> >> > so that we don't end up adding it twice.
>> >> >> >> Well I wonder what will happen for the following case:
>> >> >> >> t1 = x / y;
>> >> >> >> if (cond)
>> >> >> >>   t2 = x % y;
>> >> >> >> else
>> >> >> >>   t3 = x % y;
>> >> >> >>
>> >> >> >> Assuming stmt == top_stmt is "t2 = x % y" and use_stmt is "t3 = x % y",
>> >> >> >> use_stmt will not get added to stmts vector, since top_stmt and
>> >> >> >> use_stmt are not in same bb,
>> >> >> >> and bb's containing top_stmt and use_stmt don't dominate each other.
>> >> >> >> Not sure if this is practical case (I assume fre will hoist mod
>> >> >> >> outside if-else?)
>> >> >> >>
>> >> >> >> Now that we immediately add stmt to stmts vector, I suppose mod_seen
>> >> >> >> shall not be required ?
>> >> >> >
>> >> >> > In that case mod_seen is not needed.  But the situation you say will
>> >> >> > still happen so I wonder if we'd need a better way of iterating over
>> >> >> > immediate uses, like first pushing all candidates into a worklist
>> >> >> > vector and then iterating over that until we find no more candidates.
>> >> >> >
>> >> >> > You can then also handle the case of more than one group of stmts
>> >> >> > (the pass currently doesn't iterate in any particular useful order
>> >> >> > over BBs).
>> >> >> IIUC, we want to perform the transform if:
>> >> >> i) there exists top_stmt with code trunc_div_expr/trunc_mod_expr and
>> >> >> having same operands as stmt.
>> >> >> ii) top_stmt dominates all other stmts with code
>> >> >> trunc_div_expr/trunc_mod_expr and having same operands as top_stmt.
>> >> >>
>> >> >> Firstly, we try to get to top_stmt if it exists, by iterating over uses of stmt,
>> >> >> and then iterate over all uses of top_stmt and add them to stmts vector
>> >> >> only if top_stmt dominates all the stmts with same operands as top_stmt
>> >> >> and have code trunc_div_expr/trunc_mod_expr.
>> >> >>
>> >> >> /* Get to top_stmt.  */
>> >> >> top_stmt = stmt;
>> >> >> top_bb = gimple_bb (stmt);
>> >> >>
>> >> >> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
>> >> >> {
>> >> >>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
>> >> >>       && use_stmt has same operands as stmt)
>> >> >>     {
>> >> >>       if (gimple_bb (use_stmt) dominates top_bb)
>> >> >>         {
>> >> >>           top_bb = gimple_bb (use_stmt);
>> >> >>           top_stmt = use_stmt;
>> >> >>         }
>> >> >>       else if (gimple_bb (use_stmt) == top_bb
>> >> >>                && gimple_uid (use_stmt) < gimple_uid (top_stmt))
>> >> >>         top_stmt = use_stmt;
>> >> >>     }
>> >> >> }
>> >> >>
>> >> >> /* Speculatively consider top_stmt as dominating all other
>> >> >> div_expr/mod_expr stmts with same operands as stmt.  */
>> >> >> stmts.safe_push (top_stmt);
>> >> >>
>> >> >> /* Walk uses of top_stmt to ensure that all stmts are dominated by top_stmt.  */
>> >> >> top_op1 = gimple_assign_rhs1 (top_stmt);
>> >> >> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
>> >> >> {
>> >> >>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
>> >> >>       && use_stmt has same operands as top_stmt)
>> >> >>     {
>> >> >>       if (use_stmt == top_stmt)
>> >> >>         continue;
>> >> >>
>> >> >>       /* No top_stmt exists, do not proceed with transform.  */
>> >> >>       if (top_bb does not dominate gimple_bb (use_stmt))
>> >> >>         return false;
>> >> >>
>> >> >>       stmts.safe_push (use_stmt);
>> >> >>     }
>> >> >> }
>> >> >>
>> >> >> For the case:
>> >> >> t1 = x / y;
>> >> >> if (cond)
>> >> >>   t2 = x % y;
>> >> >> else
>> >> >>   t3 = x % y;
>> >> >>
>> >> >> Assuming stmt is "t2 = x % y", it will walk uses of stmt and set
>> >> >> top_stmt to "t1 = x / y"
>> >> >> Then it will walk all uses of top_stmt:
>> >> >> "t2 = x % y" -> dominated by top_stmt
>> >> >> "t3 = x % y" -> dominated by top_stmt
>> >> >> Since all stmts are dominated by top_stmt, we add all three stmts to
>> >> >> vector of stmts and proceed with transform.
>> >> >>
>> >> >> For the case where, top_stmt dominates original stmt but not all stmts:
>> >> >>
>> >> >> if (cond)
>> >> >>   t1 = x / y;
>> >> >> else
>> >> >> {
>> >> >>   t2 = x % y;
>> >> >>   return;
>> >> >> }
>> >> >>
>> >> >> t3 = x % y;
>> >> >>
>> >> >> Assuming stmt is "t3 = x % y",
>> >> >> Walking stmt uses will set top_stmt to "t1 = x / y";
>> >> >>
>> >> >> Walking immediate uses of top_stmt, we find that "t2 = x % y" is not
>> >> >> dominated by top_stmt,
>> >> >> and hence don't do the transform.
>> >> >>
>> >> >> Does this sound reasonable ?
>> >> >
>> >> > Yes, that's reasonable.
>> >> Thanks, I have attached patch that implements above approach.
>> >> Does it look OK ?
>> >
>> > Please start the top-stmt search with
>> >
>> >   top_stmt = stmt;
>> >   top_bb = gimple_bb (stmt);
>> >
>> > this makes sure to process all stmts via the IL walk in case
>> > the uses have multiple independent "dominated" trees.  This also
>> > simplifies the loop body (no need to check for NULL).  This also
>> > makes mod_seen always true and you can compute div_seen in that
>> > loop as well.
>> Done. Um I don't understand why setting top_stmt to NULL won't process
>> all stmts ? AFAIU it will have one extra iteration compared to
>> initializing top_stmt to stmt (since first iteration would initialize
>> top_stmt to stmt assuming stmt does not throw) ?
>
> If you have
>
>   if (cond)
>     {
>       r = x % y;
>       q = x / y;
>     }
>   else
>     {
>       r = x % y;
>       q = x / y;
>     }
>
> then the loop over the function might end up transforming the else
> block when visiting the modulo in the then block, and thus it will never
> transform the then block.  That is because you walk immediate uses, which
> does not guarantee that you end up with a top_stmt related to the
> IL point you were coming from: the first iteration does _not_
> necessarily have use_stmt == stmt.
Thanks for the explanation.
I overlooked the fact that in the first iteration use_stmt may not equal stmt -;)
>
>> >
>> > Otherwise looks ok now.
>> >
>> >> The patch still does not handle the following case:
>> >> int f(int x, int y)
>> >> {
>> >>   extern int cond;
>> >>   int q, r;
>> >>
>> >>   if (cond)
>> >>     q = x / y;
>> >>   else
>> >>     q = x / y;
>> >>
>> >>   r = x % y;
>> >>   return q + r;
>> >> }
>> >>
>> >> In above case although the mod stmt is not dominated by either div
>> >> stmt, I suppose the transform
>> >> is still possible by inserting DIVMOD (x, y) before if-else ?
>> >
>> > Yeah, same for sincos where doing this requires some LCM algorithm.
>> Well I don't have a good approach for this.
>> I was thinking, before doing the divmod transform, we could walk
>> GIMPLE_COND of "diamond" shape
>> (having both arms), and check "then" bb and "else" bb have same div or
>> mod stmts and in that case put an artificial
>> same stmt above GIMPLE_COND.
>>
>> So the above case would be transformed to:
>>
>> int tmp = x / y;  // artificial top_stmt
>> if (cond)
>>   q = x / y;
>> else
>>   q = x / y;
>>
>> r = x % y;
>> return q + r;
>>
>> and then the divmod transform will see "tmp = x / y" as the topmost stmt.
>> Since top_stmt is artificially introduced, we will replace that with DIVMOD ifn
>> rather than inserting DIVMOD ifn above top_stmt as in other cases.
>
> Yeah, but it is really a general missed optimization that should not be
> required for this transform.
The attached patch ICEs during bootstrap on x86_64; the ICE is reproducible
with the following test-case at -m32 -O2:

typedef long long type;

type f(type x, type y)
{
  type q = x / y;
  type r = x % y;
  return q + r;
}

The ICE happens because the test-case hits
gcc_assert (unsignedp);
in default_expand_divmod_libfunc ().

Surprisingly, optab_libfunc (sdivmod_optab, DImode) returns a libfunc
named "__divmoddi4", although __divmoddi4() is not defined anywhere in
libgcc for x86.
(I verified that by forcing the patch to generate a call to __divmoddi4,
which results in an undefined reference to __divmoddi4.)

This happens because in optabs.def we have:
OPTAB_NL(sdivmod_optab, "divmod$a4", UNKNOWN, "divmod", '4', gen_int_libfunc)

and gen_int_libfunc generates "__divmoddi4" on the first call to optab_libfunc
and sets optab_libfunc (sdivmod_optab, DImode) to "__divmoddi4".
I wonder if we should remove the gen_int_libfunc entry in optabs.def for
sdivmod_optab ?
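
For reference, here is a minimal sketch of the libgcc2.c:__udivmoddi4
calling convention (quotient returned, remainder stored through a
pointer), together with what a signed counterpart layered on top of it
could look like.  toy_udivmod and toy_sdivmod are hypothetical names used
only for illustration; neither is a libgcc entry point.  The sketch
assumes a non-zero divisor and does not handle INT64_MIN / -1.

```c
#include <stdint.h>

/* Same shape as libgcc2.c:__udivmoddi4: return the quotient and store
   the remainder through REM when REM is non-null.  D must be non-zero.  */
static uint64_t
toy_udivmod (uint64_t n, uint64_t d, uint64_t *rem)
{
  uint64_t q = n / d;

  if (rem)
    *rem = n - q * d;
  return q;
}

/* Hypothetical signed counterpart with truncating semantics: the
   quotient is negative iff the operand signs differ, and the remainder
   takes the sign of the dividend.  Negation is done on unsigned values
   to avoid signed overflow; converting the result back to int64_t
   relies on two's complement conversion.  */
static int64_t
toy_sdivmod (int64_t n, int64_t d, int64_t *rem)
{
  uint64_t un = n < 0 ? -(uint64_t) n : (uint64_t) n;
  uint64_t ud = d < 0 ? -(uint64_t) d : (uint64_t) d;
  uint64_t ur;
  uint64_t uq = toy_udivmod (un, ud, &ur);

  if (rem)
    *rem = n < 0 ? (int64_t) -ur : (int64_t) ur;
  return ((n < 0) != (d < 0)) ? (int64_t) -uq : (int64_t) uq;
}
```

A caller would pass the address of a local for the remainder, e.g.
"q = toy_sdivmod (x, y, &r);", which is the pointer-based convention
that the default hook currently only expands for the unsigned case.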

Thanks,
Prathamesh
>
> Richard.
>
>> >
>> >> For the following test-case, I am surprised why CSE didn't take place before
>> >> widening_mul pass ?
>> >>
>> >> int
>> >> f_1 (int x, int y)
>> >> {
>> >>   int q = x / y;
>> >>   int r1 = 0, r2 = 0;
>> >>   if (cond)
>> >>     r1 = x % y;
>> >>   else
>> >>     r2 = x % y;
>> >>   return q + r1 + r2;
>> >> }
>> >
>> > This is not CSE but code hoisting which is not implemented on GIMPLE
>> > (see PR23286)
>> Ah right, thanks for pointing out the PR.
>>
>> Thanks,
>> Prathamesh
>> >
>> >> The input to widening_mul pass is:
>> >> f_1 (int x, int y)
>> >> {
>> >>   int r2;
>> >>   int r1;
>> >>   int q;
>> >>   int cond.0_1;
>> >>   int _2;
>> >>   int _11;
>> >>
>> >>   <bb 2>:
>> >>   q_7 = x_5(D) / y_6(D);
>> >>   cond.0_1 = cond;
>> >>   if (cond.0_1 != 0)
>> >>     goto <bb 3>;
>> >>   else
>> >>     goto <bb 4>;
>> >>
>> >>   <bb 3>:
>> >>   r1_9 = x_5(D) % y_6(D);
>> >>   goto <bb 5>;
>> >>
>> >>   <bb 4>:
>> >>   r2_10 = x_5(D) % y_6(D);
>> >>
>> >>   <bb 5>:
>> >>   # r1_3 = PHI <r1_9(3), 0(4)>
>> >>   # r2_4 = PHI <0(3), r2_10(4)>
>> >>   _2 = r1_3 + q_7;
>> >>   _11 = _2 + r2_4;
>> >>   return _11;
>> >>
>> >> }
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> > Richard.
>> >> >
>> >> >> Thanks,
>> >> >> Prathamesh
>> >> >> >
>> >> >> > Richard.
>> >> >> >
>> >> >>
>> >> >>
>> >> >

[-- Attachment #2: divmod_5_5.diff --]
[-- Type: text/plain, Size: 28088 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 12060ba..1310006 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -61,6 +61,7 @@
 #include "builtins.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "optabs-libfuncs.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -300,6 +301,7 @@ static unsigned HOST_WIDE_INT arm_asan_shadow_offset (void);
 static void arm_sched_fusion_priority (rtx_insn *, int, int *, int*);
 static bool arm_can_output_mi_thunk (const_tree, HOST_WIDE_INT, HOST_WIDE_INT,
 				     const_tree);
+static void arm_expand_divmod_libfunc (bool, machine_mode, rtx, rtx, rtx *, rtx *);
 
 \f
 /* Table of machine attributes.  */
@@ -730,6 +732,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_SCHED_FUSION_PRIORITY
 #define TARGET_SCHED_FUSION_PRIORITY arm_sched_fusion_priority
 
+#undef TARGET_EXPAND_DIVMOD_LIBFUNC
+#define TARGET_EXPAND_DIVMOD_LIBFUNC arm_expand_divmod_libfunc
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 \f
 /* Obstack for minipool constant handling.  */
@@ -30354,6 +30359,37 @@ arm_sched_fusion_priority (rtx_insn *insn, int max_pri,
   return;
 }
 
+/* Expand call to __aeabi_[mode]divmod (op0, op1).  */
+
+static void
+arm_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
+			   rtx op0, rtx op1,
+			   rtx *quot_p, rtx *rem_p)
+{
+  if (mode == SImode)
+    gcc_assert (!TARGET_IDIV);
+
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+  rtx libfunc = optab_libfunc (tab, mode);
+  gcc_assert (libfunc);
+
+  machine_mode libval_mode = smallest_mode_for_size (2 * GET_MODE_BITSIZE (mode),
+						     MODE_INT);
+
+  rtx libval = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+					libval_mode, 2,
+					op0, GET_MODE (op0),
+					op1, GET_MODE (op1));
+
+  rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0);
+  rtx remainder = simplify_gen_subreg (mode, libval, libval_mode, GET_MODE_SIZE (mode));
+
+  gcc_assert (quotient);
+  gcc_assert (remainder);
+
+  *quot_p = quotient;
+  *rem_p = remainder;
+}
 
 /* Construct and return a PARALLEL RTX vector with elements numbering the
    lanes of either the high (HIGH == TRUE) or low (HIGH == FALSE) half of
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8c7f2a1..111f19f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6963,6 +6963,12 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool @var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx *@var{quot}, rtx *@var{rem})
+Define this hook if the port does not have a hardware div or divmod insn for
+the given mode but does have a divmod libfunc whose calling convention is
+incompatible with that of libgcc2.c:__udivmoddi4.
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f963a58..2c9a800 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4848,6 +4848,8 @@ them: try the first ones in this list first.
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index c867ddc..0cb59f7 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2276,6 +2276,48 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 
+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, generate a call to
+    optab_libfunc for udivmod/sdivmod.  */
+
+static void 
+expand_DIVMOD (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  bool unsignedp = TYPE_UNSIGNED (type);
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx quotient, remainder;
+
+  /* Check if optab handler exists for [u]divmod.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+    {
+      quotient = gen_reg_rtx (mode);
+      remainder = gen_reg_rtx (mode);
+      expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
+    }
+  else
+    targetm.expand_divmod_libfunc (unsignedp, mode, op0, op1,
+				   &quotient, &remainder);
+
+  /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR.  */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+		       make_tree (TREE_TYPE (arg0), quotient),
+		       make_tree (TREE_TYPE (arg1), remainder)),
+	       target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Return true if FN is supported for the types in TYPES when the
    optimization type is OPT_TYPE.  The types are those associated with
    the "type0" and "type1" fields of FN's direct_internal_fn_info
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index e729d85..56a80f1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -194,6 +194,9 @@ DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* Divmod function.  */
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_OPTAB_FN
diff --git a/gcc/target.def b/gcc/target.def
index 6392e73..4496f9a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4948,6 +4948,16 @@ Normally, this is not needed.",
  bool, (const_tree field, machine_mode mode),
  default_member_type_forces_blk)
 
+/* See tree-ssa-math-opts.c:divmod_candidate_p for conditions that gate
+   the divmod transform.  */
+DEFHOOK
+(expand_divmod_libfunc,
+ "Define this hook if the port does not have hardware div and divmod insn for\n\
+the given mode but has divmod libfunc, which is incompatible\n\
+with libgcc2.c:__udivmoddi4",
+ void, (bool unsignedp, machine_mode mode, rtx, rtx, rtx *quot, rtx *rem),
+ default_expand_divmod_libfunc)
+
 /* Return the class for a secondary reload, and fill in extra information.  */
 DEFHOOK
 (secondary_reload,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 6b4601b..20327a6 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }
 
+/* Generate call to
+   DImode __udivmoddi4 (DImode op0, DImode op1, DImode *rem).  */
+
+void
+default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
+			       rtx op0, rtx op1,
+			       rtx *quot_p, rtx *rem_p)
+{
+  gcc_assert (mode == DImode);
+  gcc_assert (unsignedp);
+
+  rtx libfunc = optab_libfunc (udivmod_optab, DImode);
+  gcc_assert (libfunc);
+
+  rtx remainder = assign_stack_temp (DImode, GET_MODE_SIZE (DImode));
+  rtx address = XEXP (remainder, 0);
+
+  rtx quotient = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+					  DImode, 3,
+					  op0, GET_MODE (op0),
+					  op1, GET_MODE (op1),
+					  address, GET_MODE (address));
+
+  *quot_p = quotient;
+  *rem_p = remainder;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 7687c39..dc5e8e7 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -254,4 +254,7 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
 				       optimization_type);
 
+extern void default_expand_divmod_libfunc (bool, machine_mode,
+					   rtx, rtx, rtx *, rtx *);
+
 #endif /* GCC_TARGHOOKS_H */
diff --git a/gcc/testsuite/gcc.dg/divmod-1-simode.c b/gcc/testsuite/gcc.dg/divmod-1-simode.c
new file mode 100644
index 0000000..7405f66
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-1-simode.c
@@ -0,0 +1,22 @@
+/* { dg-require-effective-target divmod_simode } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+/* div dominates mod.  */
+
+extern int cond;
+void foo(void);
+
+#define FOO(smalltype, bigtype, no)  \
+bigtype f_##no(smalltype x, bigtype y) \
+{					 \
+  bigtype q = x / y;                     \
+  if (cond)                              \
+    foo ();                              \
+  bigtype r = x % y;                     \
+  return q + r;                          \
+}
+
+FOO(int, int, 1)
+FOO(int, unsigned, 2)
+FOO(unsigned, unsigned, 5)
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 3 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-1.c b/gcc/testsuite/gcc.dg/divmod-1.c
new file mode 100644
index 0000000..40aec74
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-1.c
@@ -0,0 +1,26 @@
+/* { dg-require-effective-target divmod } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+/* div dominates mod.  */
+
+extern int cond;
+void foo(void);
+
+#define FOO(smalltype, bigtype, no)	 \
+bigtype f_##no(smalltype x, bigtype y)   \
+{					 \
+  bigtype q = x / y;                     \
+  if (cond)                              \
+    foo ();                              \
+  bigtype r = x % y;                     \
+  return q + r;                          \
+}
+
+FOO(int, long long, 3)
+FOO(int, unsigned long long, 4)
+FOO(unsigned, long long, 6)
+FOO(unsigned, unsigned long long, 7)
+FOO(long long, long long, 8)
+FOO(long long, unsigned long long, 9)
+FOO(unsigned long long, unsigned long long, 10)
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 7 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-2-simode.c b/gcc/testsuite/gcc.dg/divmod-2-simode.c
new file mode 100644
index 0000000..7c8313b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-2-simode.c
@@ -0,0 +1,22 @@
+/* { dg-require-effective-target divmod_simode } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+/* mod dominates div.  */
+
+extern int cond;
+void foo(void);
+
+#define FOO(smalltype, bigtype, no)  \
+bigtype f_##no(smalltype x, bigtype y) \
+{					 \
+  bigtype r = x % y;                     \
+  if (cond)                              \
+    foo ();                              \
+  bigtype q = x / y;                     \
+  return q + r;                          \
+}
+
+FOO(int, int, 1)
+FOO(int, unsigned, 2)
+FOO(unsigned, unsigned, 5)
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 3 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-2.c b/gcc/testsuite/gcc.dg/divmod-2.c
new file mode 100644
index 0000000..6a2216c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-2.c
@@ -0,0 +1,26 @@
+/* { dg-require-effective-target divmod } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+/* mod dominates div.  */
+
+extern int cond;
+void foo(void);
+
+#define FOO(smalltype, bigtype, no)	 \
+bigtype f_##no(smalltype x, bigtype y)   \
+{					 \
+  bigtype r = x % y;                     \
+  if (cond)                              \
+    foo ();                              \
+  bigtype q = x / y;                     \
+  return q + r;                          \
+}
+
+FOO(int, long long, 3)
+FOO(int, unsigned long long, 4)
+FOO(unsigned, long long, 6)
+FOO(unsigned, unsigned long long, 7)
+FOO(long long, long long, 8)
+FOO(long long, unsigned long long, 9)
+FOO(unsigned long long, unsigned long long, 10)
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 7 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-3-simode.c b/gcc/testsuite/gcc.dg/divmod-3-simode.c
new file mode 100644
index 0000000..6f0f63d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-3-simode.c
@@ -0,0 +1,20 @@
+/* { dg-require-effective-target divmod_simode } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+/* div comes before mod in same bb.  */ 
+
+extern int cond;
+void foo(void);
+
+#define FOO(smalltype, bigtype, no)  	 \
+bigtype f_##no(smalltype x, bigtype y)	 \
+{					 \
+  bigtype q = x / y;                     \
+  bigtype r = x % y;                     \
+  return q + r;                          \
+}
+
+FOO(int, int, 1)
+FOO(int, unsigned, 2)
+FOO(unsigned, unsigned, 5)
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 3 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-3.c b/gcc/testsuite/gcc.dg/divmod-3.c
new file mode 100644
index 0000000..9fe6f64
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-3.c
@@ -0,0 +1,24 @@
+/* { dg-require-effective-target divmod } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+/* div comes before mod in same bb.  */ 
+
+extern int cond;
+void foo(void);
+
+#define FOO(smalltype, bigtype, no)      \
+bigtype f_##no(smalltype x, bigtype y)   \
+{					 \
+  bigtype q = x / y;                     \
+  bigtype r = x % y;                     \
+  return q + r;                          \
+}
+
+FOO(int, long long, 3)
+FOO(int, unsigned long long, 4)
+FOO(unsigned, long long, 6)
+FOO(unsigned, unsigned long long, 7)
+FOO(long long, long long, 8)
+FOO(long long, unsigned long long, 9)
+FOO(unsigned long long, unsigned long long, 10)
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 7 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-4-simode.c b/gcc/testsuite/gcc.dg/divmod-4-simode.c
new file mode 100644
index 0000000..9c326f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-4-simode.c
@@ -0,0 +1,20 @@
+/* { dg-require-effective-target divmod_simode } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+/* mod comes before div in same bb.  */ 
+
+extern int cond;
+void foo(void);
+
+#define FOO(smalltype, bigtype, no)      \
+bigtype f_##no(smalltype x, bigtype y)   \
+{					 \
+  bigtype r = x % y;                     \
+  bigtype q = x / y;                     \
+  return q + r;                          \
+}
+
+FOO(int, int, 1)
+FOO(int, unsigned, 2)
+FOO(unsigned, unsigned, 5)
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 3 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-4.c b/gcc/testsuite/gcc.dg/divmod-4.c
new file mode 100644
index 0000000..a5686cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-4.c
@@ -0,0 +1,24 @@
+/* { dg-require-effective-target divmod } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+/* mod comes before div in same bb.  */ 
+
+extern int cond;
+void foo(void);
+
+#define FOO(smalltype, bigtype, no)	\
+bigtype f_##no(smalltype x, bigtype y)  \
+{					\
+  bigtype r = x % y;                    \
+  bigtype q = x / y;                    \
+  return q + r;                         \
+}
+
+FOO(int, long long, 3)
+FOO(int, unsigned long long, 4)
+FOO(unsigned, long long, 6)
+FOO(unsigned, unsigned long long, 7)
+FOO(long long, long long, 8)
+FOO(long long, unsigned long long, 9)
+FOO(unsigned long long, unsigned long long, 10)
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 7 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-5.c b/gcc/testsuite/gcc.dg/divmod-5.c
new file mode 100644
index 0000000..8a8cee5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-5.c
@@ -0,0 +1,19 @@
+/* { dg-require-effective-target divmod_simode } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+/* div and mod are not in same bb and
+   bb's containing div and mod don't dominate each other.  */
+
+int f(int x, int y)
+{
+  int q = 0;
+  int r = 0;
+  extern int cond;
+
+  if (cond)
+    q = x / y;
+
+  r = x % y;
+  return q + r;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-6-simode.c b/gcc/testsuite/gcc.dg/divmod-6-simode.c
new file mode 100644
index 0000000..3bf6fa3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-6-simode.c
@@ -0,0 +1,24 @@
+/* { dg-require-effective-target divmod_simode } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+
+
+extern int cond;
+void foo(void);
+
+#define FOO(smalltype, bigtype, no)  \
+bigtype f_##no(smalltype x, bigtype y) \
+{					 \
+  bigtype q = x / y;                     \
+  bigtype r1 = 0, r2 = 0;                \
+  if (cond)                              \
+    r1 = x % y;                          \
+  else                                   \
+    r2 = x % y;                          \
+  return q + r1 + r2;                    \
+}
+
+FOO(int, int, 1)
+FOO(int, unsigned, 2)
+FOO(unsigned, unsigned, 5)
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 3 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-6.c b/gcc/testsuite/gcc.dg/divmod-6.c
new file mode 100644
index 0000000..70e4321
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-6.c
@@ -0,0 +1,27 @@
+/* { dg-require-effective-target divmod } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+
+extern int cond;
+void foo(void);
+
+#define FOO(smalltype, bigtype, no)	 \
+bigtype f_##no(smalltype x, bigtype y)   \
+{					 \
+  bigtype q = x / y;                     \
+  bigtype r1 = 0, r2 = 0;                \
+  if (cond)                              \
+    r1 = x % y;                          \
+  else                                   \
+    r2 = x % y;                          \
+  return q + r1 + r2;                    \
+}
+
+FOO(int, long long, 3)
+FOO(int, unsigned long long, 4)
+FOO(unsigned, long long, 6)
+FOO(unsigned, unsigned long long, 7)
+FOO(long long, long long, 8)
+FOO(long long, unsigned long long, 9)
+FOO(unsigned long long, unsigned long long, 10)
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 7 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-7.c b/gcc/testsuite/gcc.dg/divmod-7.c
new file mode 100644
index 0000000..a6e7fcd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-7.c
@@ -0,0 +1,21 @@
+/* { dg-require-effective-target divmod_simode } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+
+int f(int x, int y)
+{
+  int q = 0, r1 = 0, r2 = 0;
+  extern int cond;
+
+  if (cond)
+    q = x / y;
+  else
+    {
+      r1 = x % y;
+      return q + r1;
+    }
+
+  r2 = x % y;
+  return q + r2;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 04ca176..ad7c487 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6986,3 +6986,32 @@ proc check_effective_target_offload_hsa { } {
 	int main () {return 0;}
     } "-foffload=hsa" ]
 }
+
+# For ARM configs defining __ARM_ARCH_EXT_IDIV__, disable divmod_simode test-cases.
+
+proc check_effective_target_arm_divmod_simode { } {
+    return [check_no_compiler_messages arm_divmod assembly {
+	#ifdef __ARM_ARCH_EXT_IDIV__
+	#error has div insn
+	#endif
+	int i;
+    }]
+}
+
+proc check_effective_target_divmod { } {
+    #TODO: Add checks for all targets that have either hardware divmod insn
+    # or define libfunc for divmod.
+    if { [istarget arm*-*-*]
+	 || [istarget x86_64-*-*] } {
+	return 1
+    }
+    return 0
+}
+
+proc check_effective_target_divmod_simode { } {
+    if { [istarget arm*-*-*] } {
+	return [check_effective_target_arm_divmod_simode]
+    }
+
+    return [check_effective_target_divmod]
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 81688cd..4e5cd2b 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -112,6 +112,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "optabs-libfuncs.h"
+#include "tree-eh.h"
+#include "targhooks.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -184,6 +187,9 @@ static struct
 
   /* Number of fp fused multiply-add ops inserted.  */
   int fmas_inserted;
+
+  /* Number of divmod calls inserted.  */
+  int divmod_calls_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -3784,6 +3790,222 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
   return true;
 }
 
+/* Return true if target has support for divmod.  */
+
+static bool
+target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode) 
+{
+  /* If target supports hardware divmod insn, use it for divmod.  */
+  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing)
+    return true;
+
+  /* Check if libfunc for divmod is available.  */
+  if (optab_libfunc (divmod_optab, mode) != NULL_RTX)
+    {
+      /* If optab_handler exists for div_optab, perhaps in a wider mode,
+	 we don't want to use the libfunc even if it exists for given mode.  */ 
+      for (machine_mode div_mode = mode;
+	   div_mode != VOIDmode;
+	   div_mode = GET_MODE_WIDER_MODE (div_mode))
+	if (optab_handler (div_optab, div_mode) != CODE_FOR_nothing)
+	  return false;
+
+      return true; 
+    }
+
+  return false;
+}
+
+/* Check if stmt is candidate for divmod transform.  */
+
+static bool
+divmod_candidate_p (gassign *stmt)
+{
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  enum machine_mode mode = TYPE_MODE (type);
+  optab divmod_optab, div_optab;
+
+  if (TYPE_UNSIGNED (type))
+    {
+      divmod_optab = udivmod_optab;
+      div_optab = udiv_optab;
+    }
+  else
+    {
+      divmod_optab = sdivmod_optab;
+      div_optab = sdiv_optab;
+    }
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  /* Disable the transform if either is a constant, since division-by-constant
+     may have specialized expansion.  */
+  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
+    return false;
+
+  /* Exclude the case where TYPE_OVERFLOW_TRAPS (type) as that should
+     expand using the [su]divv optabs.  */
+  if (TYPE_OVERFLOW_TRAPS (type))
+    return false;
+  
+  if (!target_supports_divmod_p (divmod_optab, div_optab, mode)) 
+    return false;
+
+  return true;
+}
+
+/* This function looks for:
+   t1 = a TRUNC_DIV_EXPR b;
+   t2 = a TRUNC_MOD_EXPR b;
+   and transforms it to the following sequence:
+   complex_tmp = DIVMOD (a, b);
+   t1 = REALPART_EXPR (complex_tmp);
+   t2 = IMAGPART_EXPR (complex_tmp);
+   For conditions enabling the transform see divmod_candidate_p().
+
+   The pass works in two phases:
+   1) Walk through all immediate uses of stmt's operand and find a
+      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
+      it to stmts vector.
+   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
+      dominates other div/mod stmts with same operands) and update entries in
+      stmts vector to use return value of DIVMOD (REALPART_EXPR for div,
+      IMAGPART_EXPR for mod).  */
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+  if (!divmod_candidate_p (stmt))
+    return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+  
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+  auto_vec<gimple *> stmts; 
+
+  gimple *top_stmt = stmt; 
+  basic_block top_bb = gimple_bb (stmt);
+
+  /* Try to set top_stmt to "topmost" stmt
+     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.  */
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+    {
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  basic_block bb = gimple_bb (use_stmt);
+
+	  if (bb == top_bb)
+	    {
+	      if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
+		top_stmt = use_stmt;
+	    }
+	  else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
+	    {
+	      top_bb = bb;
+	      top_stmt = use_stmt;
+	    }
+	}
+    }
+
+  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
+    return false;
+
+  tree top_op1 = gimple_assign_rhs1 (top_stmt);
+  tree top_op2 = gimple_assign_rhs2 (top_stmt);
+
+  stmts.safe_push (top_stmt);
+  bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR);
+
+  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */    
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
+    {
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (top_op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (top_op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (use_stmt == top_stmt)
+	    continue;
+
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
+	    {
+	      end_imm_use_stmt_traverse (&use_iter);
+	      return false;
+	    }
+
+	  stmts.safe_push (use_stmt);
+	  if (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR)
+	    div_seen = true;
+	}
+    }
+
+  if (!div_seen)
+    return false;
+
+  /* Create libcall to internal fn DIVMOD:
+     divmod_tmp = DIVMOD (op1, op2).  */
+
+  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+  tree res = make_temp_ssa_name (
+		build_complex_type (TREE_TYPE (op1)),
+		call_stmt, "divmod_tmp");
+  gimple_call_set_lhs (call_stmt, res);
+
+  /* Insert the call before top_stmt.  */
+  gimple_stmt_iterator top_stmt_gsi = gsi_for_stmt (top_stmt);
+  gsi_insert_before (&top_stmt_gsi, call_stmt, GSI_SAME_STMT);
+
+  widen_mul_stats.divmod_calls_inserted++;		
+
+  /* Update all statements in stmts.
+     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs = REALPART_EXPR<divmod_tmp>
+     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs = IMAGPART_EXPR<divmod_tmp>.  */
+
+  bool cfg_changed = false;
+  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
+    {
+      tree new_rhs;
+
+      switch (gimple_assign_rhs_code (use_stmt))
+	{
+	  case TRUNC_DIV_EXPR:
+	    new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+	    break;
+
+	  case TRUNC_MOD_EXPR:
+	    new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	}
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
+      update_stmt (use_stmt);
+
+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+	cfg_changed = true;
+    }
+
+  return cfg_changed;
+}    
 
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
@@ -3828,6 +4050,8 @@ pass_optimize_widening_mul::execute (function *fun)
   bool cfg_changed = false;
 
   memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
+  renumber_gimple_stmt_uids ();
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -3861,6 +4085,10 @@ pass_optimize_widening_mul::execute (function *fun)
 		    match_uaddsub_overflow (&gsi, stmt, code);
 		  break;
 
+		case TRUNC_MOD_EXPR:
+		  cfg_changed |= convert_to_divmod (as_a<gassign *> (stmt));
+		  break;
+
 		default:;
 		}
 	    }
@@ -3907,6 +4135,8 @@ pass_optimize_widening_mul::execute (function *fun)
 			    widen_mul_stats.maccs_inserted);
   statistics_counter_event (fun, "fused multiply-adds inserted",
 			    widen_mul_stats.fmas_inserted);
+  statistics_counter_event (fun, "divmod calls inserted",
+			    widen_mul_stats.divmod_calls_inserted);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
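
The rewrite done by convert_to_divmod in the patch above can be pictured
at the source level. What follows is only a minimal sketch under stated
assumptions: struct divmod_result, divmod_sketch, f_before and f_after are
hypothetical names that do not appear in the patch, and the real transform
works on GIMPLE with a complex-typed temporary rather than a C struct.

```c
#include <assert.h>

/* Hypothetical stand-in for the complex-typed DIVMOD temporary:
   quot corresponds to REALPART_EXPR, rem to IMAGPART_EXPR.  */
struct divmod_result
{
  int quot;
  int rem;
};

/* Hypothetical helper: a single division yields both values.  */
struct divmod_result
divmod_sketch (int a, int b)
{
  struct divmod_result res;

  assert (b != 0);		/* Division by zero is undefined.  */
  res.quot = a / b;
  res.rem = a - res.quot * b;	/* Remainder derived from the quotient.  */
  return res;
}

/* Before the transform: separate TRUNC_DIV_EXPR and TRUNC_MOD_EXPR.  */
int
f_before (int x, int y)
{
  int q = x / y;
  int r = x % y;
  return q + r;
}

/* After the transform: both results come from one DIVMOD-like call.  */
int
f_after (int x, int y)
{
  struct divmod_result t = divmod_sketch (x, y);
  return t.quot + t.rem;
}
```

f_before and f_after should agree for every input on which the division
itself is defined, which is the property the transform relies on.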

* Re: RFC [1/2] divmod transform
  2016-05-30  9:56                       ` Prathamesh Kulkarni
@ 2016-05-30 10:36                         ` Richard Biener
  2016-05-31 14:20                           ` Prathamesh Kulkarni
                                             ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Richard Biener @ 2016-05-30 10:36 UTC (permalink / raw)
  To: Prathamesh Kulkarni
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson, Joseph S. Myers

On Mon, 30 May 2016, Prathamesh Kulkarni wrote:

> The attached patch ICE's during bootstrap for x86_64, and is reproducible with
> following case with -m32 -O2:
> 
> typedef long long type;
> 
> type f(type x, type y)
> {
>   type q = x / y;
>   type r = x % y;
>   return q + r;
> }
> 
> The ICE happens because the test-case hits
> gcc_assert (unsignedp);
> in default_expand_divmod_libfunc ().

That's of course your function (and ICE).

> Surprisingly, optab_libfunc (sdivmod_optab, DImode) returns optab_libfunc
> with name "__divmoddi4" although __divmoddi4() is nowhere defined in
> libgcc for x86.
> (I verified that by forcing the patch to generate call to __divmoddi4,
> which results in undefined reference to __divmoddi4).
> 
> This happens because in optabs.def we have:
> OPTAB_NL(sdivmod_optab, "divmod$a4", UNKNOWN, "divmod", '4', gen_int_libfunc)
> 
> and gen_int_libfunc generates "__divmoddi4" on first call to optab_libfunc
> and sets optab_libfunc (sdivmod_optab, DImode) to "__divmoddi4".
> I wonder if we should remove gen_int_libfunc entry in optabs.def for
> sdivmod_optab ?

Hum, not sure - you might want to look at expand_divmod (though that
always just computes one part of the result in the end).

Joseph - do you know sth about why there's not a full set of divmod
libfuncs in libgcc?

Thanks,
Richard.
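
For reference, a signed divmod can be layered on an unsigned routine that
follows the convention described for libgcc2.c:__udivmoddi4 (return the
quotient, store the remainder through a pointer). The sketch below is an
illustration only, not libgcc code: udivmod64_sketch, sdivmod64_sketch and
f_sketch are hypothetical names, and the conversion of out-of-range unsigned
values back to long long is assumed to wrap modulo 2^64, as GCC documents.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unsigned helper using the libgcc2.c-style convention:
   return the quotient, store the remainder through REM if non-null.  */
unsigned long long
udivmod64_sketch (unsigned long long n, unsigned long long d,
		  unsigned long long *rem)
{
  assert (d != 0);
  unsigned long long q = n / d;
  if (rem != NULL)
    *rem = n - q * d;
  return q;
}

/* Hypothetical signed wrapper with truncating semantics: the quotient
   is negative when the operand signs differ, and the remainder takes
   the sign of the dividend.  */
long long
sdivmod64_sketch (long long n, long long d, long long *rem)
{
  int neg_q = (n < 0) != (d < 0);
  int neg_r = n < 0;
  /* Negate in unsigned arithmetic so the most negative value is safe.  */
  unsigned long long un = n < 0 ? 0ULL - (unsigned long long) n
				: (unsigned long long) n;
  unsigned long long ud = d < 0 ? 0ULL - (unsigned long long) d
				: (unsigned long long) d;
  unsigned long long ur;
  unsigned long long uq = udivmod64_sketch (un, ud, &ur);

  if (rem != NULL)
    *rem = (long long) (neg_r ? 0ULL - ur : ur);
  return (long long) (neg_q ? 0ULL - uq : uq);
}

/* Mirrors the reproducer quoted above, using one combined call.  */
long long
f_sketch (long long x, long long y)
{
  long long r;
  long long q = sdivmod64_sketch (x, y, &r);
  return q + r;
}
```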

* Re: RFC [1/2] divmod transform
  2016-05-30 10:36                         ` Richard Biener
@ 2016-05-31 14:20                           ` Prathamesh Kulkarni
  2016-06-01 11:09                             ` Richard Biener
  2016-06-03 21:10                           ` Joseph Myers
  2016-06-03 23:31                           ` Jim Wilson
  2 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2016-05-31 14:20 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson, Joseph S. Myers

[-- Attachment #1: Type: text/plain, Size: 1676 bytes --]

On 30 May 2016 at 13:15, Richard Biener <rguenther@suse.de> wrote:
> On Mon, 30 May 2016, Prathamesh Kulkarni wrote:
>
>> The attached patch ICE's during bootstrap for x86_64, and is reproducible with
>> following case with -m32 -O2:
>>
>> typedef long long type;
>>
>> type f(type x, type y)
>> {
>>   type q = x / y;
>>   type r = x % y;
>>   return q + r;
>> }
>>
>> The ICE happens because the test-case hits
>> gcc_assert (unsignedp);
>> in default_expand_divmod_libfunc ().
>
> That's of course your function (and ICE).
>
>> Surprisingly, optab_libfunc (sdivmod_optab, DImode) returns optab_libfunc
>> with name "__divmoddi4" although __divmoddi4() is nowhere defined in
>> libgcc for x86.
>> (I verified that by forcing the patch to generate call to __divmoddi4,
>> which results in undefined reference to __divmoddi4).
>>
>> This happens because in optabs.def we have:
>> OPTAB_NL(sdivmod_optab, "divmod$a4", UNKNOWN, "divmod", '4', gen_int_libfunc)
>>
>> and gen_int_libfunc generates "__divmoddi4" on first call to optab_libfunc
>> and sets optab_libfunc (sdivmod_optab, DImode) to "__divmoddi4".
>> I wonder if we should remove gen_int_libfunc entry in optabs.def for
>> sdivmod_optab ?
>
> Hum, not sure - you might want to look at expand_divmod (though that
> always just computes one part of the result in the end).
As a workaround, would it be OK to check that the libfunc is __udivmoddi4
when expand_divmod_libfunc is the default hook, as in the attached patch?
This prevents ICE for the above test-case.
Bootstrap+test on x86_64 in progress.

Thanks,
Prathamesh
>
> Joseph - do you know sth about why there's not a full set of divmod
> libfuncs in libgcc?
>
> Thanks,
> Richard.
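
The workaround proposed above can be modelled outside GCC as a small
predicate. This is a toy model under stated assumptions, not the code in
the attached patch: divmod_libfunc_usable_sketch and its parameters are
hypothetical, and the libfunc is represented only by its symbol name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not GCC code.  LIBFUNC_NAME is the symbol the optab would
   call (NULL if none); HOOK_IS_DEFAULT says whether the port left
   expand_divmod_libfunc at its default.  */
bool
divmod_libfunc_usable_sketch (const char *libfunc_name, bool hook_is_default)
{
  if (libfunc_name == NULL)
    return false;		/* No libfunc registered at all.  */

  if (!hook_is_default)
    return true;		/* The port knows how to call its own libfunc.  */

  /* Default hook: only the libgcc2.c-style unsigned DImode routine
     is handled, so reject anything else (e.g. __divmoddi4).  */
  return strcmp (libfunc_name, "__udivmoddi4") == 0;
}
```

With such a gate, the signed DImode case from the reproducer would be
rejected before reaching the default hook, so the assert would not fire.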

[-- Attachment #2: divmod-part1_6.diff --]
[-- Type: text/plain, Size: 14892 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8c7f2a1..111f19f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6963,6 +6963,12 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool @var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx *@var{quot}, rtx *@var{rem})
+Define this hook if the port does not have hardware div and divmod insn for
+the given mode but has divmod libfunc, which is incompatible
+with libgcc2.c:__udivmoddi4
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f963a58..2c9a800 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4848,6 +4848,8 @@ them: try the first ones in this list first.
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index c867ddc..0cb59f7 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2276,6 +2276,48 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 
+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, generate a call to
+    optab_libfunc for udivmod/sdivmod.  */
+
+static void 
+expand_DIVMOD (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  bool unsignedp = TYPE_UNSIGNED (type);
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx quotient, remainder;
+
+  /* Check if optab handler exists for [u]divmod.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+    {
+      quotient = gen_reg_rtx (mode);
+      remainder = gen_reg_rtx (mode);
+      expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
+    }
+  else
+    targetm.expand_divmod_libfunc (unsignedp, mode, op0, op1,
+				   &quotient, &remainder);
+
+  /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR.  */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+		       make_tree (TREE_TYPE (arg0), quotient),
+		       make_tree (TREE_TYPE (arg1), remainder)),
+	       target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Return true if FN is supported for the types in TYPES when the
    optimization type is OPT_TYPE.  The types are those associated with
    the "type0" and "type1" fields of FN's direct_internal_fn_info
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index e729d85..56a80f1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -194,6 +194,9 @@ DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* Divmod function.  */
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_OPTAB_FN
diff --git a/gcc/target.def b/gcc/target.def
index 6392e73..4496f9a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4948,6 +4948,16 @@ Normally, this is not needed.",
  bool, (const_tree field, machine_mode mode),
  default_member_type_forces_blk)
 
+/* See tree-ssa-math-opts.c:divmod_candidate_p for conditions that gate
+   the divmod transform.  */
+DEFHOOK
+(expand_divmod_libfunc,
+ "Define this hook if the port does not have hardware div and divmod insn for\n\
+the given mode but has divmod libfunc, which is incompatible\n\
+with libgcc2.c:__udivmoddi4",
+ void, (bool unsignedp, machine_mode mode, rtx, rtx, rtx *quot, rtx *rem),
+ default_expand_divmod_libfunc)
+
 /* Return the class for a secondary reload, and fill in extra information.  */
 DEFHOOK
 (secondary_reload,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 6b4601b..20327a6 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }
 
+/* Generate call to
+   DImode __udivmoddi4 (DImode op0, DImode op1, DImode *rem).  */
+
+void
+default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
+			       rtx op0, rtx op1,
+			       rtx *quot_p, rtx *rem_p)
+{
+  gcc_assert (mode == DImode);
+  gcc_assert (unsignedp);
+
+  rtx libfunc = optab_libfunc (udivmod_optab, DImode);
+  gcc_assert (libfunc);
+
+  rtx remainder = assign_stack_temp (DImode, GET_MODE_SIZE (DImode));
+  rtx address = XEXP (remainder, 0);
+
+  rtx quotient = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+					  DImode, 3,
+					  op0, GET_MODE (op0),
+					  op1, GET_MODE (op1),
+					  address, GET_MODE (address));
+
+  *quot_p = quotient;
+  *rem_p = remainder;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 7687c39..dc5e8e7 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -254,4 +254,7 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
 				       optimization_type);
 
+extern void default_expand_divmod_libfunc (bool, machine_mode,
+					   rtx, rtx, rtx *, rtx *);
+
 #endif /* GCC_TARGHOOKS_H */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 81688cd..9bde79f 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -112,6 +112,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "optabs-libfuncs.h"
+#include "tree-eh.h"
+#include "targhooks.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -184,6 +187,9 @@ static struct
 
   /* Number of fp fused multiply-add ops inserted.  */
   int fmas_inserted;
+
+  /* Number of divmod calls inserted.  */
+  int divmod_calls_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -3784,6 +3790,228 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
   return true;
 }
 
+/* Return true if target has support for divmod.  */
+
+static bool
+target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode) 
+{
+  /* If target supports hardware divmod insn, use it for divmod.  */
+  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing)
+    return true;
+
+  /* Check if libfunc for divmod is available.  */
+  rtx libfunc = optab_libfunc (divmod_optab, mode);
+  if (libfunc != NULL_RTX)
+    {
+      /* If optab_handler exists for div_optab, perhaps in a wider mode,
+	 we don't want to use the libfunc even if it exists for given mode.  */ 
+      for (machine_mode div_mode = mode;
+	   div_mode != VOIDmode;
+	   div_mode = GET_MODE_WIDER_MODE (div_mode))
+	if (optab_handler (div_optab, div_mode) != CODE_FOR_nothing)
+	  return false;
+
+      /* FIXME: This is a hack to workaround an issue with optab_libfunc().
+	 optab_libfunc (sdivmod_optab, DImode) returns libfunc "__divmoddi4",
+	 although __divmoddi4() does not exist in libgcc. For now, enable the
+	 transform only if libfunc is guaranteed to be __udivmoddi4.  */
+      return (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc
+	     || !strcmp (XSTR (libfunc, 0), "__udivmoddi4"));
+    }
+
+  return false;
+}
+
+/* Check if stmt is candidate for divmod transform.  */
+
+static bool
+divmod_candidate_p (gassign *stmt)
+{
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  enum machine_mode mode = TYPE_MODE (type);
+  optab divmod_optab, div_optab;
+
+  if (TYPE_UNSIGNED (type))
+    {
+      divmod_optab = udivmod_optab;
+      div_optab = udiv_optab;
+    }
+  else
+    {
+      divmod_optab = sdivmod_optab;
+      div_optab = sdiv_optab;
+    }
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  /* Disable the transform if either is a constant, since division-by-constant
+     may have specialized expansion.  */
+  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
+    return false;
+
+  /* Exclude the case where TYPE_OVERFLOW_TRAPS (type) as that should
+     expand using the [su]divv optabs.  */
+  if (TYPE_OVERFLOW_TRAPS (type))
+    return false;
+  
+  if (!target_supports_divmod_p (divmod_optab, div_optab, mode)) 
+    return false;
+
+  return true;
+}
+
+/* This function looks for:
+   t1 = a TRUNC_DIV_EXPR b;
+   t2 = a TRUNC_MOD_EXPR b;
+   and transforms it to the following sequence:
+   complex_tmp = DIVMOD (a, b);
+   t1 = REALPART_EXPR (complex_tmp);
+   t2 = IMAGPART_EXPR (complex_tmp);
+   For conditions enabling the transform see divmod_candidate_p().
+
+   The pass works in two phases:
+   1) Walk through all immediate uses of stmt's operand and find a
+      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
+      it to stmts vector.
+   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
+      dominates other div/mod stmts with same operands) and update entries in
+      stmts vector to use return value of DIVMOD (REALPART_EXPR for div,
+      IMAGPART_EXPR for mod).  */
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+  if (!divmod_candidate_p (stmt))
+    return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+  
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+  auto_vec<gimple *> stmts; 
+
+  gimple *top_stmt = stmt; 
+  basic_block top_bb = gimple_bb (stmt);
+
+  /* Try to set top_stmt to "topmost" stmt
+     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.  */
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+    {
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  basic_block bb = gimple_bb (use_stmt);
+
+	  if (bb == top_bb)
+	    {
+	      if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
+		top_stmt = use_stmt;
+	    }
+	  else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
+	    {
+	      top_bb = bb;
+	      top_stmt = use_stmt;
+	    }
+	}
+    }
+
+  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
+    return false;
+
+  tree top_op1 = gimple_assign_rhs1 (top_stmt);
+  tree top_op2 = gimple_assign_rhs2 (top_stmt);
+
+  stmts.safe_push (top_stmt);
+  bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR);
+
+  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */    
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
+    {
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (top_op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (top_op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (use_stmt == top_stmt)
+	    continue;
+
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
+	    {
+	      end_imm_use_stmt_traverse (&use_iter);
+	      return false;
+	    }
+
+	  stmts.safe_push (use_stmt);
+	  if (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR)
+	    div_seen = true;
+	}
+    }
+
+  if (!div_seen)
+    return false;
+
+  /* Create libcall to internal fn DIVMOD:
+     divmod_tmp = DIVMOD (op1, op2).  */
+
+  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+  tree res = make_temp_ssa_name (
+		build_complex_type (TREE_TYPE (op1)),
+		call_stmt, "divmod_tmp");
+  gimple_call_set_lhs (call_stmt, res);
+
+  /* Insert the call before top_stmt.  */
+  gimple_stmt_iterator top_stmt_gsi = gsi_for_stmt (top_stmt);
+  gsi_insert_before (&top_stmt_gsi, call_stmt, GSI_SAME_STMT);
+
+  widen_mul_stats.divmod_calls_inserted++;		
+
+  /* Update all statements in stmts.
+     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs = REALPART_EXPR<divmod_tmp>
+     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs = IMAGPART_EXPR<divmod_tmp>.  */
+
+  bool cfg_changed = false;
+  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
+    {
+      tree new_rhs;
+
+      switch (gimple_assign_rhs_code (use_stmt))
+	{
+	  case TRUNC_DIV_EXPR:
+	    new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+	    break;
+
+	  case TRUNC_MOD_EXPR:
+	    new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	}
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
+      update_stmt (use_stmt);
+
+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+	cfg_changed = true;
+    }
+
+  return cfg_changed;
+}    
 
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
@@ -3828,6 +4056,8 @@ pass_optimize_widening_mul::execute (function *fun)
   bool cfg_changed = false;
 
   memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
+  renumber_gimple_stmt_uids ();
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -3861,6 +4091,10 @@ pass_optimize_widening_mul::execute (function *fun)
 		    match_uaddsub_overflow (&gsi, stmt, code);
 		  break;
 
+		case TRUNC_MOD_EXPR:
+		  cfg_changed |= convert_to_divmod (as_a<gassign *> (stmt));
+		  break;
+
 		default:;
 		}
 	    }
@@ -3907,6 +4141,8 @@ pass_optimize_widening_mul::execute (function *fun)
 			    widen_mul_stats.maccs_inserted);
   statistics_counter_event (fun, "fused multiply-adds inserted",
 			    widen_mul_stats.fmas_inserted);
+  statistics_counter_event (fun, "divmod calls inserted",
+			    widen_mul_stats.divmod_calls_inserted);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }


* Re: RFC [1/2] divmod transform
  2016-05-31 14:20                           ` Prathamesh Kulkarni
@ 2016-06-01 11:09                             ` Richard Biener
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Biener @ 2016-06-01 11:09 UTC (permalink / raw)
  To: Prathamesh Kulkarni
  Cc: Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Jim Wilson, Joseph S. Myers

On Tue, 31 May 2016, Prathamesh Kulkarni wrote:

> On 30 May 2016 at 13:15, Richard Biener <rguenther@suse.de> wrote:
> > On Mon, 30 May 2016, Prathamesh Kulkarni wrote:
> >
> >> The attached patch ICE's during bootstrap for x86_64, and is reproducible with
> >> following case with -m32 -O2:
> >>
> >> typedef long long type;
> >>
> >> type f(type x, type y)
> >> {
> >>   type q = x / y;
> >>   type r = x % y;
> >>   return q + r;
> >> }
> >>
> >> The ICE happens because the test-case hits
> >> gcc_assert (unsignedp);
> >> in default_expand_divmod_libfunc ().
> >
> > That's of course your function (and ICE).
> >
> >> Surprisingly, optab_libfunc (sdivmod_optab, DImode) returns optab_libfunc
> >> with name "__divmoddi4" although __divmoddi4() is nowhere defined in
> >> libgcc for x86.
> >> (I verified that by forcing the patch to generate call to __divmoddi4,
> >> which results in undefined reference to __divmoddi4).
> >>
> >> This happens because in optabs.def we have:
> >> OPTAB_NL(sdivmod_optab, "divmod$a4", UNKNOWN, "divmod", '4', gen_int_libfunc)
> >>
> >> and gen_int_libfunc generates "__divmoddi4" on first call to optab_libfunc
> >> and sets optab_libfunc (sdivmod_optab, DImode) to "__divmoddi4".
> >> I wonder if we should remove gen_int_libfunc entry in optabs.def for
> >> sdivmod_optab ?
> >
> > Hum, not sure - you might want to look at expand_divmod (though that
> > always just computes one part of the result in the end).
> As a workaround, would it be OK to check if libfunc is __udivmoddi4
> if expand_divmod_libfunc is default, as in attached patch ?

Humm.  I suppose it is because until now we never expanded divmod
directly; the modulo and division libfuncs are all implemented
in terms of udivmoddi4.

This means that the signed divmod libfunc expander needs to use
that as well - not sure if that's efficiently possible though.
Which would mean to simply never consider divmoddi4 as available.

Richard.

> This prevents ICE for the above test-case.
> Bootstrap+test on x86_64 in progress.
> 
> Thanks,
> Prathamesh
> >
> > Joseph - do you know sth about why there's not a full set of divmod
> > libfuncs in libgcc?
> >
> > Thanks,
> > Richard.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


* Re: RFC [1/2] divmod transform
  2016-05-30 10:36                         ` Richard Biener
  2016-05-31 14:20                           ` Prathamesh Kulkarni
@ 2016-06-03 21:10                           ` Joseph Myers
  2016-06-03 23:31                           ` Jim Wilson
  2 siblings, 0 replies; 23+ messages in thread
From: Joseph Myers @ 2016-06-03 21:10 UTC (permalink / raw)
  To: Richard Biener
  Cc: Prathamesh Kulkarni, Richard Biener, gcc Patches,
	Ramana Radhakrishnan, Kugan Vivekanandarajah, Jim Wilson

On Mon, 30 May 2016, Richard Biener wrote:

> Joseph - do you know sth about why there's not a full set of divmod
> libfuncs in libgcc?

I'm not familiar with the choice of divmod libfuncs.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: RFC [1/2] divmod transform
  2016-05-30 10:36                         ` Richard Biener
  2016-05-31 14:20                           ` Prathamesh Kulkarni
  2016-06-03 21:10                           ` Joseph Myers
@ 2016-06-03 23:31                           ` Jim Wilson
  2016-06-08 14:23                             ` Richard Biener
  2 siblings, 1 reply; 23+ messages in thread
From: Jim Wilson @ 2016-06-03 23:31 UTC (permalink / raw)
  To: Richard Biener
  Cc: Prathamesh Kulkarni, Richard Biener, gcc Patches,
	Ramana Radhakrishnan, Kugan Vivekanandarajah, Joseph S. Myers

On Mon, May 30, 2016 at 12:45 AM, Richard Biener <rguenther@suse.de> wrote:
> Joseph - do you know sth about why there's not a full set of divmod
> libfuncs in libgcc?

Because udivmoddi4 isn't a libfunc, it is a helper function for the
div and mod libfuncs.  Since we can compute the signed div and mod
results from udivmoddi4, there was no need to also add a signed
version of it.  It was given a libfunc style name so that we had the
option of making it a libfunc in the future, but that never happened.
There was no support for calling any divmod libfunc until it was added
as a special case to call an ARM library (not libgcc) function.  This
happened here

2004-08-09  Mark Mitchell  <mark@codesourcery.com>

        * config.gcc (arm*-*-eabi*): New target.
        * defaults.h (TARGET_LIBGCC_FUNCS): New macro.
        (TARGET_LIB_INT_CMP_BIASED): Likewise.
        * expmed.c (expand_divmod): Try a two-valued divmod function as a
        last resort.
        ...
        * config/arm/arm.c (arm_init_libfuncs): New function.
        (arm_compute_initial_eliminatino_offset): Return HOST_WIDE_INT.
        (TARGET_INIT_LIBFUNCS): Define it.
        ...

Later, two ports added their own divmod libfuncs, but I don't see any
evidence that they were ever used, since there is no support for
calling divmod other than the expand_divmod last resort code that only
triggers for ARM.

It is only now that Prathamesh is adding gimple support for divmod
operations that we need to worry about getting this right, without
breaking the existing ARM library support or the existing udivmoddi4
support.
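
To illustrate the point that signed div and mod results can be computed
from an unsigned helper, here is a minimal sketch in C.  The names
udivmod_sketch and sdivmod_sketch are hypothetical and are not libgcc
functions; the unsigned helper only mimics the convention described for
libgcc2.c:__udivmoddi4 (return the quotient, store the remainder through
a pointer).  It assumes two's-complement conversion from unsigned to
signed, as GCC provides, and does not handle INT64_MIN / -1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the unsigned helper: returns the quotient and
   stores the remainder through REM, the convention described for
   libgcc2.c:__udivmoddi4.  */
static uint64_t
udivmod_sketch (uint64_t n, uint64_t d, uint64_t *rem)
{
  assert (d != 0);
  *rem = n % d;
  return n / d;
}

/* Signed truncating divmod built on the unsigned helper.  The quotient is
   negative when the operand signs differ; the remainder takes the sign of
   the dividend.  INT64_MIN / -1 overflows and is not handled here.  */
static int64_t
sdivmod_sketch (int64_t n, int64_t d, int64_t *rem)
{
  /* Negate in unsigned arithmetic so INT64_MIN has a defined magnitude.  */
  uint64_t un = n < 0 ? -(uint64_t) n : (uint64_t) n;
  uint64_t ud = d < 0 ? -(uint64_t) d : (uint64_t) d;
  uint64_t ur;
  uint64_t uq = udivmod_sketch (un, ud, &ur);

  if ((n < 0) != (d < 0))
    uq = -uq;
  if (n < 0)
    ur = -ur;

  /* Converting back relies on two's-complement wraparound (assumption).  */
  *rem = (int64_t) ur;
  return (int64_t) uq;
}
```

With this sketch, sdivmod_sketch (-7, 2, &r) returns -3 and leaves r at -1,
matching truncating division.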

Jim


* Re: RFC [1/2] divmod transform
  2016-06-03 23:31                           ` Jim Wilson
@ 2016-06-08 14:23                             ` Richard Biener
  2016-07-28 13:36                               ` Prathamesh Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2016-06-08 14:23 UTC (permalink / raw)
  To: Jim Wilson
  Cc: Prathamesh Kulkarni, Richard Biener, gcc Patches,
	Ramana Radhakrishnan, Kugan Vivekanandarajah, Joseph S. Myers

On Fri, 3 Jun 2016, Jim Wilson wrote:

> On Mon, May 30, 2016 at 12:45 AM, Richard Biener <rguenther@suse.de> wrote:
> > Joseph - do you know sth about why there's not a full set of divmod
> > libfuncs in libgcc?
> 
> Because udivmoddi4 isn't a libfunc, it is a helper function for the
> div and mod libfuncs.  Since we can compute the signed div and mod
> results from udivmoddi4, there was no need to also add a signed
> version of it.  It was given a libfunc style name so that we had the
> option of making it a libfunc in the future, but that never happened.
> There was no support for calling any divmod libfunc until it was added
> as a special case to call an ARM library (not libgcc) function.  This
> happened here
> 
> 2004-08-09  Mark Mitchell  <mark@codesourcery.com>
> 
>         * config.gcc (arm*-*-eabi*): New target.
>         * defaults.h (TARGET_LIBGCC_FUNCS): New macro.
>         (TARGET_LIB_INT_CMP_BIASED): Likewise.
>         * expmed.c (expand_divmod): Try a two-valued divmod function as a
>         last resort.
>         ...
>         * config/arm/arm.c (arm_init_libfuncs): New function.
>         (arm_compute_initial_eliminatino_offset): Return HOST_WIDE_INT.
>         (TARGET_INIT_LIBFUNCS): Define it.
>         ...
> 
> Later, two ports added their own divmod libfuncs, but I don't see any
> evidence that they were ever used, since there is no support for
> calling divmod other than the expand_divmod last resort code that only
> triggers for ARM.
> 
> It is only now that Prathamesh is adding gimple support for divmod
> operations that we need to worry about getting this right, without
> breaking the existing ARM library support or the existing udivmoddi4
> support.

Ok, so as he is primarily targeting the special arm divmod libcall
I suppose we can live with special-casing libcall handling to
udivmoddi4.  It would be nice to not lie about divmod availability
as libcall though... - it looks like the libcall is also guarded
on TARGET_HAS_NO_HW_DIVIDE (unless it was available historically
like on x86).

So not sure where to go from here.

Richard.


* Re: RFC [1/2] divmod transform
  2016-06-08 14:23                             ` Richard Biener
@ 2016-07-28 13:36                               ` Prathamesh Kulkarni
  2016-08-09 10:54                                 ` Prathamesh Kulkarni
  2016-08-13 11:26                                 ` Prathamesh Kulkarni
  0 siblings, 2 replies; 23+ messages in thread
From: Prathamesh Kulkarni @ 2016-07-28 13:36 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jim Wilson, Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Joseph S. Myers

[-- Attachment #1: Type: text/plain, Size: 2743 bytes --]

On 8 June 2016 at 19:53, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 3 Jun 2016, Jim Wilson wrote:
>
>> On Mon, May 30, 2016 at 12:45 AM, Richard Biener <rguenther@suse.de> wrote:
>> > Joseph - do you know sth about why there's not a full set of divmod
>> > libfuncs in libgcc?
>>
>> Because udivmoddi4 isn't a libfunc, it is a helper function for the
>> div and mod libfuncs.  Since we can compute the signed div and mod
>> results from udivmoddi4, there was no need to also add a signed
>> version of it.  It was given a libfunc style name so that we had the
>> option of making it a libfunc in the future, but that never happened.
>> There was no support for calling any divmod libfunc until it was added
>> as a special case to call an ARM library (not libgcc) function.  This
>> happened here
>>
>> 2004-08-09  Mark Mitchell  <mark@codesourcery.com>
>>
>>         * config.gcc (arm*-*-eabi*): New target.
>>         * defaults.h (TARGET_LIBGCC_FUNCS): New macro.
>>         (TARGET_LIB_INT_CMP_BIASED): Likewise.
>>         * expmed.c (expand_divmod): Try a two-valued divmod function as a
>>         last resort.
>>         ...
>>         * config/arm/arm.c (arm_init_libfuncs): New function.
>>         (arm_compute_initial_eliminatino_offset): Return HOST_WIDE_INT.
>>         (TARGET_INIT_LIBFUNCS): Define it.
>>         ...
>>
>> Later, two ports added their own divmod libfuncs, but I don't see any
>> evidence that they were ever used, since there is no support for
>> calling divmod other than the expand_divmod last resort code that only
>> triggers for ARM.
>>
>> It is only now that Prathamesh is adding gimple support for divmod
>> operations that we need to worry about getting this right, without
>> breaking the existing ARM library support or the existing udivmoddi4
>> support.
>
> Ok, so as he is primarily targeting the special arm divmod libcall
> I suppose we can live with special-casing libcall handling to
> udivmoddi4.  It would be nice to not lie about divmod availability
> as libcall though... - it looks like the libcall is also guarded
> on TARGET_HAS_NO_HW_DIVIDE (unless it was available historically
> like on x86).
>
> So not sure where to go from here.
Hi,
I have attached the patch, rebased on trunk.
I needed to update divmod-7.c, which now gets transformed to divmod
thanks to your code-hoisting patch ;-)
We still have the issue of optab_libfunc() returning non-existent
libcalls. As in the previous patch, I am checking
explicitly for "__udivmoddi4", with a FIXME note.
I hope that's okay for now?
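
For reference, the effect of the transform on a function like the f test
case quoted earlier in the thread can be pictured at the C level roughly
as below.  This is only a sketch: struct divmod_pair and divmod_sketch
are hypothetical names standing in for the complex temporary and the
DIVMOD internal function, and the real expansion goes through the optab
handler or the target hook rather than plain / and %.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair type standing in for the complex temporary.  */
struct divmod_pair
{
  int64_t quot;	/* What REALPART_EXPR would extract.  */
  int64_t rem;	/* What IMAGPART_EXPR would extract.  */
};

/* Hypothetical stand-in for DIVMOD (a, b).  */
static struct divmod_pair
divmod_sketch (int64_t a, int64_t b)
{
  assert (b != 0);
  struct divmod_pair p = { a / b, a % b };
  return p;
}

/* Before the transform: division and modulo computed separately.  */
static int64_t
f_before (int64_t x, int64_t y)
{
  int64_t q = x / y;
  int64_t r = x % y;
  return q + r;
}

/* After the transform: one combined operation feeds both results.  */
static int64_t
f_after (int64_t x, int64_t y)
{
  struct divmod_pair tmp = divmod_sketch (x, y);
  return tmp.quot + tmp.rem;
}
```

Both versions return the same value for any inputs with a nonzero
divisor; only the number of division operations differs.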

Bootstrapped and tested on x86_64-unknown-linux-gnu,
armv8l-unknown-linux-gnueabihf.
Bootstrap+test in progress on i686-linux-gnu.
Cross-tested on arm*-*-*.

Thanks,
Prathamesh
>
> Richard.

[-- Attachment #2: divmod-1-pass.diff --]
[-- Type: text/plain, Size: 15054 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 83bd9ab..e4815cf 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7010,6 +7010,12 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn

+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool @var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx *@var{quot}, rtx *@var{rem})
+Define this hook if the port does not have hardware div and divmod insn for
+the given mode but has divmod libfunc, which is incompatible
+with libgcc2.c:__udivmoddi4
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index a72c3d8..3efaf4d 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4864,6 +4864,8 @@ them: try the first ones in this list first.

 @hook TARGET_SCHED_FUSION_PRIORITY

+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 49f3495..18876ce 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2326,6 +2326,48 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p

+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, generate a call to
+    optab_libfunc for udivmod/sdivmod.  */
+
+static void
+expand_DIVMOD (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  bool unsignedp = TYPE_UNSIGNED (type);
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx quotient, remainder;
+
+  /* Check if optab handler exists for [u]divmod.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+    {
+      quotient = gen_reg_rtx (mode);
+      remainder = gen_reg_rtx (mode);
+      expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
+    }
+  else
+    targetm.expand_divmod_libfunc (unsignedp, mode, op0, op1,
+				   &quotient, &remainder);
+
+  /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR.  */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+		       make_tree (TREE_TYPE (arg0), quotient),
+		       make_tree (TREE_TYPE (arg1), remainder)),
+	       target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Return true if FN is supported for the types in TYPES when the
    optimization type is OPT_TYPE.  The types are those associated with
    the "type0" and "type1" fields of FN's direct_internal_fn_info
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 6701cd9..b221ef9 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -195,6 +195,9 @@ DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ATOMIC_COMPARE_EXCHANGE, ECF_LEAF | ECF_NOTHROW, NULL)

+/* Divmod function  */
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_OPTAB_FN
diff --git a/gcc/target.def b/gcc/target.def
index 27f9ac2..120a653 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5001,6 +5001,16 @@ Normally, this is not needed.",
  bool, (const_tree field, machine_mode mode),
  default_member_type_forces_blk)

+/* See tree-ssa-math-opts.c:divmod_candidate_p for conditions that gate
+   the divmod transform.  */
+DEFHOOK
+(expand_divmod_libfunc,
+ "Define this hook if the port does not have hardware div and divmod insn for\n\
+the given mode but has divmod libfunc, which is incompatible\n\
+with libgcc2.c:__udivmoddi4",
+ void, (bool unsignedp, machine_mode mode, rtx, rtx, rtx *quot, rtx *rem),
+ default_expand_divmod_libfunc)
+
 /* Return the class for a secondary reload, and fill in extra information.  */
 DEFHOOK
 (secondary_reload,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 69037c1..f506a83 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -2008,4 +2008,33 @@ default_max_noce_ifcvt_seq_cost (edge e)
     return BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (3);
 }

+/* Generate call to
+   DImode __udivmoddi4 (DImode op0, DImode op1, DImode *rem).  */
+
+void
+default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
+                              rtx op0, rtx op1,
+                              rtx *quot_p, rtx *rem_p)
+{
+  gcc_assert (mode == DImode);
+  gcc_assert (unsignedp);
+
+  rtx libfunc = optab_libfunc (udivmod_optab, DImode);
+  gcc_assert (libfunc);
+  gcc_assert (!strcmp (XSTR (libfunc, 0), "__udivmoddi4"));
+
+  rtx remainder = assign_stack_temp (DImode, GET_MODE_SIZE (DImode));
+  rtx address = XEXP (remainder, 0);
+
+  rtx quotient = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+                                         DImode, 3,
+                                         op0, GET_MODE (op0),
+                                         op1, GET_MODE (op1),
+                                         address, GET_MODE (address));
+
+  *quot_p = quotient;
+  *rem_p = remainder;
+}
+
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 2e7ca72..0c0dbe2 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -258,4 +258,7 @@ extern bool default_optab_supported_p (int, machine_mode, machine_mode,

 extern unsigned int default_max_noce_ifcvt_seq_cost (edge);

+extern void default_expand_divmod_libfunc (bool, machine_mode, rtx, rtx,
+					   rtx *, rtx *);
+
 #endif /* GCC_TARGHOOKS_H */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index b93bcf3..ad32744 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -112,6 +112,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "optabs-libfuncs.h"
+#include "tree-eh.h"
+#include "targhooks.h"

 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -184,6 +187,9 @@ static struct

   /* Number of fp fused multiply-add ops inserted.  */
   int fmas_inserted;
+
+  /* Number of divmod calls inserted.  */
+  int divmod_calls_inserted;
 } widen_mul_stats;

 /* The instance of "struct occurrence" representing the highest
@@ -3793,6 +3799,228 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
   return true;
 }

+/* Return true if target has support for divmod.  */
+
+static bool
+target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode)
+{
+  /* If target supports hardware divmod insn, use it for divmod.  */
+  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing)
+    return true;
+
+  /* Check if libfunc for divmod is available.  */
+  rtx libfunc = optab_libfunc (divmod_optab, mode);
+  if (libfunc != NULL_RTX)
+    {
+      /* If optab_handler exists for div_optab, perhaps in a wider mode,
+	 we don't want to use the libfunc even if it exists for given mode.  */
+      for (machine_mode div_mode = mode;
+	   div_mode != VOIDmode;
+	   div_mode = GET_MODE_WIDER_MODE (div_mode))
+	if (optab_handler (div_optab, div_mode) != CODE_FOR_nothing)
+	  return false;
+
+      /* FIXME: This is a hack to workaround an issue with optab_libfunc().
+	 optab_libfunc (sdivmod_optab, DImode) returns libfunc "__divmoddi4",
+	 although __divmoddi4() does not exist in libgcc. For now, enable the
+	 transform only if libfunc is guaranteed to be __udivmoddi4.  */
+      return (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc
+	     || !strcmp (XSTR (libfunc, 0), "__udivmoddi4"));
+    }
+
+  return false;
+}
+
+/* Check if stmt is candidate for divmod transform.  */
+
+static bool
+divmod_candidate_p (gassign *stmt)
+{
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  enum machine_mode mode = TYPE_MODE (type);
+  optab divmod_optab, div_optab;
+
+  if (TYPE_UNSIGNED (type))
+    {
+      divmod_optab = udivmod_optab;
+      div_optab = udiv_optab;
+    }
+  else
+    {
+      divmod_optab = sdivmod_optab;
+      div_optab = sdiv_optab;
+    }
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  /* Disable the transform if either is a constant, since division-by-constant
+     may have specialized expansion.  */
+  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
+    return false;
+
+  /* Exclude the case where TYPE_OVERFLOW_TRAPS (type) as that should
+     expand using the [su]divv optabs.  */
+  if (TYPE_OVERFLOW_TRAPS (type))
+    return false;
+
+  if (!target_supports_divmod_p (divmod_optab, div_optab, mode))
+    return false;
+
+  return true;
+}
+
+/* This function looks for:
+   t1 = a TRUNC_DIV_EXPR b;
+   t2 = a TRUNC_MOD_EXPR b;
+   and transforms it to the following sequence:
+   complex_tmp = DIVMOD (a, b);
+   t1 = REALPART_EXPR (complex_tmp);
+   t2 = IMAGPART_EXPR (complex_tmp);
+   For conditions enabling the transform see divmod_candidate_p().
+
+   The pass works in two phases:
+   1) Walk through all immediate uses of stmt's operand and find a
+      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
+      it to stmts vector.
+   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
+      dominates other div/mod stmts with same operands) and update entries in
+      stmts vector to use return value of DIVMOD (REALPART_EXPR for div,
+      IMAGPART_EXPR for mod).  */
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+  if (!divmod_candidate_p (stmt))
+    return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+  auto_vec<gimple *> stmts;
+
+  gimple *top_stmt = stmt;
+  basic_block top_bb = gimple_bb (stmt);
+
+  /* Try to set top_stmt to "topmost" stmt
+     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.  */
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+    {
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  basic_block bb = gimple_bb (use_stmt);
+
+	  if (bb == top_bb)
+	    {
+	      if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
+		top_stmt = use_stmt;
+	    }
+	  else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
+	    {
+	      top_bb = bb;
+	      top_stmt = use_stmt;
+	    }
+	}
+    }
+
+  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
+    return false;
+
+  tree top_op1 = gimple_assign_rhs1 (top_stmt);
+  tree top_op2 = gimple_assign_rhs2 (top_stmt);
+
+  stmts.safe_push (top_stmt);
+  bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR);
+
+  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
+    {
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (top_op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (top_op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (use_stmt == top_stmt)
+	    continue;
+
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
+	    {
+	      end_imm_use_stmt_traverse (&use_iter);
+	      return false;
+	    }
+
+	  stmts.safe_push (use_stmt);
+	  if (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR)
+	    div_seen = true;
+	}
+    }
+
+  if (!div_seen)
+    return false;
+
+  /* Create libcall to internal fn DIVMOD:
+     divmod_tmp = DIVMOD (op1, op2).  */
+
+  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+  tree res = make_temp_ssa_name (
+		build_complex_type (TREE_TYPE (op1)),
+		call_stmt, "divmod_tmp");
+  gimple_call_set_lhs (call_stmt, res);
+
+  /* Insert the call before top_stmt.  */
+  gimple_stmt_iterator top_stmt_gsi = gsi_for_stmt (top_stmt);
+  gsi_insert_before (&top_stmt_gsi, call_stmt, GSI_SAME_STMT);
+
+  widen_mul_stats.divmod_calls_inserted++;
+
+  /* Update all statements in stmts.
+     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs = REALPART_EXPR<divmod_tmp>
+     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs = IMAGPART_EXPR<divmod_tmp>.  */
+
+  bool cfg_changed = false;
+  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
+    {
+      tree new_rhs;
+
+      switch (gimple_assign_rhs_code (use_stmt))
+	{
+	  case TRUNC_DIV_EXPR:
+	    new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+	    break;
+
+	  case TRUNC_MOD_EXPR:
+	    new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	}
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
+      update_stmt (use_stmt);
+
+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+	cfg_changed = true;
+    }
+
+  return cfg_changed;
+}

 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
@@ -3837,6 +4065,8 @@ pass_optimize_widening_mul::execute (function *fun)
   bool cfg_changed = false;

   memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
+  renumber_gimple_stmt_uids ();

   FOR_EACH_BB_FN (bb, fun)
     {
@@ -3870,6 +4100,10 @@ pass_optimize_widening_mul::execute (function *fun)
 		    match_uaddsub_overflow (&gsi, stmt, code);
 		  break;

+		case TRUNC_MOD_EXPR:
+		  cfg_changed |= convert_to_divmod (as_a<gassign *> (stmt));
+		  break;
+
 		default:;
 		}
 	    }
@@ -3916,6 +4150,8 @@ pass_optimize_widening_mul::execute (function *fun)
 			    widen_mul_stats.maccs_inserted);
   statistics_counter_event (fun, "fused multiply-adds inserted",
 			    widen_mul_stats.fmas_inserted);
+  statistics_counter_event (fun, "divmod calls inserted",
+			    widen_mul_stats.divmod_calls_inserted);

   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
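
For readers following the patch above, here is a minimal source-level sketch of
what the transform amounts to. It is only an analogy in plain C, not GCC
internals: struct divmod_result and divmod_i32 are hypothetical names standing
in for the complex-typed temporary and the DIVMOD internal function, and the
real pass works on GIMPLE statements rather than on C source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair type standing in for the complex-typed temporary:
   quot plays the role of REALPART_EXPR, rem the role of IMAGPART_EXPR.  */
struct divmod_result
{
  int32_t quot;
  int32_t rem;
};

/* Hypothetical combined operation modelling DIVMOD (a, b).  The remainder
   is derived from the quotient, so only one division is performed.  */
static struct divmod_result
divmod_i32 (int32_t a, int32_t b)
{
  assert (b != 0);
  assert (!(a == INT32_MIN && b == -1));

  struct divmod_result r;
  r.quot = a / b;
  r.rem = a - r.quot * b;
  return r;
}

/* Before: two separate truncating operations with identical operands.  */
static int32_t
sum_before (int32_t a, int32_t b)
{
  int32_t t1 = a / b;
  int32_t t2 = a % b;
  return t1 + t2;
}

/* After: one combined call whose two halves feed the original users.  */
static int32_t
sum_after (int32_t a, int32_t b)
{
  struct divmod_result tmp = divmod_i32 (a, b);
  int32_t t1 = tmp.quot;
  int32_t t2 = tmp.rem;
  return t1 + t2;
}
```

Both functions return the same value for any valid operands; the second form
simply makes the shared computation explicit, which is what lets the mod be
CSEd with the div.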


* Re: RFC [1/2] divmod transform
  2016-07-28 13:36                               ` Prathamesh Kulkarni
@ 2016-08-09 10:54                                 ` Prathamesh Kulkarni
  2016-08-13 11:26                                 ` Prathamesh Kulkarni
  1 sibling, 0 replies; 23+ messages in thread
From: Prathamesh Kulkarni @ 2016-08-09 10:54 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jim Wilson, Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Joseph S. Myers

ping https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01867.html

Thanks,
Prathamesh

On 28 July 2016 at 19:05, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> On 8 June 2016 at 19:53, Richard Biener <rguenther@suse.de> wrote:
>> On Fri, 3 Jun 2016, Jim Wilson wrote:
>>
>>> On Mon, May 30, 2016 at 12:45 AM, Richard Biener <rguenther@suse.de> wrote:
>>> > Joseph - do you know something about why there's not a full set of divmod
>>> > libfuncs in libgcc?
>>>
>>> Because udivmoddi4 isn't a libfunc, it is a helper function for the
>>> div and mod libfuncs.  Since we can compute the signed div and mod
>>> results from udivmoddi4, there was no need to also add a signed
>>> version of it.  It was given a libfunc style name so that we had the
>>> option of making it a libfunc in the future, but that never happened.
>>> There was no support for calling any divmod libfunc until it was added
>>> as a special case to call an ARM library (not libgcc) function.  This
>>> happened here
>>>
>>> 2004-08-09  Mark Mitchell  <mark@codesourcery.com>
>>>
>>>         * config.gcc (arm*-*-eabi*): New target.
>>>         * defaults.h (TARGET_LIBGCC_FUNCS): New macro.
>>>         (TARGET_LIB_INT_CMP_BIASED): Likewise.
>>>         * expmed.c (expand_divmod): Try a two-valued divmod function as a
>>>         last resort.
>>>         ...
>>>         * config/arm/arm.c (arm_init_libfuncs): New function.
>>>         (arm_compute_initial_elimination_offset): Return HOST_WIDE_INT.
>>>         (TARGET_INIT_LIBFUNCS): Define it.
>>>         ...
>>>
>>> Later, two ports added their own divmod libfuncs, but I don't see any
>>> evidence that they were ever used, since there is no support for
>>> calling divmod other than the expand_divmod last resort code that only
>>> triggers for ARM.
>>>
>>> It is only now that Prathamesh is adding gimple support for divmod
>>> operations that we need to worry about getting this right, without
>>> breaking the existing ARM library support or the existing udivmoddi4
>>> support.
>>
>> Ok, so as he is primarily targeting the special arm divmod libcall
>> I suppose we can live with special-casing libcall handling to
>> udivmoddi4.  It would be nice to not lie about divmod availability
>> as libcall though... - it looks like the libcall is also guarded
>> on TARGET_HAS_NO_HW_DIVIDE (unless it was available historically
>> like on x86).
>>
>> So not sure where to go from here.
> Hi,
> I have attached patch, which is rebased on trunk.
> Needed to update divmod-7.c, which now gets transformed to divmod
> thanks to your code-hoisting patch -;)
> We still have the issue of optab_libfunc() returning non-existent
> libcalls. As in previous patch, I am checking
> explicitly for "__udivmoddi4", with a FIXME note.
> I hope that's okay for now ?
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu,
> armv8l-unknown-linux-gnueabihf.
> Bootstrap+test in progress on i686-linux-gnu.
> Cross-tested on arm*-*-*.
>
> Thanks,
> Prathamesh
>>
>> Richard.
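
Jim Wilson's point quoted above, that signed div and mod results can be
computed from the unsigned helper, can be sketched roughly as follows. This is
an illustrative model only, not the libgcc source: model_udivmod64 and
model_sdivmod64 are hypothetical names, and the helper merely mimics the
calling convention described for __udivmoddi4 (return the quotient, store the
remainder through a pointer). The INT64_MIN / -1 case is assumed to be
excluded by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an unsigned divmod helper using the
   "return quotient, store remainder through pointer" convention.  */
static uint64_t
model_udivmod64 (uint64_t n, uint64_t d, uint64_t *rem)
{
  assert (d != 0);
  if (rem)
    *rem = n % d;
  return n / d;
}

/* Absolute value computed in unsigned arithmetic, so that INT64_MIN
   does not overflow.  */
static uint64_t
magnitude (int64_t x)
{
  return x < 0 ? -(uint64_t) x : (uint64_t) x;
}

/* Hypothetical signed wrapper: divide the magnitudes, then restore the
   signs.  With truncating division the quotient is negative when the
   operand signs differ, and the remainder takes the dividend's sign.  */
static int64_t
model_sdivmod64 (int64_t a, int64_t b, int64_t *rem)
{
  uint64_t urem;
  uint64_t uquot = model_udivmod64 (magnitude (a), magnitude (b), &urem);

  /* Negate in unsigned arithmetic; the final conversions back to int64_t
     rely on the usual two's-complement behaviour of the compiler.  */
  if ((a < 0) != (b < 0))
    uquot = -uquot;
  if (a < 0)
    urem = -urem;

  if (rem)
    *rem = (int64_t) urem;
  return (int64_t) uquot;
}
```

The sketch shows why no separate signed helper was strictly needed: the signed
result is only a sign fix-up around the unsigned computation.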


* Re: RFC [1/2] divmod transform
  2016-07-28 13:36                               ` Prathamesh Kulkarni
  2016-08-09 10:54                                 ` Prathamesh Kulkarni
@ 2016-08-13 11:26                                 ` Prathamesh Kulkarni
  2016-08-13 11:43                                   ` Prathamesh Kulkarni
  1 sibling, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2016-08-13 11:26 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jim Wilson, Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Joseph S. Myers

[-- Attachment #1: Type: text/plain, Size: 3688 bytes --]

On 28 July 2016 at 19:05, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> On 8 June 2016 at 19:53, Richard Biener <rguenther@suse.de> wrote:
>> On Fri, 3 Jun 2016, Jim Wilson wrote:
>>
>>> On Mon, May 30, 2016 at 12:45 AM, Richard Biener <rguenther@suse.de> wrote:
>>> > Joseph - do you know something about why there's not a full set of divmod
>>> > libfuncs in libgcc?
>>>
>>> Because udivmoddi4 isn't a libfunc, it is a helper function for the
>>> div and mod libfuncs.  Since we can compute the signed div and mod
>>> results from udivmoddi4, there was no need to also add a signed
>>> version of it.  It was given a libfunc style name so that we had the
>>> option of making it a libfunc in the future, but that never happened.
>>> There was no support for calling any divmod libfunc until it was added
>>> as a special case to call an ARM library (not libgcc) function.  This
>>> happened here
>>>
>>> 2004-08-09  Mark Mitchell  <mark@codesourcery.com>
>>>
>>>         * config.gcc (arm*-*-eabi*): New target.
>>>         * defaults.h (TARGET_LIBGCC_FUNCS): New macro.
>>>         (TARGET_LIB_INT_CMP_BIASED): Likewise.
>>>         * expmed.c (expand_divmod): Try a two-valued divmod function as a
>>>         last resort.
>>>         ...
>>>         * config/arm/arm.c (arm_init_libfuncs): New function.
>>>         (arm_compute_initial_elimination_offset): Return HOST_WIDE_INT.
>>>         (TARGET_INIT_LIBFUNCS): Define it.
>>>         ...
>>>
>>> Later, two ports added their own divmod libfuncs, but I don't see any
>>> evidence that they were ever used, since there is no support for
>>> calling divmod other than the expand_divmod last resort code that only
>>> triggers for ARM.
>>>
>>> It is only now that Prathamesh is adding gimple support for divmod
>>> operations that we need to worry about getting this right, without
>>> breaking the existing ARM library support or the existing udivmoddi4
>>> support.
>>
>> Ok, so as he is primarily targeting the special arm divmod libcall
>> I suppose we can live with special-casing libcall handling to
>> udivmoddi4.  It would be nice to not lie about divmod availability
>> as libcall though... - it looks like the libcall is also guarded
>> on TARGET_HAS_NO_HW_DIVIDE (unless it was available historically
>> like on x86).
>>
>> So not sure where to go from here.
> Hi,
> I have attached patch, which is rebased on trunk.
> Needed to update divmod-7.c, which now gets transformed to divmod
> thanks to your code-hoisting patch -;)
> We still have the issue of optab_libfunc() returning non-existent
> libcalls. As in previous patch, I am checking
> explicitly for "__udivmoddi4", with a FIXME note.
> I hope that's okay for now ?
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu,
> armv8l-unknown-linux-gnueabihf.
> Bootstrap+test in progress on i686-linux-gnu.
> Cross-tested on arm*-*-*.
Hi Richard,
I have the following two approaches to work around the optab_libfunc issue:

a) Not lie about divmod libfunc availability by setting libcall entry to NULL
for sdivmod_optab in optabs.def.
Patch posted for that here:
https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01015.html
Although it doesn't cause any regressions with the gcc testsuite,
I am not sure if this change is correct.

b) Perform the transform only if a target-specific divmod is available,
i.e., drop targeting __udivmoddi4. I have attached an (untested) patch for that.
When/if the optab_libfunc issue is resolved, we can later target the "generic"
divmod libfunc.

Does either of these approaches look reasonable?

PS: I am on vacation next week, will get back to working on patch
after returning.

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
>>
>> Richard.

[-- Attachment #2: remove-default-divmod-libfunc.diff --]
[-- Type: text/plain, Size: 3022 bytes --]

diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index f506a83..618c810 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -2012,28 +2012,14 @@ default_max_noce_ifcvt_seq_cost (edge e)
    DImode __udivmoddi4 (DImode op0, DImode op1, DImode *rem).  */
 
 void
-default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
-                              rtx op0, rtx op1,
-                              rtx *quot_p, rtx *rem_p)
+default_expand_divmod_libfunc (bool unsignedp ATTRIBUTE_UNUSED,
+			       machine_mode mode ATTRIBUTE_UNUSED,
+			       rtx op0 ATTRIBUTE_UNUSED,
+			       rtx op1 ATTRIBUTE_UNUSED,
+                               rtx *quot_p ATTRIBUTE_UNUSED,
+			       rtx *rem_p ATTRIBUTE_UNUSED)
 {
-  gcc_assert (mode == DImode);
-  gcc_assert (unsignedp);
-
-  rtx libfunc = optab_libfunc (udivmod_optab, DImode);
-  gcc_assert (libfunc);
-  gcc_assert (!strcmp (XSTR (libfunc, 0), "__udivmoddi4"));
-
-  rtx remainder = assign_stack_temp (DImode, GET_MODE_SIZE (DImode));
-  rtx address = XEXP (remainder, 0);
-
-  rtx quotient = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
-                                         DImode, 3,
-                                         op0, GET_MODE (op0),
-                                         op1, GET_MODE (op1),
-                                         address, GET_MODE (address));
-
-  *quot_p = quotient;
-  *rem_p = remainder;
+  gcc_unreachable (); 
 }
 
 
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index ad32744..fda00c7 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -3808,9 +3808,12 @@ target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode
   if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing)
     return true;
 
-  /* Check if libfunc for divmod is available.  */
-  rtx libfunc = optab_libfunc (divmod_optab, mode);
-  if (libfunc != NULL_RTX)
+  /* Check if target-specific divmod libfunc is available.
+     If target overrides expand_divmod_libfunc, then it *has to*
+     set_optab_libfunc (divmod_optab, mode) to target-specific divmod
+     libfunc.  */
+  
+  if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
     {
       /* If optab_handler exists for div_optab, perhaps in a wider mode,
 	 we don't want to use the libfunc even if it exists for given mode.  */ 
@@ -3820,12 +3823,7 @@ target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode
 	if (optab_handler (div_optab, div_mode) != CODE_FOR_nothing)
 	  return false;
 
-      /* FIXME: This is a hack to workaround an issue with optab_libfunc().
-	 optab_libfunc (sdivmod_optab, DImode) returns libfunc "__divmoddi4",
-	 although __divmoddi4() does not exist in libgcc. For now, enable the
-	 transform only if libfunc is guaranteed to be __udivmoddi4.  */
-      return (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc
-	     || !strcmp (XSTR (libfunc, 0), "__udivmoddi4"));
+      return true;
     }
 
   return false;


* Re: RFC [1/2] divmod transform
  2016-08-13 11:26                                 ` Prathamesh Kulkarni
@ 2016-08-13 11:43                                   ` Prathamesh Kulkarni
  0 siblings, 0 replies; 23+ messages in thread
From: Prathamesh Kulkarni @ 2016-08-13 11:43 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jim Wilson, Richard Biener, gcc Patches, Ramana Radhakrishnan,
	Kugan Vivekanandarajah, Joseph S. Myers

[-- Attachment #1: Type: text/plain, Size: 4050 bytes --]

On 13 August 2016 at 16:56, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> On 28 July 2016 at 19:05, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
>> On 8 June 2016 at 19:53, Richard Biener <rguenther@suse.de> wrote:
>>> On Fri, 3 Jun 2016, Jim Wilson wrote:
>>>
>>>> On Mon, May 30, 2016 at 12:45 AM, Richard Biener <rguenther@suse.de> wrote:
>>>> > Joseph - do you know something about why there's not a full set of divmod
>>>> > libfuncs in libgcc?
>>>>
>>>> Because udivmoddi4 isn't a libfunc, it is a helper function for the
>>>> div and mod libfuncs.  Since we can compute the signed div and mod
>>>> results from udivmoddi4, there was no need to also add a signed
>>>> version of it.  It was given a libfunc style name so that we had the
>>>> option of making it a libfunc in the future, but that never happened.
>>>> There was no support for calling any divmod libfunc until it was added
>>>> as a special case to call an ARM library (not libgcc) function.  This
>>>> happened here
>>>>
>>>> 2004-08-09  Mark Mitchell  <mark@codesourcery.com>
>>>>
>>>>         * config.gcc (arm*-*-eabi*): New target.
>>>>         * defaults.h (TARGET_LIBGCC_FUNCS): New macro.
>>>>         (TARGET_LIB_INT_CMP_BIASED): Likewise.
>>>>         * expmed.c (expand_divmod): Try a two-valued divmod function as a
>>>>         last resort.
>>>>         ...
>>>>         * config/arm/arm.c (arm_init_libfuncs): New function.
>>>>         (arm_compute_initial_elimination_offset): Return HOST_WIDE_INT.
>>>>         (TARGET_INIT_LIBFUNCS): Define it.
>>>>         ...
>>>>
>>>> Later, two ports added their own divmod libfuncs, but I don't see any
>>>> evidence that they were ever used, since there is no support for
>>>> calling divmod other than the expand_divmod last resort code that only
>>>> triggers for ARM.
>>>>
>>>> It is only now that Prathamesh is adding gimple support for divmod
>>>> operations that we need to worry about getting this right, without
>>>> breaking the existing ARM library support or the existing udivmoddi4
>>>> support.
>>>
>>> Ok, so as he is primarily targeting the special arm divmod libcall
>>> I suppose we can live with special-casing libcall handling to
>>> udivmoddi4.  It would be nice to not lie about divmod availability
>>> as libcall though... - it looks like the libcall is also guarded
>>> on TARGET_HAS_NO_HW_DIVIDE (unless it was available historically
>>> like on x86).
>>>
>>> So not sure where to go from here.
>> Hi,
>> I have attached patch, which is rebased on trunk.
>> Needed to update divmod-7.c, which now gets transformed to divmod
>> thanks to your code-hoisting patch -;)
>> We still have the issue of optab_libfunc() returning non-existent
>> libcalls. As in previous patch, I am checking
>> explicitly for "__udivmoddi4", with a FIXME note.
>> I hope that's okay for now ?
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu,
>> armv8l-unknown-linux-gnueabihf.
>> Bootstrap+test in progress on i686-linux-gnu.
>> Cross-tested on arm*-*-*.
> Hi Richard,
> I have the following two approaches to work around the optab_libfunc issue:
>
> a) Not lie about divmod libfunc availability by setting libcall entry to NULL
> for sdivmod_optab in optabs.def.
> Patch posted for that here:
> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01015.html
> Although it doesn't cause any regressions with the gcc testsuite,
> I am not sure if this change is correct.
>
> b) Perform the transform only if a target-specific divmod is available,
> i.e., drop targeting __udivmoddi4. I have attached an (untested) patch for that.
> When/if the optab_libfunc issue is resolved, we can later target the "generic"
> divmod libfunc.
Oops, small mistake in the previous patch.
We also want to check if target has optab_libfunc set for the given mode.
Corrected in this version.

Thanks,
Prathamesh
>
> Does either of these approaches look reasonable?
>
> PS: I am on vacation next week, will get back to working on patch
> after returning.
>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prathamesh
>>>
>>> Richard.

[-- Attachment #2: remove-default-divmod-libfunc.diff --]
[-- Type: text/plain, Size: 3098 bytes --]

diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index f506a83..618c810 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -2012,28 +2012,14 @@ default_max_noce_ifcvt_seq_cost (edge e)
    DImode __udivmoddi4 (DImode op0, DImode op1, DImode *rem).  */
 
 void
-default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
-                              rtx op0, rtx op1,
-                              rtx *quot_p, rtx *rem_p)
+default_expand_divmod_libfunc (bool unsignedp ATTRIBUTE_UNUSED,
+			       machine_mode mode ATTRIBUTE_UNUSED,
+			       rtx op0 ATTRIBUTE_UNUSED,
+			       rtx op1 ATTRIBUTE_UNUSED,
+                               rtx *quot_p ATTRIBUTE_UNUSED,
+			       rtx *rem_p ATTRIBUTE_UNUSED)
 {
-  gcc_assert (mode == DImode);
-  gcc_assert (unsignedp);
-
-  rtx libfunc = optab_libfunc (udivmod_optab, DImode);
-  gcc_assert (libfunc);
-  gcc_assert (!strcmp (XSTR (libfunc, 0), "__udivmoddi4"));
-
-  rtx remainder = assign_stack_temp (DImode, GET_MODE_SIZE (DImode));
-  rtx address = XEXP (remainder, 0);
-
-  rtx quotient = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
-                                         DImode, 3,
-                                         op0, GET_MODE (op0),
-                                         op1, GET_MODE (op1),
-                                         address, GET_MODE (address));
-
-  *quot_p = quotient;
-  *rem_p = remainder;
+  gcc_unreachable (); 
 }
 
 
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index ad32744..933db67 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -3808,9 +3808,13 @@ target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode
   if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing)
     return true;
 
-  /* Check if libfunc for divmod is available.  */
-  rtx libfunc = optab_libfunc (divmod_optab, mode);
-  if (libfunc != NULL_RTX)
+  /* Check if target-specific divmod libfunc is available.
+     If target overrides expand_divmod_libfunc, then it *has to*
+     set_optab_libfunc (divmod_optab, mode) to target-specific divmod
+     libfunc or NULL for unsupported modes.  */ 
+  
+  if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc
+      && optab_libfunc (divmod_optab, mode))
     {
       /* If optab_handler exists for div_optab, perhaps in a wider mode,
 	 we don't want to use the libfunc even if it exists for given mode.  */ 
@@ -3820,12 +3824,7 @@ target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode
 	if (optab_handler (div_optab, div_mode) != CODE_FOR_nothing)
 	  return false;
 
-      /* FIXME: This is a hack to workaround an issue with optab_libfunc().
-	 optab_libfunc (sdivmod_optab, DImode) returns libfunc "__divmoddi4",
-	 although __divmoddi4() does not exist in libgcc. For now, enable the
-	 transform only if libfunc is guaranteed to be __udivmoddi4.  */
-      return (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc
-	     || !strcmp (XSTR (libfunc, 0), "__udivmoddi4"));
+      return true;
     }
 
   return false;
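
The decision logic that this corrected version of approach (b) arrives at can
be modelled outside GCC as a small predicate. The sketch below is a toy model
under stated assumptions, not the patch itself: struct target_model, its
fields, default_hook_model and target_hook_model are all hypothetical
stand-ins for the optab_handler () and optab_libfunc () queries and for
targetm.expand_divmod_libfunc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*expand_hook_fn) (void);

/* Hypothetical stand-ins for default_expand_divmod_libfunc and for a
   target-specific override of the hook.  */
static void default_hook_model (void) {}
static void target_hook_model (void) {}

/* Hypothetical summary of what the real queries would report for one mode.  */
struct target_model
{
  bool has_divmod_insn;        /* divmod optab handler exists for the mode.  */
  bool has_div_insn;           /* div optab handler exists in this or a wider mode.  */
  const char *divmod_libfunc;  /* divmod libfunc name, or NULL if none.  */
  expand_hook_fn expand_divmod_libfunc;
};

/* Model of target_supports_divmod_p after the corrected approach (b).  */
static bool
model_target_supports_divmod_p (const struct target_model *t)
{
  /* A hardware divmod insn always qualifies.  */
  if (t->has_divmod_insn)
    return true;

  /* Otherwise require both an overridden hook and a libfunc for the mode,
     and back off if a hardware div insn is available.  */
  if (t->expand_divmod_libfunc != default_hook_model
      && t->divmod_libfunc != NULL)
    return !t->has_div_insn;

  return false;
}
```

In this model a target that keeps the default hook never gets the libfunc
path, even if a libfunc name is recorded for the mode, which is the behaviour
approach (b) is aiming for.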


end of thread, other threads:[~2016-08-13 11:43 UTC | newest]

Thread overview: 23+ messages
2016-05-23  8:58 RFC [1/2] divmod transform Prathamesh Kulkarni
2016-05-23 12:05 ` Richard Biener
2016-05-24 12:08   ` Prathamesh Kulkarni
2016-05-24 12:19     ` Richard Biener
2016-05-24 14:52       ` Prathamesh Kulkarni
2016-05-24 14:59         ` Richard Biener
2016-05-24 16:50           ` Prathamesh Kulkarni
2016-05-25  9:20             ` Richard Biener
2016-05-25 13:33               ` Prathamesh Kulkarni
2016-05-27 12:05                 ` Richard Biener
2016-05-27 12:41                   ` Prathamesh Kulkarni
2016-05-27 13:04                     ` Richard Biener
2016-05-30  9:56                       ` Prathamesh Kulkarni
2016-05-30 10:36                         ` Richard Biener
2016-05-31 14:20                           ` Prathamesh Kulkarni
2016-06-01 11:09                             ` Richard Biener
2016-06-03 21:10                           ` Joseph Myers
2016-06-03 23:31                           ` Jim Wilson
2016-06-08 14:23                             ` Richard Biener
2016-07-28 13:36                               ` Prathamesh Kulkarni
2016-08-09 10:54                                 ` Prathamesh Kulkarni
2016-08-13 11:26                                 ` Prathamesh Kulkarni
2016-08-13 11:43                                   ` Prathamesh Kulkarni
