Re: [PATCH 3/3][AArch64] Emit division using the Newton series

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: James Greenhalgh <james.greenhalgh@arm.com>
To: Evandro Menezes <e.menezes@samsung.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	Wilco Dijkstra	<Wilco.Dijkstra@arm.com>,
	Andrew Pinski <pinskia@gmail.com>,
	Philipp Tomsich	<philipp.tomsich@theobroma-systems.com>,
	Benedikt Huber	<benedikt.huber@theobroma-systems.com>,
	<nd@arm.com>
Subject: Re: [PATCH 3/3][AArch64] Emit division using the Newton series
Date: Tue, 31 May 2016 10:54:00 -0000	[thread overview]
Message-ID: <20160531092754.GA17601@arm.com> (raw)
In-Reply-To: <5748D0DA.4070301@samsung.com>

On Fri, May 27, 2016 at 05:57:30PM -0500, Evandro Menezes wrote:
> On 05/25/16 11:16, James Greenhalgh wrote:
> >On Wed, Apr 27, 2016 at 04:15:53PM -0500, Evandro Menezes wrote:
> >>    gcc/
> >>         * config/aarch64/aarch64-protos.h
> >>         (tune_params): Add new member "approx_div_modes".
> >>         (aarch64_emit_approx_div): Declare new function.
> >>         * config/aarch64/aarch64.c
> >>         (generic_tunings): New member "approx_div_modes".
> >>         (cortexa35_tunings): Likewise.
> >>         (cortexa53_tunings): Likewise.
> >>         (cortexa57_tunings): Likewise.
> >>         (cortexa72_tunings): Likewise.
> >>         (exynosm1_tunings): Likewise.
> >>         (thunderx_tunings): Likewise.
> >>         (xgene1_tunings): Likewise.
> >>         (aarch64_emit_approx_div): Define new function.
> >>         * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
> >>         * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
> >>         * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
> >>         * doc/invoke.texi (-mlow-precision-div): Describe new option.
> >My comments from the other two patches around using a structure to
> >group up the tuning flags and whether we really want the new option
> >apply here too.
> >
> >This code has no consumers by default and is only used for
> >-mlow-precision-div. Is this option likely to be useful to our users in
> >practice? It might all be more palatable under something like the rs6000's
> >-mrecip=opt .
> 
> I agree.  OK as a follow up?

Works for me.

> >>diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> >>index 47ccb18..7e99e16 100644
> >>--- a/gcc/config/aarch64/aarch64-simd.md
> >>+++ b/gcc/config/aarch64/aarch64-simd.md
> >>@@ -1509,7 +1509,19 @@
> >>    [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
> >>  )
> >>-(define_insn "div<mode>3"
> >>+(define_expand "div<mode>3"
> >>+ [(set (match_operand:VDQF 0 "register_operand")
> >>+       (div:VDQF (match_operand:VDQF 1 "general_operand")
> >What does this relaxation to general_operand give you?
> 
> Hold that thought...
> 
> >>+		 (match_operand:VDQF 2 "register_operand")))]
> >>+ "TARGET_SIMD"
> >>+{
> >>+  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
> >>+    DONE;
> >>+
> >>+  operands[1] = force_reg (<MODE>mode, operands[1]);
> >...other than the need to do this (sorry if I've missed something obvious).
> 
> Hold on...
> 
> >>+  if (num != CONST1_RTX (mode))
> >>+    {
> >>+      /* Calculate the approximate division.  */
> >>+      rtx xnum = force_reg (mode, num);
> >>+      emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
> >>+    }
> 
> About that relaxation, as you can see here, since the series
> approximates the reciprocal of the denominator, if the numerator is
> 1.0, a register can be spared, as the result is ready and the
> numerator is not needed.

But, in the case that the multiplication is by 1, can we not rely on the
other optimization machinery eliminating it? I mean, I see the optimization
that this enables for you, but can't you rely on future passes to do the
cleanup, and save yourself the few lines of special casing?

> +/* Emit the instruction sequence to compute the approximation for the division
> +   of NUM by DEN and return whether the sequence was emitted or not.  */

Needs a brief mention of what we use QUO for :).

> +
> +bool
> +aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
> +{
> +  machine_mode mode = GET_MODE (quo);
> +  bool use_approx_division_p = (flag_mlow_precision_div
> +			        || (aarch64_tune_params.approx_modes->division
> +				    & AARCH64_APPROX_MODE (mode)));
> +
> +  if (!flag_finite_math_only
> +      || flag_trapping_math
> +      || !flag_unsafe_math_optimizations
> +      || optimize_function_for_size_p (cfun)
> +      || !use_approx_division_p)
> +    return false;
> +
> +  /* Estimate the approximate reciprocal.  */
> +  rtx xrcp = gen_reg_rtx (mode);
> +  emit_insn ((*get_recpe_type (mode)) (xrcp, den));
> +
> +  /* Iterate over the series twice for SF and thrice for DF.  */
> +  int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
> +
> +  /* Optionally iterate over the series once less for faster performance,
> +     while sacrificing the accuracy.  */
> +  if (flag_mlow_precision_div)
> +    iterations--;
> +
> +  /* Iterate over the series to calculate the approximate reciprocal.  */
> +  rtx xtmp = gen_reg_rtx (mode);
> +  while (iterations--)
> +    {
> +      emit_insn ((*get_recps_type (mode)) (xtmp, xrcp, den));
> +
> +      if (iterations > 0)
> +	emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
> +    }
> +
> +  if (num != CONST1_RTX (mode))
> +    {
> +      /* As the approximate reciprocal of the denominator is already calculated,
> +         only calculate the approximate division when the numerator is not 1.0.  */

Long lines.

> +      rtx xnum = force_reg (mode, num);
> +      emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
> +    }
> +
> +  /* Finalize the approximation.  */
> +  emit_set_insn (quo, gen_rtx_MULT (mode, xrcp, xtmp));
> +  return true;
> +}
> +
>  /* Return the number of instructions that can be issued per cycle.  */
>  static int
>  aarch64_sched_issue_rate (void)

Thanks,
James

next prev parent reply	other threads:[~2016-05-31  9:28 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-27 21:16 Evandro Menezes
2016-05-25 18:57 ` James Greenhalgh
2016-05-28 13:53   ` Evandro Menezes
2016-05-31 10:54     ` James Greenhalgh [this message]
2016-05-31 21:31       ` Evandro Menezes
2016-06-03 21:50         ` Evandro Menezes
2016-06-13 10:15           ` James Greenhalgh
2016-06-13 19:06             ` Evandro Menezes
2016-06-14  8:29               ` Christophe Lyon
2016-06-14 15:59                 ` Evandro Menezes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160531092754.GA17601@arm.com \
    --to=james.greenhalgh@arm.com \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=benedikt.huber@theobroma-systems.com \
    --cc=e.menezes@samsung.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=nd@arm.com \
    --cc=philipp.tomsich@theobroma-systems.com \
    --cc=pinskia@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).