From: James Greenhalgh <james.greenhalgh@arm.com>
To: Evandro Menezes <e.menezes@samsung.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
Andrew Pinski <pinskia@gmail.com>,
Philipp Tomsich <philipp.tomsich@theobroma-systems.com>,
Benedikt Huber <benedikt.huber@theobroma-systems.com>,
<nd@arm.com>
Subject: Re: [PATCH 3/3][AArch64] Emit division using the Newton series
Date: Tue, 31 May 2016 10:54:00 -0000 [thread overview]
Message-ID: <20160531092754.GA17601@arm.com> (raw)
In-Reply-To: <5748D0DA.4070301@samsung.com>
On Fri, May 27, 2016 at 05:57:30PM -0500, Evandro Menezes wrote:
> On 05/25/16 11:16, James Greenhalgh wrote:
> >On Wed, Apr 27, 2016 at 04:15:53PM -0500, Evandro Menezes wrote:
> >> gcc/
> >> * config/aarch64/aarch64-protos.h
> >> (tune_params): Add new member "approx_div_modes".
> >> (aarch64_emit_approx_div): Declare new function.
> >> * config/aarch64/aarch64.c
> >> (generic_tunings): New member "approx_div_modes".
> >> (cortexa35_tunings): Likewise.
> >> (cortexa53_tunings): Likewise.
> >> (cortexa57_tunings): Likewise.
> >> (cortexa72_tunings): Likewise.
> >> (exynosm1_tunings): Likewise.
> >> (thunderx_tunings): Likewise.
> >> (xgene1_tunings): Likewise.
> >> (aarch64_emit_approx_div): Define new function.
> >> * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
> >> * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
> >> * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
> >> * doc/invoke.texi (-mlow-precision-div): Describe new option.
> >My comments from the other two patches around using a structure to
> >group up the tuning flags and whether we really want the new option
> >apply here too.
> >
> >This code has no consumers by default and is only used for
> >-mlow-precision-div. Is this option likely to be useful to our users in
> >practice? It might all be more palatable under something like the rs6000's
> >-mrecip=opt .
>
> I agree. OK as a follow up?
Works for me.
> >>diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> >>index 47ccb18..7e99e16 100644
> >>--- a/gcc/config/aarch64/aarch64-simd.md
> >>+++ b/gcc/config/aarch64/aarch64-simd.md
> >>@@ -1509,7 +1509,19 @@
> >> [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
> >> )
> >>-(define_insn "div<mode>3"
> >>+(define_expand "div<mode>3"
> >>+ [(set (match_operand:VDQF 0 "register_operand")
> >>+ (div:VDQF (match_operand:VDQF 1 "general_operand")
> >What does this relaxation to general_operand give you?
>
> Hold that thought...
>
> >>+ (match_operand:VDQF 2 "register_operand")))]
> >>+ "TARGET_SIMD"
> >>+{
> >>+ if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
> >>+ DONE;
> >>+
> >>+ operands[1] = force_reg (<MODE>mode, operands[1]);
> >...other than the need to do this (sorry if I've missed something obvious).
>
> Hold on...
>
> >>+ if (num != CONST1_RTX (mode))
> >>+ {
> >>+ /* Calculate the approximate division. */
> >>+ rtx xnum = force_reg (mode, num);
> >>+ emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
> >>+ }
>
> About that relaxation, as you can see here, since the series
> approximates the reciprocal of the denominator, if the numerator is
> 1.0, a register can be spared, as the result is ready and the
> numerator is not needed.
But, in the case that the multiplication is by 1, can we not rely on the
other optimization machinery eliminating it? I mean, I see the optimization
that this enables for you, but can't you rely on future passes to do the
cleanup, and save yourself the few lines of special casing?
> +/* Emit the instruction sequence to compute the approximation for the division
> + of NUM by DEN and return whether the sequence was emitted or not. */
Needs a brief mention of what we use QUO for :).
> +
> +bool
> +aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
> +{
> + machine_mode mode = GET_MODE (quo);
> + bool use_approx_division_p = (flag_mlow_precision_div
> + || (aarch64_tune_params.approx_modes->division
> + & AARCH64_APPROX_MODE (mode)));
> +
> + if (!flag_finite_math_only
> + || flag_trapping_math
> + || !flag_unsafe_math_optimizations
> + || optimize_function_for_size_p (cfun)
> + || !use_approx_division_p)
> + return false;
> +
> + /* Estimate the approximate reciprocal. */
> + rtx xrcp = gen_reg_rtx (mode);
> + emit_insn ((*get_recpe_type (mode)) (xrcp, den));
> +
> + /* Iterate over the series twice for SF and thrice for DF. */
> + int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
> +
> + /* Optionally iterate over the series once less for faster performance,
> + while sacrificing the accuracy. */
> + if (flag_mlow_precision_div)
> + iterations--;
> +
> + /* Iterate over the series to calculate the approximate reciprocal. */
> + rtx xtmp = gen_reg_rtx (mode);
> + while (iterations--)
> + {
> + emit_insn ((*get_recps_type (mode)) (xtmp, xrcp, den));
> +
> + if (iterations > 0)
> + emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
> + }
> +
> + if (num != CONST1_RTX (mode))
> + {
> + /* As the approximate reciprocal of the denominator is already calculated,
> + only calculate the approximate division when the numerator is not 1.0. */
Long lines.
> + rtx xnum = force_reg (mode, num);
> + emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
> + }
> +
> + /* Finalize the approximation. */
> + emit_set_insn (quo, gen_rtx_MULT (mode, xrcp, xtmp));
> + return true;
> +}
> +
> /* Return the number of instructions that can be issued per cycle. */
> static int
> aarch64_sched_issue_rate (void)
Thanks,
James
next prev parent reply other threads:[~2016-05-31 9:28 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-27 21:16 Evandro Menezes
2016-05-25 18:57 ` James Greenhalgh
2016-05-28 13:53 ` Evandro Menezes
2016-05-31 10:54 ` James Greenhalgh [this message]
2016-05-31 21:31 ` Evandro Menezes
2016-06-03 21:50 ` Evandro Menezes
2016-06-13 10:15 ` James Greenhalgh
2016-06-13 19:06 ` Evandro Menezes
2016-06-14 8:29 ` Christophe Lyon
2016-06-14 15:59 ` Evandro Menezes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160531092754.GA17601@arm.com \
--to=james.greenhalgh@arm.com \
--cc=Wilco.Dijkstra@arm.com \
--cc=benedikt.huber@theobroma-systems.com \
--cc=e.menezes@samsung.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=nd@arm.com \
--cc=philipp.tomsich@theobroma-systems.com \
--cc=pinskia@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).