From: "Dr. Philipp Tomsich" <philipp.tomsich@theobroma-systems.com>
To: James Greenhalgh <james.greenhalgh@arm.com>
Cc: "Kumar, Venkataramanan" <Venkataramanan.Kumar@amd.com>,
"pinskia@gmail.com" <pinskia@gmail.com>,
Benedikt Huber <benedikt.huber@theobroma-systems.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
Ramana Radhakrishnan <ramrad01@arm.com>,
Richard Earnshaw <rearnsha@arm.com>
Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Date: Mon, 29 Jun 2015 11:56:00 -0000
Message-ID: <00DB569E-D1C5-4CC5-AA2A-7572DCFEDB11@theobroma-systems.com>
In-Reply-To: <20150629113635.GA14400@arm.com>
James,
On 29 Jun 2015, at 13:36, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> On Mon, Jun 29, 2015 at 10:18:23AM +0100, Kumar, Venkataramanan wrote:
>>
>>> -----Original Message-----
>>> From: Dr. Philipp Tomsich [mailto:philipp.tomsich@theobroma-systems.com]
>>> Sent: Monday, June 29, 2015 2:17 PM
>>> To: Kumar, Venkataramanan
>>> Cc: pinskia@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org
>>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
>>> estimation in -ffast-math
>>>
>>> Kumar,
>>>
>>> This does not come unexpected, as the initial estimation and each iteration
>>> will add an architecturally-defined number of bits of precision (ARMv8
>>> guarantees only a minimum number of bits provided per operation… the
>>> exact number is specific to each micro-arch, though).
>>> Depending on your architecture and on the required number of precise bits
>>> by any given benchmark, one may see miscompares.
>>
>> True.
>
> I would be very uncomfortable with this approach.
Same here. The default must be safe. Always.
Unlike on other architectures, choosing defaults that are “safe” is not a
problem here, as the ARMv8 ISA guarantees a minimum number of precise bits
per iteration.
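
To make the "bits per iteration" argument concrete, here is a minimal scalar
sketch in C of the Newton-Raphson refinement for 1/sqrt(d). It is not the code
from the patch: the names rsqrt_step, rsqrt_refine and rsqrt_residual are
hypothetical, and the starting estimate x0 is passed in by the caller as a
stand-in for whatever the hardware estimate instruction would return. With a
relative error e in the estimate, one step leaves an error of
-1.5*e^2 - 0.5*e^3, so the number of correct bits roughly doubles per step.

```c
/* Illustrative model only; all names here are hypothetical. */

/* One Newton-Raphson step for y = 1/sqrt(d):
   x' = x * (3 - d*x*x) / 2.  */
static double rsqrt_step(double d, double x)
{
    return x * (3.0 - d * x * x) * 0.5;
}

/* Refine the caller-supplied estimate x0 with a fixed number of steps.
   steps == 0 returns x0 unchanged.  */
static double rsqrt_refine(double d, double x0, unsigned steps)
{
    double x = x0;
    for (unsigned i = 0; i < steps; i++)
        x = rsqrt_step(d, x);
    return x;
}

/* Residual |d*x*x - 1|, which is zero for an exact 1/sqrt(d) and about
   twice the relative error of x when that error is small.  */
static double rsqrt_residual(double d, double x)
{
    double r = d * x * x - 1.0;
    return r < 0.0 ? -r : r;
}
```

Calling rsqrt_residual(2.0, rsqrt_refine(2.0, 0.7, n)) for n = 0, 1, 2, 3
shows the residual shrinking roughly quadratically with each step. This is
why the number of steps has to be derived from the guaranteed minimum
precision of the estimate and not from what one micro-arch happens to
deliver.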
> From Richard Biener's post in the thread Michael Matz linked earlier
> in the thread:
>
> It would follow existing practice of things we allow in
> -funsafe-math-optimizations. Existing practice in that we
> want to allow -ffast-math use with common benchmarks we care
> about.
>
> https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00100.html
>
> With the solution you seem to be converging on (2-steps for some
> microarchitectures, 3 for others), a binary generated for one micro-arch
> may drop below a minimum guarantee of precision when run on another. This
> seems to go against the spirit of the practice above. I would only support
> adding this optimization to -Ofast if we could keep to architectural
> guarantees of precision in the generated code (i.e. 3-steps everywhere).
>
> I don't object to adding a "-mlow-precision-recip-sqrt" style option,
> which would be off by default, would enable the 2-step mode, and would
> need to be explicitly enabled (i.e. not implied by -mcpu=foo) but I don't
> see what this buys you beyond the Gromacs boost (and even there you would
> be creating an Invalid Run as optimization flags must be applied across
> all workloads).
Any flag that reduces precision (and thus breaks IEEE floating-point semantics)
needs to be gated with an “unsafe” flag (i.e. one that is never on by default).
As a consequence, the “peak”-tuning for SPEC will turn this on… but barely
anyone else would.
> For the 3-step optimization, it is clear to me that for "generic" tuning
> we don't want this to be enabled by default; experimental results and advice
> in this thread argue against it for thunderx and cortex-a57 targets.
> However, enabling it based on the CPU tuning selected seems fine to me.
I do not agree on this one, as I would like to see the safe form (i.e. 3 and 5
iterations respectively) become the default. Most “server-type” chips
should not see a performance regression, and this sequence will be easier to
optimise for in hardware than a (potentially microcoded) sqrt instruction (and
the subsequent, dependent divide).
I have not heard anyone claim a performance regression (either on thunderx
or on cortex-a57), only reports of “no speed-up”.
So I am strongly in favor of defaulting to the ‘safe’ number of iterations, even
when compiling for a generic target.
Best,
Philipp.