From: "Paul A. Clarke" <pc@us.ibm.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: libc-alpha@sourceware.org,
Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Subject: Re: [PATCH v4 06/12] math: Remove powerpc e_hypot
Date: Mon, 6 Dec 2021 15:29:32 -0600
Message-ID: <20211206212932.GB48332@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com>
In-Reply-To: <9ab1d04d-e6f7-0ca6-0541-374a8a55ab09@linaro.org>
On Mon, Dec 06, 2021 at 02:12:27PM -0300, Adhemerval Zanella wrote:
> On 02/12/2021 21:00, Adhemerval Zanella wrote:
> > The generic implementation shows only slightly worse performance:
> >
> > POWER9     reciprocal-throughput   latency
> > master     13.4024                 14.0967
> > new hypot  14.8479                 15.8061
> >
> > POWER8     reciprocal-throughput   latency
> > master     15.5767                 16.8885
> > new hypot  16.5371                 18.4057
> >
> > One way to improve this might be to make gcc generate xsmaxdp/xsmindp
> > for fmax/fmin (it only does so with -ffast-math; clang does so with
> > default options).
> >
> > Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
> > (power9).
>
> Hi Tulio/Paul,
>
> This is the only missing patch in this set, and I would like to check with
> you, powerpc maintainers, whether it would be ok to push it. The resulting
> performance, including the latest patch that removes the wrappers,
> is slightly better:
>
> POWER9     reciprocal-throughput   latency
> master     13.4024                 14.0967
> new hypot  11.9206                 13.9871
>
> POWER8     reciprocal-throughput   latency
> master     15.5767                 16.8885
> new hypot  15.3541                 18.0856
For Power10 / master:

"hypot": {
  "workload-random": {
    "duration": 5.28242e+08,
    "iterations": 4.8e+07,
    "reciprocal-throughput": 8.28478,
    "latency": 13.7253,
    "max-throughput": 1.20703e+08,
    "min-throughput": 7.2858e+07
  }
}

For Power10 / new hypot:

"hypot": {
  "workload-random": {
    "duration": 5.30731e+08,
    "iterations": 5.2e+07,
    "reciprocal-throughput": 7.21945,
    "latency": 13.1933,
    "max-throughput": 1.38515e+08,
    "min-throughput": 7.57963e+07
  }
}
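For reference, the Power10 numbers above work out to roughly 13% lower
reciprocal-throughput and roughly 4% lower latency for the new hypot. A
minimal sketch of that arithmetic follows; the function names are
hypothetical and are not part of the glibc benchtests.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of NEW_VALUE against BASELINE, in percent.  Negative
   means the new value is lower, which is better for both metrics.  */
double
percent_change (double baseline, double new_value)
{
  assert (baseline != 0.0);
  return (new_value - baseline) / baseline * 100.0;
}

/* Hypothetical helper: print the comparison for the Power10 figures
   quoted in this message (master first, new hypot second).  */
void
print_power10_comparison (void)
{
  printf ("reciprocal-throughput: %+.1f%%\n",
          percent_change (8.28478, 7.21945));
  printf ("latency:               %+.1f%%\n",
          percent_change (13.7253, 13.1933));
}
```

Calling print_power10_comparison () from a main function prints both
relative changes.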
> The POWER8 latency difference seems to be due to branch misprediction
> in the max/min selection. In fact, if I use xsmaxdp/xsmindp for
> USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see much better results on POWER8:
>
> POWER8           reciprocal-throughput   latency
> xsmaxdp/xsmindp  12.8959                 16.2082
>
> POWER9 is not affected (I don't see any performance difference by
> using xsmaxdp/xsmindp).
>
> Unfortunately, gcc only emits xsmaxdp/xsmindp with -ffast-math
> for some reason; clang uses them at the default -O2 option.
I believe clang may be wrong here, in that the instructions do not
properly handle NaN for the fmin/fmax semantics without -ffast-math.
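
To make the NaN concern concrete: ISO C (Annex F) specifies that when
exactly one argument of fmax/fmin is a NaN, the other argument is
returned. The sketch below spells that rule out for fmax and contrasts
it with a plain compare-and-select. The function names are
hypothetical, signed zeros and signaling NaNs are ignored, and it does
not model what xsmaxdp/xsmindp actually do; the Power ISA is the
reference for that.

```c
#include <assert.h>
#include <math.h>

/* Plain compare-and-select.  Any comparison with a NaN is false, so
   the result depends on which operand is the NaN.  */
double
naive_max (double x, double y)
{
  return x > y ? x : y;
}

/* Sketch of the fmax rule for quiet NaNs: if exactly one argument is
   a NaN, return the other one.  */
double
fmax_quiet_nan_sketch (double x, double y)
{
  if (isnan (x))
    return y;
  if (isnan (y))
    return x;
  return x > y ? x : y;
}

/* Self-check of the behaviour described above.  */
void
check_nan_handling (void)
{
  assert (fmax_quiet_nan_sketch (NAN, 1.0) == 1.0);
  assert (fmax_quiet_nan_sketch (1.0, NAN) == 1.0);
  assert (isnan (fmax_quiet_nan_sketch (NAN, NAN)));
  /* The naive form propagates the NaN when it is the second operand.  */
  assert (isnan (naive_max (1.0, NAN)));
  assert (naive_max (NAN, 1.0) == 1.0);
}
```

Any instruction sequence substituted for fmax without -ffast-math would
need to match fmax_quiet_nan_sketch on these inputs, not naive_max.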
PC