Re: nonsense benchtests results

public inbox for libc-alpha@sourceware.org

* Re: nonsense benchtests results
@ 2020-07-29 17:32 Wilco Dijkstra
  2020-07-30  7:51 ` Paul Zimmermann
  0 siblings, 1 reply; 5+ messages in thread
From: Wilco Dijkstra @ 2020-07-29 17:32 UTC (permalink / raw)
  To: paul zimmermann; +Cc: 'GNU C Library'

Hi Paul,

> I'd like to measure the reciprocal-throughput and latency of sin, exp and pow
> in single, double and quadruple precision.

Ideally we need representative traces for that. You could reuse the workload
traces from the float version if you just want to do a quick test. You can find
a few more traces here:

https://github.com/ARM-software/optimized-routines/tree/master/math/test/traces

> benchtests/README documents a "## name: workload-<name>" directive for that
> (apparently, <name> is not used anywhere).

It's just a label for the json output so you can add traces from several
workloads.

> I therefore added "## name: workload-spec2017.wrf (adapted)" to sinf-inputs,
> expf-inputs, powf-inputs, and the corresponding files for binary64 and binary128.

And you added the actual traces from the float versions too?

> However, for binary64 I get nonsense results (for "sin" I also tried without
> "(adapted)"; in both cases the reciprocal-throughput and latency are much too
> large, and for "exp" the latency is smaller than the reciprocal-throughput):

If you don't use a proper trace, the latency number will be wrong. To create a
dependency between calls, the benchmark multiplies the previous result by zero
and adds it to the next input of the trace. Since 0 * Inf and 0 * NaN are both
NaN, traces can't have inputs that return NaN or infinity; otherwise all
subsequent inputs just test NaN!

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: nonsense benchtests results
  2020-07-29 17:32 nonsense benchtests results Wilco Dijkstra
@ 2020-07-30  7:51 ` Paul Zimmermann
  2020-07-30 13:29   ` Wilco Dijkstra
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Zimmermann @ 2020-07-30  7:51 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: libc-alpha

Dear Wilco,

Thank you for your answer. I understand that what I'm missing is how to create
and use another trace. Also, where is the "workload trace for the float version"
located? I could not find it. I'd like to document this in benchtests/README.

Best regards,
Paul

PS: if traces can't have inputs that return NaN or infinity, maybe the
benchtests framework should check for them?

* Re: nonsense benchtests results
  2020-07-30  7:51 ` Paul Zimmermann
@ 2020-07-30 13:29   ` Wilco Dijkstra
  2020-08-01  7:02     ` Paul Zimmermann
  0 siblings, 1 reply; 5+ messages in thread
From: Wilco Dijkstra @ 2020-07-30 13:29 UTC (permalink / raw)
  To: Paul Zimmermann; +Cc: libc-alpha

Hi Paul,

> Thank you for your answer. I understand that what I'm missing is how to create
> and use another trace. Also, where is the "workload trace for the float version"
> located? I could not find it. I'd like to document this in benchtests/README.

Running "grep workload benchtests/*-inputs" shows the traces that have been
committed. You can create new traces by running representative code with a
printf added to the math function you're interested in. You could also generate
randomized inputs in an interval. Traces don't have to be very long; the key is
to exercise a range of common inputs rather than repeating the same input a
million times.

> PS: if traces can't have inputs that return NaN or infinity, maybe the
> benchtests framework should check for them?

Well, it could be checked, but traces from real applications won't contain NaN.
Infinity might occur with some math functions, but I haven't seen it in any
traces so far.

Cheers,
Wilco

* Re: nonsense benchtests results
  2020-07-30 13:29   ` Wilco Dijkstra
@ 2020-08-01  7:02     ` Paul Zimmermann
  0 siblings, 0 replies; 5+ messages in thread
From: Paul Zimmermann @ 2020-08-01  7:02 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: libc-alpha

Thank you, Wilco. I now understand how the workload trace scheme works.
After 2.32 is out, I'll submit a patch to document it better and to add
"workload" traces to a few functions.

Paul

* nonsense benchtests results
@ 2020-07-29 15:53 Paul Zimmermann
  0 siblings, 0 replies; 5+ messages in thread
From: Paul Zimmermann @ 2020-07-29 15:53 UTC (permalink / raw)
  To: libc-alpha

Hi,

I'd like to measure the reciprocal-throughput and latency of sin, exp and pow
in single, double and quadruple precision.

benchtests/README documents a "## name: workload-<name>" directive for that
(apparently, <name> is not used anywhere).

I therefore added "## name: workload-spec2017.wrf (adapted)" to sinf-inputs,
expf-inputs, powf-inputs, and the corresponding files for binary64 and binary128.

This seems to work well for binary32. For example (this is with glibc 2.31):

  "expf": {
   "workload-spec2017.wrf (adapted)": {
    "duration": 3.32321e+09,
    "iterations": 1.14432e+08,
    "reciprocal-throughput": 14.88,
    "latency": 43.2017,
    "max-throughput": 6.72042e+07,
    "min-throughput": 2.31472e+07
   }

However, for binary64 I get nonsense results (for "sin" I also tried without
"(adapted)"; in both cases the reciprocal-throughput and latency are much too
large, and for "exp" the latency is smaller than the reciprocal-throughput):

  "sin": {
   "workload-spec2017.wrf": {
    "duration": 4.02532e+09,
    "iterations": 2.8e+07,
    "reciprocal-throughput": 126.621,
    "latency": 160.902,
    "max-throughput": 7.89757e+06,
    "min-throughput": 6.21498e+06
   },

  "exp": {
   "workload-spec2017.wrf (adapted)": {
    "duration": 3.29728e+09,
    "iterations": 1.2336e+08,
    "reciprocal-throughput": 32.4024,
    "latency": 21.0555,
    "max-throughput": 3.08619e+07,
    "min-throughput": 4.74936e+07
   },

Same issue with binary128: the latency is smaller than the reciprocal-throughput:

  "expf128": {
   "workload-spec2017.wrf (adapted)": {
    "duration": 4.07871e+09,
    "iterations": 3.084e+06,
    "reciprocal-throughput": 2471.88,
    "latency": 173.196,
    "max-throughput": 404550,
    "min-throughput": 5.77382e+06
   },

  "powf128": {
   "workload-spec2017.wrf (adapted)": {
    "duration": 3.62441e+09,
    "iterations": 2.408e+06,
    "reciprocal-throughput": 2793.63,
    "latency": 216.675,
    "max-throughput": 357957,
    "min-throughput": 4.6152e+06
   },

Can someone make it work?

Paul