[PATCH] Reduce floating-point difficulties in timevar.cc

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Reduce floating-point difficulties in timevar.cc
@ 2023-07-21 12:11 Matthew Malcomson
  2023-07-21 12:47 ` Richard Biener
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Matthew Malcomson @ 2023-07-21 12:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, rguenther

[-- Attachment #1: Type: text/plain, Size: 1515 bytes --]

On some AArch64 bootstrapped builds, we were getting a flaky test
because the floating point operations in `get_time` were being fused
with the floating point operations in `timevar_accumulate`.

This meant that the rounding behaviour of our multiplication with
`ticks_to_msec` was different when used in `timer::start` and when
performed in `timer::stop`.  These extra inaccuracies led to the
testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.

This change ensures those operations are not fused and hence stops the test
being flaky on that particular machine.  There is no expected change in the
generated code.
Bootstrap & regtest on AArch64 passes with no regressions.

gcc/ChangeLog:

	* timevar.cc (get_time): Make this noinline to avoid fusing
	behaviour and associated test flakyness.

N.b. I didn't know who to include as reviewer -- guessed Richard Biener as the
global reviewer that had the most contributions to this file and Richard
Sandiford since I've asked him for reviews a lot in the past.

###############     Attachment also inlined for ease of reply    ###############

diff --git a/gcc/timevar.cc b/gcc/timevar.cc
index d695297aae7f6b2a6de01a37fe86c2a232338df0..5ea4ec259e114f31f611e7105cd102f4c9552d18 100644
--- a/gcc/timevar.cc
+++ b/gcc/timevar.cc
@@ -212,6 +212,7 @@ timer::named_items::print (FILE *fp, const timevar_time_def *total)
    HAVE_WALL_TIME macros.  */

 static void
+__attribute__((noinline))
 get_time (struct timevar_time_def *now)
 {
   now->user = 0;

[-- Attachment #2: noinline-get-time.patch --]
[-- Type: text/plain, Size: 409 bytes --]

diff --git a/gcc/timevar.cc b/gcc/timevar.cc
index d695297aae7f6b2a6de01a37fe86c2a232338df0..5ea4ec259e114f31f611e7105cd102f4c9552d18 100644
--- a/gcc/timevar.cc
+++ b/gcc/timevar.cc
@@ -212,6 +212,7 @@ timer::named_items::print (FILE *fp, const timevar_time_def *total)
    HAVE_WALL_TIME macros.  */

 static void
+__attribute__((noinline))
 get_time (struct timevar_time_def *now)
 {
   now->user = 0;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Reduce floating-point difficulties in timevar.cc
  2023-07-21 12:11 [PATCH] Reduce floating-point difficulties in timevar.cc Matthew Malcomson
@ 2023-07-21 12:47 ` Richard Biener
  2023-07-21 13:11   ` Matthew Malcomson
  2023-08-03 13:38   ` [PATCH] mid-end: Use integral time intervals " Matthew Malcomson
  2023-07-21 12:49 ` [PATCH] Reduce floating-point difficulties " Xi Ruoyao
  2023-07-21 17:41 ` Andrew Pinski
  2 siblings, 2 replies; 18+ messages in thread
From: Richard Biener @ 2023-07-21 12:47 UTC (permalink / raw)
  To: Matthew Malcomson; +Cc: gcc-patches, richard.sandiford

On Fri, 21 Jul 2023, Matthew Malcomson wrote:

> On some AArch64 bootstrapped builds, we were getting a flaky test
> because the floating point operations in `get_time` were being fused
> with the floating point operations in `timevar_accumulate`.
> 
> This meant that the rounding behaviour of our multiplication with
> `ticks_to_msec` was different when used in `timer::start` and when
> performed in `timer::stop`.  These extra inaccuracies led to the
> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
> 
> This change ensures those operations are not fused and hence stops the test
> being flaky on that particular machine.  There is no expected change in the
> generated code.
> Bootstrap & regtest on AArch64 passes with no regressions.

I think this is undesriable.  With fused you mean we use FMA?
I think you could use -ffp-contract=off for the TU instead.

Note you can't use __attribute__((noinline)) literally since the
host compiler might not support this.

Richard.

> gcc/ChangeLog:
> 
> 	* timevar.cc (get_time): Make this noinline to avoid fusing
> 	behaviour and associated test flakyness.
> 
> 
> N.b. I didn't know who to include as reviewer -- guessed Richard Biener as the
> global reviewer that had the most contributions to this file and Richard
> Sandiford since I've asked him for reviews a lot in the past.
> 
> 
> ###############     Attachment also inlined for ease of reply    ###############
> 
> 
> diff --git a/gcc/timevar.cc b/gcc/timevar.cc
> index d695297aae7f6b2a6de01a37fe86c2a232338df0..5ea4ec259e114f31f611e7105cd102f4c9552d18 100644
> --- a/gcc/timevar.cc
> +++ b/gcc/timevar.cc
> @@ -212,6 +212,7 @@ timer::named_items::print (FILE *fp, const timevar_time_def *total)
>     HAVE_WALL_TIME macros.  */
>  
>  static void
> +__attribute__((noinline))
>  get_time (struct timevar_time_def *now)
>  {
>    now->user = 0;
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Reduce floating-point difficulties in timevar.cc
  2023-07-21 12:11 [PATCH] Reduce floating-point difficulties in timevar.cc Matthew Malcomson
  2023-07-21 12:47 ` Richard Biener
@ 2023-07-21 12:49 ` Xi Ruoyao
  2023-07-21 17:41 ` Andrew Pinski
  2 siblings, 0 replies; 18+ messages in thread
From: Xi Ruoyao @ 2023-07-21 12:49 UTC (permalink / raw)
  To: Matthew Malcomson, gcc-patches; +Cc: richard.sandiford, rguenther

On Fri, 2023-07-21 at 13:11 +0100, Matthew Malcomson via Gcc-patches
wrote:
> This change ensures those operations are not fused and hence stops the test
> being flaky on that particular machine.  There is no expected change in the
> generated code.
> Bootstrap & regtest on AArch64 passes with no regressions.
> 
> gcc/ChangeLog:
> 
> 	* timevar.cc (get_time): Make this noinline to avoid fusing
> 	behaviour and associated test flakyness.

I don't think it's correct.  It will break bootstrapping GCC from other
ISO C++11 compilers, you need to at least guard it with #ifdef __GNUC__.
And IMO it's just hiding the real problem.

We need more info of the "particular machine".  Is this a hardware bug
(i.e. the machine violates the AArch64 spec) or a GCC code generation
issue?  Or should we generally use -ffp-contract=off in BOOT_CFLAGS?

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Reduce floating-point difficulties in timevar.cc
  2023-07-21 12:47 ` Richard Biener
@ 2023-07-21 13:11   ` Matthew Malcomson
  2023-07-21 13:41     ` Richard Biener
  2023-07-21 13:45     ` Xi Ruoyao
  2023-08-03 13:38   ` [PATCH] mid-end: Use integral time intervals " Matthew Malcomson
  1 sibling, 2 replies; 18+ messages in thread
From: Matthew Malcomson @ 2023-07-21 13:11 UTC (permalink / raw)
  To: Richard Biener, Xi Ruoyao; +Cc: gcc-patches, richard.sandiford

Responding to two emails at the same time ;-)

On 7/21/23 13:47, Richard Biener wrote:
> On Fri, 21 Jul 2023, Matthew Malcomson wrote:
> 
>> On some AArch64 bootstrapped builds, we were getting a flaky test
>> because the floating point operations in `get_time` were being fused
>> with the floating point operations in `timevar_accumulate`.
>>
>> This meant that the rounding behaviour of our multiplication with
>> `ticks_to_msec` was different when used in `timer::start` and when
>> performed in `timer::stop`.  These extra inaccuracies led to the
>> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
>>
>> This change ensures those operations are not fused and hence stops the test
>> being flaky on that particular machine.  There is no expected change in the
>> generated code.
>> Bootstrap & regtest on AArch64 passes with no regressions.
> 
> I think this is undesriable.  With fused you mean we use FMA?
> I think you could use -ffp-contract=off for the TU instead.

Yeah -- we used fused multiply subtract because we combined the multiply 
in `get_time` with the subtract in `timevar_accumulate`.

> 
> Note you can't use __attribute__((noinline)) literally since the
> host compiler might not support this.
> 
> Richard.
> 

On 7/21/23 13:49, Xi Ruoyao wrote:
...
> I don't think it's correct.  It will break bootstrapping GCC from other
> ISO C++11 compilers, you need to at least guard it with #ifdef __GNUC__.
> And IMO it's just hiding the real problem.
> 
> We need more info of the "particular machine".  Is this a hardware bug
> (i.e. the machine violates the AArch64 spec) or a GCC code generation
> issue?  Or should we generally use -ffp-contract=off in BOOT_CFLAGS?
> 

My understanding is that this is not a hardware bug and that it's 
specified that rounding does not happen on the multiply "sub-part" in 
`FNMSUB`, but rounding happens on the `FMUL` that generates some input 
to it.

I was given to understand from discussions with others that this codegen 
is allowed -- though I honestly didn't confirm the line of reasoning 
through all the relevant standards.


------------------------
W.r.t. both:
Thanks for pointing out bootstrapping from other ISO C++ compilers -- 
(didn't realise that was a concern).

I can look into `-ffp-contract=off` as you both have recommended.
One question -- if we have concerns that the host compiler may not be 
able to handle `attribute((noinline))` would we also be concerned that 
this flag may not be supported?
(Or is the severity of lack of support sufficiently different in the two 
cases that this is fine -- i.e. not compile vs may trigger floating 
point rounding inaccuracies?)



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Reduce floating-point difficulties in timevar.cc
  2023-07-21 13:11   ` Matthew Malcomson
@ 2023-07-21 13:41     ` Richard Biener
  2023-07-21 13:45     ` Xi Ruoyao
  1 sibling, 0 replies; 18+ messages in thread
From: Richard Biener @ 2023-07-21 13:41 UTC (permalink / raw)
  To: Matthew Malcomson; +Cc: Xi Ruoyao, gcc-patches, richard.sandiford



> Am 21.07.2023 um 15:12 schrieb Matthew Malcomson <matthew.malcomson@arm.com>:
> 
> Responding to two emails at the same time ;-)
> 
>> On 7/21/23 13:47, Richard Biener wrote:
>>> On Fri, 21 Jul 2023, Matthew Malcomson wrote:
>>> On some AArch64 bootstrapped builds, we were getting a flaky test
>>> because the floating point operations in `get_time` were being fused
>>> with the floating point operations in `timevar_accumulate`.
>>> 
>>> This meant that the rounding behaviour of our multiplication with
>>> `ticks_to_msec` was different when used in `timer::start` and when
>>> performed in `timer::stop`.  These extra inaccuracies led to the
>>> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
>>> 
>>> This change ensures those operations are not fused and hence stops the test
>>> being flaky on that particular machine.  There is no expected change in the
>>> generated code.
>>> Bootstrap & regtest on AArch64 passes with no regressions.
>> I think this is undesriable.  With fused you mean we use FMA?
>> I think you could use -ffp-contract=off for the TU instead.
> 
> Yeah -- we used fused multiply subtract because we combined the multiply in `get_time` with the subtract in `timevar_accumulate`.
> 
>> Note you can't use __attribute__((noinline)) literally since the
>> host compiler might not support this.
>> Richard.
> 
> On 7/21/23 13:49, Xi Ruoyao wrote:
> ...
>> I don't think it's correct.  It will break bootstrapping GCC from other
>> ISO C++11 compilers, you need to at least guard it with #ifdef __GNUC__.
>> And IMO it's just hiding the real problem.
>> We need more info of the "particular machine".  Is this a hardware bug
>> (i.e. the machine violates the AArch64 spec) or a GCC code generation
>> issue?  Or should we generally use -ffp-contract=off in BOOT_CFLAGS?
> 
> My understanding is that this is not a hardware bug and that it's specified that rounding does not happen on the multiply "sub-part" in `FNMSUB`, but rounding happens on the `FMUL` that generates some input to it.
> 
> I was given to understand from discussions with others that this codegen is allowed -- though I honestly didn't confirm the line of reasoning through all the relevant standards.
> 
> 
> ------------------------
> W.r.t. both:
> Thanks for pointing out bootstrapping from other ISO C++ compilers -- (didn't realise that was a concern).
> 
> I can look into `-ffp-contract=off` as you both have recommended.
> One question -- if we have concerns that the host compiler may not be able to handle `attribute((noinline))` would we also be concerned that this flag may not be supported?
> (Or is the severity of lack of support sufficiently different in the two cases that this is fine -- i.e. not compile vs may trigger floating point rounding inaccuracies?)

I’d only use it in stage2+ flags where we know we’re dealing with GCC 

Richard 

> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Reduce floating-point difficulties in timevar.cc
  2023-07-21 13:11   ` Matthew Malcomson
  2023-07-21 13:41     ` Richard Biener
@ 2023-07-21 13:45     ` Xi Ruoyao
  2023-07-21 13:58       ` Alexander Monakov
  2023-07-21 15:24       ` Matthew Malcomson
  1 sibling, 2 replies; 18+ messages in thread
From: Xi Ruoyao @ 2023-07-21 13:45 UTC (permalink / raw)
  To: Matthew Malcomson, Richard Biener; +Cc: gcc-patches, richard.sandiford

On Fri, 2023-07-21 at 14:11 +0100, Matthew Malcomson wrote:
> My understanding is that this is not a hardware bug and that it's 
> specified that rounding does not happen on the multiply "sub-part" in 
> `FNMSUB`, but rounding happens on the `FMUL` that generates some input
> to it.

AFAIK the C standard does only say "A floating *expression* may be
contracted".  I.e:

double r = a * b + c;

may be compiled to use FMA because "a * b + c" is a floating point
expression.  But

double t = a * b;
double r = t + c;

is not, because "a * b" and "t + c" are two separate floating point
expressions.

So a contraction across two functions is not allowed.  We now have -ffp-
contract=on (https://gcc.gnu.org/r14-2023) to only allow C-standard
contractions.

Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
are building GCC 14 snapshot).  The default is "fast" (if no -std=
option is used), which allows some contractions disallowed by the
standard.

But GCC is in C++ and I'm not sure if the C++ standard has the same
definition for allowed contractions as C.

> I can look into `-ffp-contract=off` as you both have recommended.
> One question -- if we have concerns that the host compiler may not be 
> able to handle `attribute((noinline))` would we also be concerned that
> this flag may not be supported?

Only use it in BOOT_CFLAGS, i. e. 'make BOOT_CFLAGS="-O2 -g -ffp-
contract=on"' (or "off" instead of "on").  In 3-stage bootstrapping it's
only applied in stage 2 and 3, during which GCC is compiled by itself.

> (Or is the severity of lack of support sufficiently different in the two 
> cases that this is fine -- i.e. not compile vs may trigger floating 
> point rounding inaccuracies?)

It's possible that the test itself is flaky.  Can you provide some
detail about how it fails?

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Reduce floating-point difficulties in timevar.cc
  2023-07-21 13:45     ` Xi Ruoyao
@ 2023-07-21 13:58       ` Alexander Monakov
  2023-07-21 15:11         ` Xi Ruoyao
  2023-07-21 15:24       ` Matthew Malcomson
  1 sibling, 1 reply; 18+ messages in thread
From: Alexander Monakov @ 2023-07-21 13:58 UTC (permalink / raw)
  To: Xi Ruoyao
  Cc: Matthew Malcomson, Richard Biener, gcc-patches, richard.sandiford


On Fri, 21 Jul 2023, Xi Ruoyao via Gcc-patches wrote:

> Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
> are building GCC 14 snapshot).  The default is "fast" (if no -std=
> option is used), which allows some contractions disallowed by the
> standard.

Not fully, see below.

> But GCC is in C++ and I'm not sure if the C++ standard has the same
> definition for allowed contractions as C.

It doesn't, but in GCC we should aim to provide the same semantics in C++
as in C.

> > (Or is the severity of lack of support sufficiently different in the two 
> > cases that this is fine -- i.e. not compile vs may trigger floating 
> > point rounding inaccuracies?)
> 
> It's possible that the test itself is flaky.  Can you provide some
> detail about how it fails?

See also PR 99903 for an earlier known issue which appears due to x87
excess precision and so tweaking -ffp-contract wouldn't help:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903

Now that multiple platforms are hitting this, can we _please_ get rid
of the questionable attempt to compute time in a floating-point variable
and just use an uint64_t storing nanoseconds?

Alexander

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Reduce floating-point difficulties in timevar.cc
  2023-07-21 13:58       ` Alexander Monakov
@ 2023-07-21 15:11         ` Xi Ruoyao
  2023-07-21 15:56           ` Alexander Monakov
  0 siblings, 1 reply; 18+ messages in thread
From: Xi Ruoyao @ 2023-07-21 15:11 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Matthew Malcomson, Richard Biener, gcc-patches, richard.sandiford

On Fri, 2023-07-21 at 16:58 +0300, Alexander Monakov wrote:
> 
> On Fri, 21 Jul 2023, Xi Ruoyao via Gcc-patches wrote:
> 
> > Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
> > are building GCC 14 snapshot).  The default is "fast" (if no -std=
> > option is used), which allows some contractions disallowed by the
> > standard.
> 
> Not fully, see below.
> 
> > But GCC is in C++ and I'm not sure if the C++ standard has the same
> > definition for allowed contractions as C.
> 
> It doesn't, but in GCC we should aim to provide the same semantics in C++
> as in C.
> 
> > > (Or is the severity of lack of support sufficiently different in the two 
> > > cases that this is fine -- i.e. not compile vs may trigger floating 
> > > point rounding inaccuracies?)
> > 
> > It's possible that the test itself is flaky.  Can you provide some
> > detail about how it fails?
> 
> See also PR 99903 for an earlier known issue which appears due to x87
> excess precision and so tweaking -ffp-contract wouldn't help:
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903

Does it affect AArch64 too?

> Now that multiple platforms are hitting this, can we _please_ get rid
> of the questionable attempt to compute time in a floating-point variable
> and just use an uint64_t storing nanoseconds?

To me this is the correct thing to do.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Reduce floating-point difficulties in timevar.cc
  2023-07-21 13:45     ` Xi Ruoyao
  2023-07-21 13:58       ` Alexander Monakov
@ 2023-07-21 15:24       ` Matthew Malcomson
  1 sibling, 0 replies; 18+ messages in thread
From: Matthew Malcomson @ 2023-07-21 15:24 UTC (permalink / raw)
  To: Xi Ruoyao, Richard Biener; +Cc: gcc-patches, richard.sandiford

On 7/21/23 14:45, Xi Ruoyao wrote:
> On Fri, 2023-07-21 at 14:11 +0100, Matthew Malcomson wrote:
>> My understanding is that this is not a hardware bug and that it's
>> specified that rounding does not happen on the multiply "sub-part" in
>> `FNMSUB`, but rounding happens on the `FMUL` that generates some input
>> to it.
> 
> AFAIK the C standard does only say "A floating *expression* may be
> contracted".  I.e:
> 
> double r = a * b + c;
> 
> may be compiled to use FMA because "a * b + c" is a floating point
> expression.  But
> 
> double t = a * b;
> double r = t + c;
> 
> is not, because "a * b" and "t + c" are two separate floating point
> expressions.
> 
> So a contraction across two functions is not allowed.  We now have -ffp-
> contract=on (https://gcc.gnu.org/r14-2023) to only allow C-standard
> contractions.
> 
> Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
> are building GCC 14 snapshot).  The default is "fast" (if no -std=
> option is used), which allows some contractions disallowed by the
> standard.
> 
> But GCC is in C++ and I'm not sure if the C++ standard has the same
> definition for allowed contractions as C.
> 

Thanks -- I'll look into whether `-ffp-contract=on` works.

> 
> It's possible that the test itself is flaky.  Can you provide some
> detail about how it fails?
> 

Sure -- The outline is that `timer::validate_phases` sees the sum of 
sub-part timers as greater than the timer for the "overall" time 
(outside of a tolerance of 1.000001).  It then complains and hits 
`gcc_unreachable()`.

While I found it difficult to get enough information out of the test 
that is run in the testsuite, I found that if passing an invalid 
argument to `cc1plus` all sub-parts would be zero, and sometimes the 
"total" would be negative.

This was due to the `times` syscall returning the same clock tick for 
start and end of the "total" timer and the difference in rounding 
between FNMSUB and FMUL means that depending on what that clock tick is 
the "elapsed time" can end up calculated as negative.

I didn't proove it 100% but I believe the same fundamental difference 
(but opposite rounding error) could trigger the testsuite failure -- if 
the "end" of one sub-phase timer is greater than the "start" of another 
sub-phase timer then sum of parts could be greater than total.

There is a "tolerance" in this test that I considered increasing, but 
since that would not affect the "invalid arguments" thing (where the 
total is negative and hence the tolerance multiplication of 1.000001 
would have to be supplemented by a positive offset) I suggested avoiding 
the inline.

W.r.t. the x86 bug that Alexander Monakov has pointed to, it's a very 
similar thing but in this case the problem is not bit-precision of 
values after the inlining, but rather a difference between fused and not 
fused operations after the inlining.

Agreed that using integral arithmetic is the more robust solution.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Reduce floating-point difficulties in timevar.cc
  2023-07-21 15:11         ` Xi Ruoyao
@ 2023-07-21 15:56           ` Alexander Monakov
  0 siblings, 0 replies; 18+ messages in thread
From: Alexander Monakov @ 2023-07-21 15:56 UTC (permalink / raw)
  To: Xi Ruoyao
  Cc: Matthew Malcomson, Richard Biener, gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 551 bytes --]


On Fri, 21 Jul 2023, Xi Ruoyao wrote:

> > See also PR 99903 for an earlier known issue which appears due to x87
> > excess precision and so tweaking -ffp-contract wouldn't help:
> > 
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903
> 
> Does it affect AArch64 too?

Well, not literally (AArch64 doesn't have excess precision), but absence
of intermediate rounding in FMA is similar to excess precision.

I'm saying it's the same issue manifesting via different pathways on x86
and aarch64. Sorry if I misunderstood your question.

Alexander

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Reduce floating-point difficulties in timevar.cc
  2023-07-21 12:11 [PATCH] Reduce floating-point difficulties in timevar.cc Matthew Malcomson
  2023-07-21 12:47 ` Richard Biener
  2023-07-21 12:49 ` [PATCH] Reduce floating-point difficulties " Xi Ruoyao
@ 2023-07-21 17:41 ` Andrew Pinski
  2 siblings, 0 replies; 18+ messages in thread
From: Andrew Pinski @ 2023-07-21 17:41 UTC (permalink / raw)
  To: Matthew Malcomson; +Cc: gcc-patches, richard.sandiford, rguenther

On Fri, Jul 21, 2023 at 5:13 AM Matthew Malcomson via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On some AArch64 bootstrapped builds, we were getting a flaky test
> because the floating point operations in `get_time` were being fused
> with the floating point operations in `timevar_accumulate`.
>
> This meant that the rounding behaviour of our multiplication with
> `ticks_to_msec` was different when used in `timer::start` and when
> performed in `timer::stop`.  These extra inaccuracies led to the
> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
>
> This change ensures those operations are not fused and hence stops the test
> being flaky on that particular machine.  There is no expected change in the
> generated code.
> Bootstrap & regtest on AArch64 passes with no regressions.

Oh this does explain why powerpc also sees it: https://gcc.gnu.org/PR110316 .
I wonder if not adding noinline here but rather changing the code to
tolerate the fused multiple-subtract instead
which is kinda related to what I suggested in comment #1 in PR 110316.

Thanks,
Andrew

>
> gcc/ChangeLog:
>
>         * timevar.cc (get_time): Make this noinline to avoid fusing
>         behaviour and associated test flakyness.
>
>
> N.b. I didn't know who to include as reviewer -- guessed Richard Biener as the
> global reviewer that had the most contributions to this file and Richard
> Sandiford since I've asked him for reviews a lot in the past.
>
>
> ###############     Attachment also inlined for ease of reply    ###############
>
>
> diff --git a/gcc/timevar.cc b/gcc/timevar.cc
> index d695297aae7f6b2a6de01a37fe86c2a232338df0..5ea4ec259e114f31f611e7105cd102f4c9552d18 100644
> --- a/gcc/timevar.cc
> +++ b/gcc/timevar.cc
> @@ -212,6 +212,7 @@ timer::named_items::print (FILE *fp, const timevar_time_def *total)
>     HAVE_WALL_TIME macros.  */
>
>  static void
> +__attribute__((noinline))
>  get_time (struct timevar_time_def *now)
>  {
>    now->user = 0;
>
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] mid-end: Use integral time intervals in timevar.cc
  2023-07-21 12:47 ` Richard Biener
  2023-07-21 13:11   ` Matthew Malcomson
@ 2023-08-03 13:38   ` Matthew Malcomson
  2023-08-03 14:09     ` David Malcolm
  2023-08-04  6:44     ` Richard Biener
  1 sibling, 2 replies; 18+ messages in thread
From: Matthew Malcomson @ 2023-08-03 13:38 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, richard.sandiford, Xi Ruoyao, Andrew Pinski

[-- Attachment #1: Type: text/plain, Size: 13589 bytes --]

> 
> I think this is undesriable.  With fused you mean we use FMA?
> I think you could use -ffp-contract=off for the TU instead.
> 
> Note you can't use __attribute__((noinline)) literally since the
> host compiler might not support this.
> 
> Richard.
> 


Trying to make the timevar store integral time intervals.
Hope this is acceptable -- I had originally planned to use
`-ffp-contract` as agreed until I saw the email mentioning the old x86
bug in the same area which was not to do with floating point contraction
of operations (PR 99903) and figured it would be better to try and solve
both at the same time while making things in general a bit more robust.
________________________________________________________________________


On some AArch64 bootstrapped builds, we were getting a flaky test
because the floating point operations in `get_time` were being fused
with the floating point operations in `timevar_accumulate`.

This meant that the rounding behaviour of our multiplication with
`ticks_to_msec` was different when used in `timer::start` and when
performed in `timer::stop`.  These extra inaccuracies led to the
testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.

------------------------------
Avoiding the inlining which was agreed to be undesirable.  Three
alternative approaches:
1) Use `-ffp-contract=on` to avoid this particular optimisation.
2) Adjusting the code so that the "tolerance" is always of the order of
   a "tick".
3) Recording times and elapsed differences in integral values.
   - Could be in terms of a standard measurement (e.g. nanoseconds or
     microseconds).
   - Could be in terms of whatever integral value ("ticks" /
     seconds&microseconds / "clock ticks") is returned from the syscall
     chosen at configure time.

While `-ffp-contract=on` removes the problem that I bumped into, there
has been a similar bug on x86 that was to do with a different floating
point problem that also happens after `get_time` and
`timevar_accumulate` both being inlined into the same function.  Hence
it seems worth choosing a different approach.

Of the two other solutions, recording measurements in integral values
seems the most robust against slightly "off" measurements being
presented to the user -- even though it could avoid the ICE that creates
a flaky test.

I considered storing time in whatever units our syscall returns and
normalising them at the time we print out rather than normalising them
to nanoseconds at the point we record our "current time".  The logic
being that normalisation could have some rounding affect (e.g. if
TICKS_PER_SECOND is 3) that would be taken into account in calculations.

I decided against it in order to give the values recorded in
`timevar_time_def` some interpretive value so it's easier to read the
code.  Compared to the small rounding that would represent a tiny amount
of time and AIUI can not trigger the same kind of ICE's as we are
attempting to fix, said interpretive value seems more valuable.

Recording time in microseconds seemed reasonable since all obvious
values for ticks and `getrusage` are at microsecond granularity or less
precise.  That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are both
variables given to use by the host system I was not sure of that enough
to make this decision.

------------------------------
timer::all_zero is ignoring rows which are inconsequential to the user
and would be printed out as all zeros.  Since upon printing rows we
convert to the same double value and print out the same precision as
before, we return true/false based on the same amount of time as before.

timer::print_row casts to a floating point measurement in units of
seconds as was printed out before.

timer::validate_phases -- I'm printing out nanoseconds here rather than
floating point seconds since this is an error message for when things
have "gone wrong" printing out the actual nanoseconds that have been
recorded seems like the best approach.
N.b. since we now print out nanoseconds instead of floating point value
the padding requirements are different.  Originally we were padding to
24 characters and printing 18 decimal places.  This looked odd with the
now visually smaller values getting printed.  I judged 13 characters
(corresponding to 2 hours) to be a reasonable point at which our
alignment could start to degrade and this provides a more compact output
for the majority of cases (checked by triggering the error case via
GDB).

------------------------------
N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC".  I believe this
would fit in an integer on all hosts that GCC supports, but am not
certain there are not strange integer sizes we support hence am pointing
it out for special attention during review.

------------------------------
No expected change in generated code.
Bootstrapped and regtested on AArch64 with no regressions.
Manually checked that flaky test is no longer flaky on the machine it
was seen before.

gcc/ChangeLog:

	PR middle-end/110316
	PR middle-end/9903
	* timevar.cc (NANOSEC_PER_SEC, TICKS_TO_NANOSEC,
	CLOCKS_TO_NANOSEC, nanosec_to_floating_sec, percent_of): New.
	(TICKS_TO_MSEC, CLOCKS_TO_MSEC): Remove these macros.
	(timer::validate_phases): Use integral arithmetic to check
	validity.
	(timer::print_row, timer::print): Convert from integral
	nanoseconds to floating	point seconds before printing.
	(timer::all_zero): Change limit to nanosec count instead of
	fractional count of seconds.
	* timevar.h (struct timevar_time_def): Update all measurements
	to use uint64_t nanoseconds rather than seconds stored in a
	double.



###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/timevar.h b/gcc/timevar.h
index ad46573160957c6461190204c5181ec9e904e204..882954a527e11a90873f25a45e9073149709dd05 100644
--- a/gcc/timevar.h
+++ b/gcc/timevar.h
@@ -44,7 +44,7 @@
 */
 
 /* This structure stores the various varieties of time that can be
-   measured.  Times are stored in seconds.  The time may be an
+   measured.  Times are stored in nanoseconds.  The time may be an
    absolute time or a time difference; in the former case, the time
    base is undefined, except that the difference between two times
    produces a valid time difference.  */
@@ -52,14 +52,13 @@
 struct timevar_time_def
 {
   /* User time in this process.  */
-  double user;
+  uint64_t user;
 
-  /* System time (if applicable for this host platform) in this
-     process.  */
-  double sys;
+  /* System time (if applicable for this host platform) in this process.  */
+  uint64_t sys;
 
   /* Wall clock time.  */
-  double wall;
+  uint64_t wall;
 
   /* Garbage collector memory.  */
   size_t ggc_mem;
diff --git a/gcc/timevar.cc b/gcc/timevar.cc
index d695297aae7f6b2a6de01a37fe86c2a232338df0..c6094b1c336c30e5a404ff8fd5c79795edbd59f7 100644
--- a/gcc/timevar.cc
+++ b/gcc/timevar.cc
@@ -42,7 +42,7 @@ struct tms
 # define RUSAGE_SELF 0
 #endif
 
-/* Calculation of scale factor to convert ticks to microseconds.
+/* Calculation of scale factor to convert ticks to seconds.
    We mustn't use CLOCKS_PER_SEC except with clock().  */
 #if HAVE_SYSCONF && defined _SC_CLK_TCK
 # define TICKS_PER_SECOND sysconf (_SC_CLK_TCK) /* POSIX 1003.1-1996 */
@@ -91,14 +91,15 @@ struct tms
    the underlying constants, and that can be very slow, so we have to
    precompute them.  Whose wonderful idea was it to make all those
    _constants_ variable at run time, anyway?  */
+#define NANOSEC_PER_SEC 1000000000
 #ifdef USE_TIMES
-static double ticks_to_msec;
-#define TICKS_TO_MSEC (1 / (double)TICKS_PER_SECOND)
+static uint64_t ticks_to_nanosec;
+#define TICKS_TO_NANOSEC (NANOSEC_PER_SEC / TICKS_PER_SECOND)
 #endif
 
 #ifdef USE_CLOCK
-static double clocks_to_msec;
-#define CLOCKS_TO_MSEC (1 / (double)CLOCKS_PER_SEC)
+static uint64_t clocks_to_nanosec;
+#define CLOCKS_TO_NANOSEC (NANOSEC_PER_SEC / CLOCKS_PER_SEC)
 #endif
 
 /* Non-NULL if timevars should be used.  In GCC, this happens with
@@ -222,18 +223,20 @@ get_time (struct timevar_time_def *now)
   {
 #ifdef USE_TIMES
     struct tms tms;
-    now->wall = times (&tms)  * ticks_to_msec;
-    now->user = tms.tms_utime * ticks_to_msec;
-    now->sys  = tms.tms_stime * ticks_to_msec;
+    now->wall = times (&tms)  * ticks_to_nanosec;
+    now->user = tms.tms_utime * ticks_to_nanosec;
+    now->sys  = tms.tms_stime * ticks_to_nanosec;
 #endif
 #ifdef USE_GETRUSAGE
     struct rusage rusage;
     getrusage (RUSAGE_SELF, &rusage);
-    now->user = rusage.ru_utime.tv_sec + rusage.ru_utime.tv_usec * 1e-6;
-    now->sys  = rusage.ru_stime.tv_sec + rusage.ru_stime.tv_usec * 1e-6;
+    now->user = rusage.ru_utime.tv_sec * NANOSEC_PER_SEC
+		+ rusage.ru_utime.tv_usec * 1000;
+    now->sys  = rusage.ru_stime.tv_sec * NANOSEC_PER_SEC
+		+ rusage.ru_stime.tv_usec * 1000;
 #endif
 #ifdef USE_CLOCK
-    now->user = clock () * clocks_to_msec;
+    now->user = clock () * clocks_to_nanosec;
 #endif
   }
 }
@@ -271,10 +274,10 @@ timer::timer () :
   /* Initialize configuration-specific state.
      Ideally this would be one-time initialization.  */
 #ifdef USE_TIMES
-  ticks_to_msec = TICKS_TO_MSEC;
+  ticks_to_nanosec = TICKS_TO_NANOSEC;
 #endif
 #ifdef USE_CLOCK
-  clocks_to_msec = CLOCKS_TO_MSEC;
+  clocks_to_nanosec = CLOCKS_TO_NANOSEC;
 #endif
 }
 
@@ -583,12 +586,11 @@ timer::validate_phases (FILE *fp) const
 {
   unsigned int /* timevar_id_t */ id;
   const timevar_time_def *total = &m_timevars[TV_TOTAL].elapsed;
-  double phase_user = 0.0;
-  double phase_sys = 0.0;
-  double phase_wall = 0.0;
+  uint64_t phase_user = 0;
+  uint64_t phase_sys = 0;
+  uint64_t phase_wall = 0;
   size_t phase_ggc_mem = 0;
   static char phase_prefix[] = "phase ";
-  const double tolerance = 1.000001;  /* One part in a million.  */
 
   for (id = 0; id < (unsigned int) TIMEVAR_LAST; ++id)
     {
@@ -607,26 +609,32 @@ timer::validate_phases (FILE *fp) const
 	}
     }
 
-  if (phase_user > total->user * tolerance
-      || phase_sys > total->sys * tolerance
-      || phase_wall > total->wall * tolerance
-      || phase_ggc_mem > total->ggc_mem * tolerance)
+  if (phase_user > total->user
+      || phase_sys > total->sys
+      || phase_wall > total->wall
+      || phase_ggc_mem > total->ggc_mem)
     {
 
       fprintf (fp, "Timing error: total of phase timers exceeds total time.\n");
       if (phase_user > total->user)
-	fprintf (fp, "user    %24.18e > %24.18e\n", phase_user, total->user);
+	fprintf (fp, "user    %13" PRIu64 " > %13" PRIu64 "\n",
+		 phase_user, total->user);
       if (phase_sys > total->sys)
-	fprintf (fp, "sys     %24.18e > %24.18e\n", phase_sys, total->sys);
+	fprintf (fp, "sys     %13" PRIu64 " > %13" PRIu64 "\n",
+		 phase_sys, total->sys);
       if (phase_wall > total->wall)
-	fprintf (fp, "wall    %24.18e > %24.18e\n", phase_wall, total->wall);
+	fprintf (fp, "wall    %13" PRIu64 " > %13" PRIu64 "\n",
+		 phase_wall, total->wall);
       if (phase_ggc_mem > total->ggc_mem)
-	fprintf (fp, "ggc_mem %24lu > %24lu\n", (unsigned long)phase_ggc_mem,
+	fprintf (fp, "ggc_mem %13lu > %13lu\n", (unsigned long)phase_ggc_mem,
 		 (unsigned long)total->ggc_mem);
       gcc_unreachable ();
     }
 }
 
+#define nanosec_to_floating_sec(NANO) ((double)(NANO) * 1e-9)
+#define percent_of(TOTAL, SUBTOTAL) \
+  ((TOTAL) == 0 ? 0 : ((double)SUBTOTAL / TOTAL) * 100)
 /* Helper function for timer::print.  */
 
 void
@@ -640,22 +648,22 @@ timer::print_row (FILE *fp,
 #ifdef HAVE_USER_TIME
   /* Print user-mode time for this process.  */
   fprintf (fp, "%7.2f (%3.0f%%)",
-	   elapsed.user,
-	   (total->user == 0 ? 0 : elapsed.user / total->user) * 100);
+	   nanosec_to_floating_sec (elapsed.user),
+	   percent_of (total->user, elapsed.user));
 #endif /* HAVE_USER_TIME */
 
 #ifdef HAVE_SYS_TIME
   /* Print system-mode time for this process.  */
   fprintf (fp, "%7.2f (%3.0f%%)",
-	   elapsed.sys,
-	   (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100);
+	   nanosec_to_floating_sec (elapsed.sys),
+	   percent_of (total->sys, elapsed.sys));
 #endif /* HAVE_SYS_TIME */
 
 #ifdef HAVE_WALL_TIME
   /* Print wall clock time elapsed.  */
   fprintf (fp, "%7.2f (%3.0f%%)",
-	   elapsed.wall,
-	   (total->wall == 0 ? 0 : elapsed.wall / total->wall) * 100);
+	   nanosec_to_floating_sec (elapsed.wall),
+	   percent_of (total->wall, elapsed.wall));
 #endif /* HAVE_WALL_TIME */
 
   /* Print the amount of ggc memory allocated.  */
@@ -673,7 +681,8 @@ timer::print_row (FILE *fp,
 bool
 timer::all_zero (const timevar_time_def &elapsed)
 {
-  const double tiny = 5e-3;
+  /* 5000000 nanosec == 5e-3 seconds.  */
+  uint64_t tiny = 5000000;
   return (elapsed.user < tiny
 	  && elapsed.sys < tiny
 	  && elapsed.wall < tiny
@@ -766,13 +775,13 @@ timer::print (FILE *fp)
   /* Print total time.  */
   fprintf (fp, " %-35s:", "TOTAL");
 #ifdef HAVE_USER_TIME
-  fprintf (fp, "%7.2f      ", total->user);
+  fprintf (fp, "%7.2f      ", nanosec_to_floating_sec (total->user));
 #endif
 #ifdef HAVE_SYS_TIME
-  fprintf (fp, "%8.2f      ", total->sys);
+  fprintf (fp, "%8.2f      ", nanosec_to_floating_sec (total->sys));
 #endif
 #ifdef HAVE_WALL_TIME
-  fprintf (fp, "%8.2f      ", total->wall);
+  fprintf (fp, "%8.2f      ", nanosec_to_floating_sec (total->wall));
 #endif
   fprintf (fp, PRsa (7) "\n", SIZE_AMOUNT (total->ggc_mem));
 
@@ -790,6 +799,8 @@ timer::print (FILE *fp)
 
   validate_phases (fp);
 }
+#undef nanosec_to_floating_sec
+#undef percent_of
 
 /* Get the name of the topmost item.  For use by jit for validating
    inputs to gcc_jit_timer_pop.  */




[-- Attachment #2: timevar-integral-time-intervals.patch --]
[-- Type: text/plain, Size: 7892 bytes --]

diff --git a/gcc/timevar.h b/gcc/timevar.h
index ad46573160957c6461190204c5181ec9e904e204..882954a527e11a90873f25a45e9073149709dd05 100644
--- a/gcc/timevar.h
+++ b/gcc/timevar.h
@@ -44,7 +44,7 @@
 */
 
 /* This structure stores the various varieties of time that can be
-   measured.  Times are stored in seconds.  The time may be an
+   measured.  Times are stored in nanoseconds.  The time may be an
    absolute time or a time difference; in the former case, the time
    base is undefined, except that the difference between two times
    produces a valid time difference.  */
@@ -52,14 +52,13 @@
 struct timevar_time_def
 {
   /* User time in this process.  */
-  double user;
+  uint64_t user;
 
-  /* System time (if applicable for this host platform) in this
-     process.  */
-  double sys;
+  /* System time (if applicable for this host platform) in this process.  */
+  uint64_t sys;
 
   /* Wall clock time.  */
-  double wall;
+  uint64_t wall;
 
   /* Garbage collector memory.  */
   size_t ggc_mem;
diff --git a/gcc/timevar.cc b/gcc/timevar.cc
index d695297aae7f6b2a6de01a37fe86c2a232338df0..c6094b1c336c30e5a404ff8fd5c79795edbd59f7 100644
--- a/gcc/timevar.cc
+++ b/gcc/timevar.cc
@@ -42,7 +42,7 @@ struct tms
 # define RUSAGE_SELF 0
 #endif
 
-/* Calculation of scale factor to convert ticks to microseconds.
+/* Calculation of scale factor to convert ticks to seconds.
    We mustn't use CLOCKS_PER_SEC except with clock().  */
 #if HAVE_SYSCONF && defined _SC_CLK_TCK
 # define TICKS_PER_SECOND sysconf (_SC_CLK_TCK) /* POSIX 1003.1-1996 */
@@ -91,14 +91,15 @@ struct tms
    the underlying constants, and that can be very slow, so we have to
    precompute them.  Whose wonderful idea was it to make all those
    _constants_ variable at run time, anyway?  */
+#define NANOSEC_PER_SEC 1000000000
 #ifdef USE_TIMES
-static double ticks_to_msec;
-#define TICKS_TO_MSEC (1 / (double)TICKS_PER_SECOND)
+static uint64_t ticks_to_nanosec;
+#define TICKS_TO_NANOSEC (NANOSEC_PER_SEC / TICKS_PER_SECOND)
 #endif
 
 #ifdef USE_CLOCK
-static double clocks_to_msec;
-#define CLOCKS_TO_MSEC (1 / (double)CLOCKS_PER_SEC)
+static uint64_t clocks_to_nanosec;
+#define CLOCKS_TO_NANOSEC (NANOSEC_PER_SEC / CLOCKS_PER_SEC)
 #endif
 
 /* Non-NULL if timevars should be used.  In GCC, this happens with
@@ -222,18 +223,20 @@ get_time (struct timevar_time_def *now)
   {
 #ifdef USE_TIMES
     struct tms tms;
-    now->wall = times (&tms)  * ticks_to_msec;
-    now->user = tms.tms_utime * ticks_to_msec;
-    now->sys  = tms.tms_stime * ticks_to_msec;
+    now->wall = times (&tms)  * ticks_to_nanosec;
+    now->user = tms.tms_utime * ticks_to_nanosec;
+    now->sys  = tms.tms_stime * ticks_to_nanosec;
 #endif
 #ifdef USE_GETRUSAGE
     struct rusage rusage;
     getrusage (RUSAGE_SELF, &rusage);
-    now->user = rusage.ru_utime.tv_sec + rusage.ru_utime.tv_usec * 1e-6;
-    now->sys  = rusage.ru_stime.tv_sec + rusage.ru_stime.tv_usec * 1e-6;
+    now->user = rusage.ru_utime.tv_sec * NANOSEC_PER_SEC
+		+ rusage.ru_utime.tv_usec * 1000;
+    now->sys  = rusage.ru_stime.tv_sec * NANOSEC_PER_SEC
+		+ rusage.ru_stime.tv_usec * 1000;
 #endif
 #ifdef USE_CLOCK
-    now->user = clock () * clocks_to_msec;
+    now->user = clock () * clocks_to_nanosec;
 #endif
   }
 }
@@ -271,10 +274,10 @@ timer::timer () :
   /* Initialize configuration-specific state.
      Ideally this would be one-time initialization.  */
 #ifdef USE_TIMES
-  ticks_to_msec = TICKS_TO_MSEC;
+  ticks_to_nanosec = TICKS_TO_NANOSEC;
 #endif
 #ifdef USE_CLOCK
-  clocks_to_msec = CLOCKS_TO_MSEC;
+  clocks_to_nanosec = CLOCKS_TO_NANOSEC;
 #endif
 }
 
@@ -583,12 +586,11 @@ timer::validate_phases (FILE *fp) const
 {
   unsigned int /* timevar_id_t */ id;
   const timevar_time_def *total = &m_timevars[TV_TOTAL].elapsed;
-  double phase_user = 0.0;
-  double phase_sys = 0.0;
-  double phase_wall = 0.0;
+  uint64_t phase_user = 0;
+  uint64_t phase_sys = 0;
+  uint64_t phase_wall = 0;
   size_t phase_ggc_mem = 0;
   static char phase_prefix[] = "phase ";
-  const double tolerance = 1.000001;  /* One part in a million.  */
 
   for (id = 0; id < (unsigned int) TIMEVAR_LAST; ++id)
     {
@@ -607,26 +609,32 @@ timer::validate_phases (FILE *fp) const
 	}
     }
 
-  if (phase_user > total->user * tolerance
-      || phase_sys > total->sys * tolerance
-      || phase_wall > total->wall * tolerance
-      || phase_ggc_mem > total->ggc_mem * tolerance)
+  if (phase_user > total->user
+      || phase_sys > total->sys
+      || phase_wall > total->wall
+      || phase_ggc_mem > total->ggc_mem)
     {
 
       fprintf (fp, "Timing error: total of phase timers exceeds total time.\n");
       if (phase_user > total->user)
-	fprintf (fp, "user    %24.18e > %24.18e\n", phase_user, total->user);
+	fprintf (fp, "user    %13" PRIu64 " > %13" PRIu64 "\n",
+		 phase_user, total->user);
       if (phase_sys > total->sys)
-	fprintf (fp, "sys     %24.18e > %24.18e\n", phase_sys, total->sys);
+	fprintf (fp, "sys     %13" PRIu64 " > %13" PRIu64 "\n",
+		 phase_sys, total->sys);
       if (phase_wall > total->wall)
-	fprintf (fp, "wall    %24.18e > %24.18e\n", phase_wall, total->wall);
+	fprintf (fp, "wall    %13" PRIu64 " > %13" PRIu64 "\n",
+		 phase_wall, total->wall);
       if (phase_ggc_mem > total->ggc_mem)
-	fprintf (fp, "ggc_mem %24lu > %24lu\n", (unsigned long)phase_ggc_mem,
+	fprintf (fp, "ggc_mem %13lu > %13lu\n", (unsigned long)phase_ggc_mem,
 		 (unsigned long)total->ggc_mem);
       gcc_unreachable ();
     }
 }
 
+#define nanosec_to_floating_sec(NANO) ((double)(NANO) * 1e-9)
+#define percent_of(TOTAL, SUBTOTAL) \
+  ((TOTAL) == 0 ? 0 : ((double)SUBTOTAL / TOTAL) * 100)
 /* Helper function for timer::print.  */
 
 void
@@ -640,22 +648,22 @@ timer::print_row (FILE *fp,
 #ifdef HAVE_USER_TIME
   /* Print user-mode time for this process.  */
   fprintf (fp, "%7.2f (%3.0f%%)",
-	   elapsed.user,
-	   (total->user == 0 ? 0 : elapsed.user / total->user) * 100);
+	   nanosec_to_floating_sec (elapsed.user),
+	   percent_of (total->user, elapsed.user));
 #endif /* HAVE_USER_TIME */
 
 #ifdef HAVE_SYS_TIME
   /* Print system-mode time for this process.  */
   fprintf (fp, "%7.2f (%3.0f%%)",
-	   elapsed.sys,
-	   (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100);
+	   nanosec_to_floating_sec (elapsed.sys),
+	   percent_of (total->sys, elapsed.sys));
 #endif /* HAVE_SYS_TIME */
 
 #ifdef HAVE_WALL_TIME
   /* Print wall clock time elapsed.  */
   fprintf (fp, "%7.2f (%3.0f%%)",
-	   elapsed.wall,
-	   (total->wall == 0 ? 0 : elapsed.wall / total->wall) * 100);
+	   nanosec_to_floating_sec (elapsed.wall),
+	   percent_of (total->wall, elapsed.wall));
 #endif /* HAVE_WALL_TIME */
 
   /* Print the amount of ggc memory allocated.  */
@@ -673,7 +681,8 @@ timer::print_row (FILE *fp,
 bool
 timer::all_zero (const timevar_time_def &elapsed)
 {
-  const double tiny = 5e-3;
+  /* 5000000 nanosec == 5e-3 seconds.  */
+  uint64_t tiny = 5000000;
   return (elapsed.user < tiny
 	  && elapsed.sys < tiny
 	  && elapsed.wall < tiny
@@ -766,13 +775,13 @@ timer::print (FILE *fp)
   /* Print total time.  */
   fprintf (fp, " %-35s:", "TOTAL");
 #ifdef HAVE_USER_TIME
-  fprintf (fp, "%7.2f      ", total->user);
+  fprintf (fp, "%7.2f      ", nanosec_to_floating_sec (total->user));
 #endif
 #ifdef HAVE_SYS_TIME
-  fprintf (fp, "%8.2f      ", total->sys);
+  fprintf (fp, "%8.2f      ", nanosec_to_floating_sec (total->sys));
 #endif
 #ifdef HAVE_WALL_TIME
-  fprintf (fp, "%8.2f      ", total->wall);
+  fprintf (fp, "%8.2f      ", nanosec_to_floating_sec (total->wall));
 #endif
   fprintf (fp, PRsa (7) "\n", SIZE_AMOUNT (total->ggc_mem));
 
@@ -790,6 +799,8 @@ timer::print (FILE *fp)
 
   validate_phases (fp);
 }
+#undef nanosec_to_floating_sec
+#undef percent_of
 
 /* Get the name of the topmost item.  For use by jit for validating
    inputs to gcc_jit_timer_pop.  */




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mid-end: Use integral time intervals in timevar.cc
  2023-08-03 13:38   ` [PATCH] mid-end: Use integral time intervals " Matthew Malcomson
@ 2023-08-03 14:09     ` David Malcolm
  2023-08-03 14:54       ` Matthew Malcomson
  2023-08-04  6:44     ` Richard Biener
  1 sibling, 1 reply; 18+ messages in thread
From: David Malcolm @ 2023-08-03 14:09 UTC (permalink / raw)
  To: Matthew Malcomson, Richard Biener
  Cc: gcc-patches, richard.sandiford, Xi Ruoyao, Andrew Pinski

On Thu, 2023-08-03 at 14:38 +0100, Matthew Malcomson via Gcc-patches
wrote:
> > 
> > I think this is undesriable.  With fused you mean we use FMA?
> > I think you could use -ffp-contract=off for the TU instead.
> > 
> > Note you can't use __attribute__((noinline)) literally since the
> > host compiler might not support this.
> > 
> > Richard.
> > 
> 
> 
> Trying to make the timevar store integral time intervals.
> Hope this is acceptable -- I had originally planned to use
> `-ffp-contract` as agreed until I saw the email mentioning the old
> x86
> bug in the same area which was not to do with floating point
> contraction
> of operations (PR 99903) and figured it would be better to try and
> solve
> both at the same time while making things in general a bit more
> robust.
> _____________________________________________________________________
> ___
> 
> 
> On some AArch64 bootstrapped builds, we were getting a flaky test
> because the floating point operations in `get_time` were being fused
> with the floating point operations in `timevar_accumulate`.
> 
> This meant that the rounding behaviour of our multiplication with
> `ticks_to_msec` was different when used in `timer::start` and when
> performed in `timer::stop`.  These extra inaccuracies led to the
> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
> 
> ------------------------------
> Avoiding the inlining which was agreed to be undesirable.  Three
> alternative approaches:
> 1) Use `-ffp-contract=on` to avoid this particular optimisation.
> 2) Adjusting the code so that the "tolerance" is always of the order
> of
>    a "tick".
> 3) Recording times and elapsed differences in integral values.
>    - Could be in terms of a standard measurement (e.g. nanoseconds or
>      microseconds).
>    - Could be in terms of whatever integral value ("ticks" /
>      seconds&microseconds / "clock ticks") is returned from the
> syscall
>      chosen at configure time.
> 
> While `-ffp-contract=on` removes the problem that I bumped into,
> there
> has been a similar bug on x86 that was to do with a different
> floating
> point problem that also happens after `get_time` and
> `timevar_accumulate` both being inlined into the same function. 
> Hence
> it seems worth choosing a different approach.
> 
> Of the two other solutions, recording measurements in integral values
> seems the most robust against slightly "off" measurements being
> presented to the user -- even though it could avoid the ICE that
> creates
> a flaky test.
> 
> I considered storing time in whatever units our syscall returns and
> normalising them at the time we print out rather than normalising
> them
> to nanoseconds at the point we record our "current time".  The logic
> being that normalisation could have some rounding affect (e.g. if
> TICKS_PER_SECOND is 3) that would be taken into account in
> calculations.
> 
> I decided against it in order to give the values recorded in
> `timevar_time_def` some interpretive value so it's easier to read the
> code.  Compared to the small rounding that would represent a tiny
> amount
> of time and AIUI can not trigger the same kind of ICE's as we are
> attempting to fix, said interpretive value seems more valuable.
> 
> Recording time in microseconds seemed reasonable since all obvious
> values for ticks and `getrusage` are at microsecond granularity or
> less
> precise.  That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are
> both
> variables given to use by the host system I was not sure of that
> enough
> to make this decision.
> 
> ------------------------------
> timer::all_zero is ignoring rows which are inconsequential to the
> user
> and would be printed out as all zeros.  Since upon printing rows we
> convert to the same double value and print out the same precision as
> before, we return true/false based on the same amount of time as
> before.
> 
> timer::print_row casts to a floating point measurement in units of
> seconds as was printed out before.
> 
> timer::validate_phases -- I'm printing out nanoseconds here rather
> than
> floating point seconds since this is an error message for when things
> have "gone wrong" printing out the actual nanoseconds that have been
> recorded seems like the best approach.
> N.b. since we now print out nanoseconds instead of floating point
> value
> the padding requirements are different.  Originally we were padding
> to
> 24 characters and printing 18 decimal places.  This looked odd with
> the
> now visually smaller values getting printed.  I judged 13 characters
> (corresponding to 2 hours) to be a reasonable point at which our
> alignment could start to degrade and this provides a more compact
> output
> for the majority of cases (checked by triggering the error case via
> GDB).
> 
> ------------------------------
> N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC".  I believe
> this
> would fit in an integer on all hosts that GCC supports, but am not
> certain there are not strange integer sizes we support hence am
> pointing
> it out for special attention during review.
> 
> ------------------------------
> No expected change in generated code.
> Bootstrapped and regtested on AArch64 with no regressions.
> Manually checked that flaky test is no longer flaky on the machine it
> was seen before.
> 
> gcc/ChangeLog:
> 
>         PR middle-end/110316
>         PR middle-end/9903
>         * timevar.cc (NANOSEC_PER_SEC, TICKS_TO_NANOSEC,
>         CLOCKS_TO_NANOSEC, nanosec_to_floating_sec, percent_of): New.
>         (TICKS_TO_MSEC, CLOCKS_TO_MSEC): Remove these macros.
>         (timer::validate_phases): Use integral arithmetic to check
>         validity.
>         (timer::print_row, timer::print): Convert from integral
>         nanoseconds to floating point seconds before printing.
>         (timer::all_zero): Change limit to nanosec count instead of
>         fractional count of seconds.
>         * timevar.h (struct timevar_time_def): Update all
> measurements
>         to use uint64_t nanoseconds rather than seconds stored in a
>         double.
> 
> 
> 
> ###############     Attachment also inlined for ease of reply   
> ###############
> 
> 
> diff --git a/gcc/timevar.h b/gcc/timevar.h
> index
> ad46573160957c6461190204c5181ec9e904e204..882954a527e11a90873f25a45e9
> 073149709dd05 100644
> --- a/gcc/timevar.h
> +++ b/gcc/timevar.h
> @@ -44,7 +44,7 @@
>  */
>  
>  /* This structure stores the various varieties of time that can be
> -   measured.  Times are stored in seconds.  The time may be an
> +   measured.  Times are stored in nanoseconds.  The time may be an
>     absolute time or a time difference; in the former case, the time
>     base is undefined, except that the difference between two times
>     produces a valid time difference.  */
> @@ -52,14 +52,13 @@
>  struct timevar_time_def
>  {
>    /* User time in this process.  */
> -  double user;
> +  uint64_t user;
>  
> -  /* System time (if applicable for this host platform) in this
> -     process.  */
> -  double sys;
> +  /* System time (if applicable for this host platform) in this
> process.  */
> +  uint64_t sys;
>  
>    /* Wall clock time.  */
> -  double wall;
> +  uint64_t wall;
>  
>    /* Garbage collector memory.  */
>    size_t ggc_mem;

Hi Matthew.  I recently touched the timevar code (in r14-2881-
g75d623946d4b6e) to add support for serializing the timevar data in
JSON form as part of the SARIF output (PR analyzer/109361).

Looking at your patch, it looks like the baseline for the patch seems
to predate r14-2881-g75d623946d4b6e.

I don't have a strong opinion on the implementation choices in your
patch, but please can you rebase to beyond my recent change and make
sure that the SARIF serialization still works with your patch.

Specifically, please try compiling with
  -ftime-report -fdiagnostics-format=sarif-file
and have a look at the generated .sarif file, e.g. via 
  python -m json.tool foo.c.sarif
which will pretty-print the JSON to stdout.

Currently I'm writing out the values as floating-point seconds, and
AFAIK my analyzer integration testsuite [1] is the only consumer of
this data.

[...snip...]

Thanks
Dave
[1]
https://github.com/davidmalcolm/gcc-analyzer-integration-tests/issues/5


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mid-end: Use integral time intervals in timevar.cc
  2023-08-03 14:09     ` David Malcolm
@ 2023-08-03 14:54       ` Matthew Malcomson
  2023-08-03 15:34         ` David Malcolm
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Malcomson @ 2023-08-03 14:54 UTC (permalink / raw)
  To: David Malcolm, Richard Biener
  Cc: gcc-patches, richard.sandiford, Xi Ruoyao, Andrew Pinski

On 8/3/23 15:09, David Malcolm wrote:
> 
> Hi Matthew.  I recently touched the timevar code (in r14-2881-
> g75d623946d4b6e) to add support for serializing the timevar data in
> JSON form as part of the SARIF output (PR analyzer/109361).
> 
> Looking at your patch, it looks like the baseline for the patch seems
> to predate r14-2881-g75d623946d4b6e.
> 
> I don't have a strong opinion on the implementation choices in your
> patch, but please can you rebase to beyond my recent change and make
> sure that the SARIF serialization still works with your patch.
> 
> Specifically, please try compiling with
>    -ftime-report -fdiagnostics-format=sarif-file
> and have a look at the generated .sarif file, e.g. via
>    python -m json.tool foo.c.sarif
> which will pretty-print the JSON to stdout.
> 
> Currently I'm writing out the values as floating-point seconds, and
> AFAIK my analyzer integration testsuite [1] is the only consumer of
> this data.

Hi David,

Thanks for the heads-up.  Will update the patch.

I read your last paragraph as suggesting that you'd be open to changing 
the format.  Is that correct?

I would initially assume that writing out the time as floating-point 
seconds would still be most convenient for your use since it looks to be 
like something to be presented to a person.

However, since I don't know much about the intended uses of SARIF in 
general I figured I should double-check -- does that choice to remain 
printing out floating-point seconds seem best to you?

> 
> [...snip...]
> 
> Thanks
> Dave
> [1]
> https://github.com/davidmalcolm/gcc-analyzer-integration-tests/issues/5
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mid-end: Use integral time intervals in timevar.cc
  2023-08-03 14:54       ` Matthew Malcomson
@ 2023-08-03 15:34         ` David Malcolm
  0 siblings, 0 replies; 18+ messages in thread
From: David Malcolm @ 2023-08-03 15:34 UTC (permalink / raw)
  To: Matthew Malcomson, Richard Biener
  Cc: gcc-patches, richard.sandiford, Xi Ruoyao, Andrew Pinski

On Thu, 2023-08-03 at 15:54 +0100, Matthew Malcomson wrote:
> On 8/3/23 15:09, David Malcolm wrote:
> > 
> > Hi Matthew.  I recently touched the timevar code (in r14-2881-
> > g75d623946d4b6e) to add support for serializing the timevar data in
> > JSON form as part of the SARIF output (PR analyzer/109361).
> > 
> > Looking at your patch, it looks like the baseline for the patch
> > seems
> > to predate r14-2881-g75d623946d4b6e.
> > 
> > I don't have a strong opinion on the implementation choices in your
> > patch, but please can you rebase to beyond my recent change and
> > make
> > sure that the SARIF serialization still works with your patch.
> > 
> > Specifically, please try compiling with
> >    -ftime-report -fdiagnostics-format=sarif-file
> > and have a look at the generated .sarif file, e.g. via
> >    python -m json.tool foo.c.sarif
> > which will pretty-print the JSON to stdout.
> > 
> > Currently I'm writing out the values as floating-point seconds, and
> > AFAIK my analyzer integration testsuite [1] is the only consumer of
> > this data.
> 
> Hi David,
> 
> Thanks for the heads-up.  Will update the patch.
> 
> I read your last paragraph as suggesting that you'd be open to
> changing 
> the format.  Is that correct?

I suppose, but I'd prefer to keep the existing format.

> 
> I would initially assume that writing out the time as floating-point 
> seconds would still be most convenient for your use since it looks to
> be 
> like something to be presented to a person.

Yes.  I may be biased in that with -fanalyzer the times tend to be
measured in seconds rather than fractions of seconds, alas.

> 
> However, since I don't know much about the intended uses of SARIF in 
> general I figured I should double-check -- does that choice to remain
> printing out floating-point seconds seem best to you?

I'd prefer to keep the JSON output as floating-point seconds, if that's
not too much of a pain.

Dave



> 
> > 
> > [...snip...]
> > 
> > Thanks
> > Dave
> > [1]
> > https://github.com/davidmalcolm/gcc-analyzer-integration-tests/issues/5
> > 
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mid-end: Use integral time intervals in timevar.cc
  2023-08-03 13:38   ` [PATCH] mid-end: Use integral time intervals " Matthew Malcomson
  2023-08-03 14:09     ` David Malcolm
@ 2023-08-04  6:44     ` Richard Biener
  2023-08-04  9:41       ` Matthew Malcomson
  1 sibling, 1 reply; 18+ messages in thread
From: Richard Biener @ 2023-08-04  6:44 UTC (permalink / raw)
  To: Matthew Malcomson
  Cc: gcc-patches, richard.sandiford, Xi Ruoyao, Andrew Pinski

On Thu, 3 Aug 2023, Matthew Malcomson wrote:

> > 
> > I think this is undesriable.  With fused you mean we use FMA?
> > I think you could use -ffp-contract=off for the TU instead.
> > 
> > Note you can't use __attribute__((noinline)) literally since the
> > host compiler might not support this.
> > 
> > Richard.
> > 
> 
> 
> Trying to make the timevar store integral time intervals.
> Hope this is acceptable -- I had originally planned to use
> `-ffp-contract` as agreed until I saw the email mentioning the old x86
> bug in the same area which was not to do with floating point contraction
> of operations (PR 99903) and figured it would be better to try and solve
> both at the same time while making things in general a bit more robust.
> ________________________________________________________________________
> 
> 
> On some AArch64 bootstrapped builds, we were getting a flaky test
> because the floating point operations in `get_time` were being fused
> with the floating point operations in `timevar_accumulate`.
> 
> This meant that the rounding behaviour of our multiplication with
> `ticks_to_msec` was different when used in `timer::start` and when
> performed in `timer::stop`.  These extra inaccuracies led to the
> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
> 
> ------------------------------
> Avoiding the inlining which was agreed to be undesirable.  Three
> alternative approaches:
> 1) Use `-ffp-contract=on` to avoid this particular optimisation.
> 2) Adjusting the code so that the "tolerance" is always of the order of
>    a "tick".
> 3) Recording times and elapsed differences in integral values.
>    - Could be in terms of a standard measurement (e.g. nanoseconds or
>      microseconds).
>    - Could be in terms of whatever integral value ("ticks" /
>      seconds&microseconds / "clock ticks") is returned from the syscall
>      chosen at configure time.
> 
> While `-ffp-contract=on` removes the problem that I bumped into, there
> has been a similar bug on x86 that was to do with a different floating
> point problem that also happens after `get_time` and
> `timevar_accumulate` both being inlined into the same function.  Hence
> it seems worth choosing a different approach.
> 
> Of the two other solutions, recording measurements in integral values
> seems the most robust against slightly "off" measurements being
> presented to the user -- even though it could avoid the ICE that creates
> a flaky test.
> 
> I considered storing time in whatever units our syscall returns and
> normalising them at the time we print out rather than normalising them
> to nanoseconds at the point we record our "current time".  The logic
> being that normalisation could have some rounding affect (e.g. if
> TICKS_PER_SECOND is 3) that would be taken into account in calculations.
> 
> I decided against it in order to give the values recorded in
> `timevar_time_def` some interpretive value so it's easier to read the
> code.  Compared to the small rounding that would represent a tiny amount
> of time and AIUI can not trigger the same kind of ICE's as we are
> attempting to fix, said interpretive value seems more valuable.
> 
> Recording time in microseconds seemed reasonable since all obvious
> values for ticks and `getrusage` are at microsecond granularity or less
> precise.  That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are both
> variables given to use by the host system I was not sure of that enough
> to make this decision.
> 
> ------------------------------
> timer::all_zero is ignoring rows which are inconsequential to the user
> and would be printed out as all zeros.  Since upon printing rows we
> convert to the same double value and print out the same precision as
> before, we return true/false based on the same amount of time as before.
> 
> timer::print_row casts to a floating point measurement in units of
> seconds as was printed out before.
> 
> timer::validate_phases -- I'm printing out nanoseconds here rather than
> floating point seconds since this is an error message for when things
> have "gone wrong" printing out the actual nanoseconds that have been
> recorded seems like the best approach.
> N.b. since we now print out nanoseconds instead of floating point value
> the padding requirements are different.  Originally we were padding to
> 24 characters and printing 18 decimal places.  This looked odd with the
> now visually smaller values getting printed.  I judged 13 characters
> (corresponding to 2 hours) to be a reasonable point at which our
> alignment could start to degrade and this provides a more compact output
> for the majority of cases (checked by triggering the error case via
> GDB).
> 
> ------------------------------
> N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC".  I believe this
> would fit in an integer on all hosts that GCC supports, but am not
> certain there are not strange integer sizes we support hence am pointing
> it out for special attention during review.
> 
> ------------------------------
> No expected change in generated code.
> Bootstrapped and regtested on AArch64 with no regressions.
> Manually checked that flaky test is no longer flaky on the machine it
> was seen before.

The patch looks OK to me if it passes bootstrap / regtest and the
output of -ftime-report doesn't change (too much).

Thanks,
Richard.

> gcc/ChangeLog:
> 
> 	PR middle-end/110316
> 	PR middle-end/9903
> 	* timevar.cc (NANOSEC_PER_SEC, TICKS_TO_NANOSEC,
> 	CLOCKS_TO_NANOSEC, nanosec_to_floating_sec, percent_of): New.
> 	(TICKS_TO_MSEC, CLOCKS_TO_MSEC): Remove these macros.
> 	(timer::validate_phases): Use integral arithmetic to check
> 	validity.
> 	(timer::print_row, timer::print): Convert from integral
> 	nanoseconds to floating	point seconds before printing.
> 	(timer::all_zero): Change limit to nanosec count instead of
> 	fractional count of seconds.
> 	* timevar.h (struct timevar_time_def): Update all measurements
> 	to use uint64_t nanoseconds rather than seconds stored in a
> 	double.
> 
> 
> 
> ###############     Attachment also inlined for ease of reply    ###############
> 
> 
> diff --git a/gcc/timevar.h b/gcc/timevar.h
> index ad46573160957c6461190204c5181ec9e904e204..882954a527e11a90873f25a45e9073149709dd05 100644
> --- a/gcc/timevar.h
> +++ b/gcc/timevar.h
> @@ -44,7 +44,7 @@
>  */
>  
>  /* This structure stores the various varieties of time that can be
> -   measured.  Times are stored in seconds.  The time may be an
> +   measured.  Times are stored in nanoseconds.  The time may be an
>     absolute time or a time difference; in the former case, the time
>     base is undefined, except that the difference between two times
>     produces a valid time difference.  */
> @@ -52,14 +52,13 @@
>  struct timevar_time_def
>  {
>    /* User time in this process.  */
> -  double user;
> +  uint64_t user;
>  
> -  /* System time (if applicable for this host platform) in this
> -     process.  */
> -  double sys;
> +  /* System time (if applicable for this host platform) in this process.  */
> +  uint64_t sys;
>  
>    /* Wall clock time.  */
> -  double wall;
> +  uint64_t wall;
>  
>    /* Garbage collector memory.  */
>    size_t ggc_mem;
> diff --git a/gcc/timevar.cc b/gcc/timevar.cc
> index d695297aae7f6b2a6de01a37fe86c2a232338df0..c6094b1c336c30e5a404ff8fd5c79795edbd59f7 100644
> --- a/gcc/timevar.cc
> +++ b/gcc/timevar.cc
> @@ -42,7 +42,7 @@ struct tms
>  # define RUSAGE_SELF 0
>  #endif
>  
> -/* Calculation of scale factor to convert ticks to microseconds.
> +/* Calculation of scale factor to convert ticks to seconds.
>     We mustn't use CLOCKS_PER_SEC except with clock().  */
>  #if HAVE_SYSCONF && defined _SC_CLK_TCK
>  # define TICKS_PER_SECOND sysconf (_SC_CLK_TCK) /* POSIX 1003.1-1996 */
> @@ -91,14 +91,15 @@ struct tms
>     the underlying constants, and that can be very slow, so we have to
>     precompute them.  Whose wonderful idea was it to make all those
>     _constants_ variable at run time, anyway?  */
> +#define NANOSEC_PER_SEC 1000000000
>  #ifdef USE_TIMES
> -static double ticks_to_msec;
> -#define TICKS_TO_MSEC (1 / (double)TICKS_PER_SECOND)
> +static uint64_t ticks_to_nanosec;
> +#define TICKS_TO_NANOSEC (NANOSEC_PER_SEC / TICKS_PER_SECOND)
>  #endif
>  
>  #ifdef USE_CLOCK
> -static double clocks_to_msec;
> -#define CLOCKS_TO_MSEC (1 / (double)CLOCKS_PER_SEC)
> +static uint64_t clocks_to_nanosec;
> +#define CLOCKS_TO_NANOSEC (NANOSEC_PER_SEC / CLOCKS_PER_SEC)
>  #endif
>  
>  /* Non-NULL if timevars should be used.  In GCC, this happens with
> @@ -222,18 +223,20 @@ get_time (struct timevar_time_def *now)
>    {
>  #ifdef USE_TIMES
>      struct tms tms;
> -    now->wall = times (&tms)  * ticks_to_msec;
> -    now->user = tms.tms_utime * ticks_to_msec;
> -    now->sys  = tms.tms_stime * ticks_to_msec;
> +    now->wall = times (&tms)  * ticks_to_nanosec;
> +    now->user = tms.tms_utime * ticks_to_nanosec;
> +    now->sys  = tms.tms_stime * ticks_to_nanosec;
>  #endif
>  #ifdef USE_GETRUSAGE
>      struct rusage rusage;
>      getrusage (RUSAGE_SELF, &rusage);
> -    now->user = rusage.ru_utime.tv_sec + rusage.ru_utime.tv_usec * 1e-6;
> -    now->sys  = rusage.ru_stime.tv_sec + rusage.ru_stime.tv_usec * 1e-6;
> +    now->user = rusage.ru_utime.tv_sec * NANOSEC_PER_SEC
> +		+ rusage.ru_utime.tv_usec * 1000;
> +    now->sys  = rusage.ru_stime.tv_sec * NANOSEC_PER_SEC
> +		+ rusage.ru_stime.tv_usec * 1000;
>  #endif
>  #ifdef USE_CLOCK
> -    now->user = clock () * clocks_to_msec;
> +    now->user = clock () * clocks_to_nanosec;
>  #endif
>    }
>  }
> @@ -271,10 +274,10 @@ timer::timer () :
>    /* Initialize configuration-specific state.
>       Ideally this would be one-time initialization.  */
>  #ifdef USE_TIMES
> -  ticks_to_msec = TICKS_TO_MSEC;
> +  ticks_to_nanosec = TICKS_TO_NANOSEC;
>  #endif
>  #ifdef USE_CLOCK
> -  clocks_to_msec = CLOCKS_TO_MSEC;
> +  clocks_to_nanosec = CLOCKS_TO_NANOSEC;
>  #endif
>  }
>  
> @@ -583,12 +586,11 @@ timer::validate_phases (FILE *fp) const
>  {
>    unsigned int /* timevar_id_t */ id;
>    const timevar_time_def *total = &m_timevars[TV_TOTAL].elapsed;
> -  double phase_user = 0.0;
> -  double phase_sys = 0.0;
> -  double phase_wall = 0.0;
> +  uint64_t phase_user = 0;
> +  uint64_t phase_sys = 0;
> +  uint64_t phase_wall = 0;
>    size_t phase_ggc_mem = 0;
>    static char phase_prefix[] = "phase ";
> -  const double tolerance = 1.000001;  /* One part in a million.  */
>  
>    for (id = 0; id < (unsigned int) TIMEVAR_LAST; ++id)
>      {
> @@ -607,26 +609,32 @@ timer::validate_phases (FILE *fp) const
>  	}
>      }
>  
> -  if (phase_user > total->user * tolerance
> -      || phase_sys > total->sys * tolerance
> -      || phase_wall > total->wall * tolerance
> -      || phase_ggc_mem > total->ggc_mem * tolerance)
> +  if (phase_user > total->user
> +      || phase_sys > total->sys
> +      || phase_wall > total->wall
> +      || phase_ggc_mem > total->ggc_mem)
>      {
>  
>        fprintf (fp, "Timing error: total of phase timers exceeds total time.\n");
>        if (phase_user > total->user)
> -	fprintf (fp, "user    %24.18e > %24.18e\n", phase_user, total->user);
> +	fprintf (fp, "user    %13" PRIu64 " > %13" PRIu64 "\n",
> +		 phase_user, total->user);
>        if (phase_sys > total->sys)
> -	fprintf (fp, "sys     %24.18e > %24.18e\n", phase_sys, total->sys);
> +	fprintf (fp, "sys     %13" PRIu64 " > %13" PRIu64 "\n",
> +		 phase_sys, total->sys);
>        if (phase_wall > total->wall)
> -	fprintf (fp, "wall    %24.18e > %24.18e\n", phase_wall, total->wall);
> +	fprintf (fp, "wall    %13" PRIu64 " > %13" PRIu64 "\n",
> +		 phase_wall, total->wall);
>        if (phase_ggc_mem > total->ggc_mem)
> -	fprintf (fp, "ggc_mem %24lu > %24lu\n", (unsigned long)phase_ggc_mem,
> +	fprintf (fp, "ggc_mem %13lu > %13lu\n", (unsigned long)phase_ggc_mem,
>  		 (unsigned long)total->ggc_mem);
>        gcc_unreachable ();
>      }
>  }
>  
> +#define nanosec_to_floating_sec(NANO) ((double)(NANO) * 1e-9)
> +#define percent_of(TOTAL, SUBTOTAL) \
> +  ((TOTAL) == 0 ? 0 : ((double)SUBTOTAL / TOTAL) * 100)
>  /* Helper function for timer::print.  */
>  void
> @@ -640,22 +648,22 @@ timer::print_row (FILE *fp,
>  #ifdef HAVE_USER_TIME
>    /* Print user-mode time for this process.  */
>    fprintf (fp, "%7.2f (%3.0f%%)",
> -	   elapsed.user,
> -	   (total->user == 0 ? 0 : elapsed.user / total->user) * 100);
> +	   nanosec_to_floating_sec (elapsed.user),
> +	   percent_of (total->user, elapsed.user));
>  #endif /* HAVE_USER_TIME */
>  
>  #ifdef HAVE_SYS_TIME
>    /* Print system-mode time for this process.  */
>    fprintf (fp, "%7.2f (%3.0f%%)",
> -	   elapsed.sys,
> -	   (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100);
> +	   nanosec_to_floating_sec (elapsed.sys),
> +	   percent_of (total->sys, elapsed.sys));
>  #endif /* HAVE_SYS_TIME */
>  
>  #ifdef HAVE_WALL_TIME
>    /* Print wall clock time elapsed.  */
>    fprintf (fp, "%7.2f (%3.0f%%)",
> -	   elapsed.wall,
> -	   (total->wall == 0 ? 0 : elapsed.wall / total->wall) * 100);
> +	   nanosec_to_floating_sec (elapsed.wall),
> +	   percent_of (total->wall, elapsed.wall));
>  #endif /* HAVE_WALL_TIME */
>  
>    /* Print the amount of ggc memory allocated.  */
> @@ -673,7 +681,8 @@ timer::print_row (FILE *fp,
>  bool
>  timer::all_zero (const timevar_time_def &elapsed)
>  {
> -  const double tiny = 5e-3;
> +  /* 5000000 nanosec == 5e-3 seconds.  */
> +  uint64_t tiny = 5000000;
>    return (elapsed.user < tiny
>  	  && elapsed.sys < tiny
>  	  && elapsed.wall < tiny
> @@ -766,13 +775,13 @@ timer::print (FILE *fp)
>    /* Print total time.  */
>    fprintf (fp, " %-35s:", "TOTAL");
>  #ifdef HAVE_USER_TIME
> -  fprintf (fp, "%7.2f      ", total->user);
> +  fprintf (fp, "%7.2f      ", nanosec_to_floating_sec (total->user));
>  #endif
>  #ifdef HAVE_SYS_TIME
> -  fprintf (fp, "%8.2f      ", total->sys);
> +  fprintf (fp, "%8.2f      ", nanosec_to_floating_sec (total->sys));
>  #endif
>  #ifdef HAVE_WALL_TIME
> -  fprintf (fp, "%8.2f      ", total->wall);
> +  fprintf (fp, "%8.2f      ", nanosec_to_floating_sec (total->wall));
>  #endif
>    fprintf (fp, PRsa (7) "\n", SIZE_AMOUNT (total->ggc_mem));
>  
> @@ -790,6 +799,8 @@ timer::print (FILE *fp)
>  
>    validate_phases (fp);
>  }
> +#undef nanosec_to_floating_sec
> +#undef percent_of
>  
>  /* Get the name of the topmost item.  For use by jit for validating
>     inputs to gcc_jit_timer_pop.  */
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] mid-end: Use integral time intervals in timevar.cc
  2023-08-04  6:44     ` Richard Biener
@ 2023-08-04  9:41       ` Matthew Malcomson
  2023-08-04 10:18         ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Malcomson @ 2023-08-04  9:41 UTC (permalink / raw)
  To: David Malcolm, Richard Biener
  Cc: gcc-patches, richard.sandiford, Xi Ruoyao, Andrew Pinski, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 6097 bytes --]

Hopefully last update ...

> Specifically, please try compiling with
>    -ftime-report -fdiagnostics-format=sarif-file
> and have a look at the generated .sarif file, e.g. via
>    python -m json.tool foo.c.sarif
> which will pretty-print the JSON to stdout.

Rebasing onto the JSON output was quite simple -- I've inlined the only
change in the patch below (to cast to floating point seconds before
generating the json report).

I have manually checked the SARIF output as you suggested and all looks
good (an in fact better because we no longer save some strange times
like the below due to avoiding the floating point rounding).
	  "wall": -4.49516e-09,


> 
> The patch looks OK to me if it passes bootstrap / regtest and the
> output of -ftime-report doesn't change (too much).
> 
> Thanks,
> Richard.

Though I don't expect you were asking for this, confirmation below that
the output doesn't change.  (Figured I may as well include that info
since the rebase to include the JSON output that David had just added
required re-sending an email anyway).



```
hw-a20-8:checking-theory [10:07:01] $ ${old_build}/gcc/xgcc -B${old_build}/gcc/    -fdiagnostics-plain-output    -Os  -w -S test-sum.c -o /dev/null   -ftime-report

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.01 ( 14%)   0.00 (  0%)   0.02 ( 18%)  3389k ( 74%)
 phase parsing                      :   0.03 ( 43%)   0.02 ( 67%)   0.06 ( 55%)   982k ( 21%)
 phase opt and generate             :   0.03 ( 43%)   0.01 ( 33%)   0.03 ( 27%)   215k (  5%)
 callgraph functions expansion      :   0.02 ( 29%)   0.00 (  0%)   0.03 ( 27%)   162k (  4%)
 callgraph ipa passes               :   0.01 ( 14%)   0.01 ( 33%)  -0.00 ( -0%)    38k (  1%)
 CFG verifier                       :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)     0  (  0%)
 preprocessing                      :   0.02 ( 29%)   0.02 ( 67%)   0.02 ( 18%)   272k (  6%)
 lexical analysis                   :   0.01 ( 14%)   0.00 (  0%)   0.02 ( 18%)     0  (  0%)
 parser (global)                    :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)   637k ( 14%)
 parser function body               :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    18k (  0%)
 tree CFG cleanup                   :   0.00 (  0%)   0.01 ( 33%)   0.00 (  0%)     0  (  0%)
 tree STMT verifier                 :   0.01 ( 14%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 expand                             :   0.01 ( 14%)   0.00 (  0%)   0.00 (  0%)    12k (  0%)
 loop init                          :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    12k (  0%)
 initialize rtl                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    15k (  0%)
 rest of compilation                :   0.01 ( 14%)   0.00 (  0%)   0.00 (  0%)  6920  (  0%)
 TOTAL                              :   0.07          0.03          0.11         4587k
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
hw-a20-8:checking-theory [10:06:44] $ ${new_build}/gcc/xgcc -B${new_build}/gcc/    -fdiagnostics-plain-output    -Os  -w -S test-sum.c -o /dev/null   -ftime-report

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.01 ( 17%)   0.00 (  0%)   0.02 ( 18%)  3389k ( 74%)
 phase parsing                      :   0.02 ( 33%)   0.03 ( 75%)   0.05 ( 45%)   982k ( 21%)
 phase opt and generate             :   0.03 ( 50%)   0.01 ( 25%)   0.04 ( 36%)   215k (  5%)
 callgraph construction             :   0.00 (  0%)   0.01 ( 25%)   0.00 (  0%)  1864  (  0%)
 callgraph functions expansion      :   0.02 ( 33%)   0.00 (  0%)   0.03 ( 27%)   162k (  4%)
 callgraph ipa passes               :   0.01 ( 17%)   0.00 (  0%)   0.01 (  9%)    38k (  1%)
 ipa free lang data                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)     0  (  0%)
 CFG verifier                       :   0.02 ( 33%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 preprocessing                      :   0.01 ( 17%)   0.03 ( 75%)   0.01 (  9%)   272k (  6%)
 lexical analysis                   :   0.01 ( 17%)   0.00 (  0%)   0.02 ( 18%)     0  (  0%)
 parser (global)                    :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)   637k ( 14%)
 parser inl. func. body             :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    19k (  0%)
 tree STMT verifier                 :   0.01 ( 17%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 expand                             :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    12k (  0%)
 integrated RA                      :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    50k (  1%)
 initialize rtl                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    15k (  0%)
 TOTAL                              :   0.06          0.04          0.11         4587k
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
```


N.b. in the change below it's worth mentioning that the
`nanosec_to_floating_sec` and `percent_of` macros were already defined.
I just moved them to below the `make_json_for_timevar_time_def`
function.
###############          Changes to the previous patch.          ###############


@@ -832,12 +841,16 @@ json::object *
 make_json_for_timevar_time_def (const timevar_time_def &ttd)
 {
   json::object *obj = new json::object ();
-  obj->set ("user", new json::float_number (ttd.user));
-  obj->set ("sys", new json::float_number (ttd.sys));
-  obj->set ("wall", new json::float_number (ttd.wall));
+  obj->set ("user",
+	    new json::float_number (nanosec_to_floating_sec (ttd.user)));
+  obj->set ("sys", new json::float_number (nanosec_to_floating_sec (ttd.sys)));
+  obj->set ("wall",
+	    new json::float_number (nanosec_to_floating_sec (ttd.wall)));
   obj->set ("ggc_mem", new json::integer_number (ttd.ggc_mem));
   return obj;
 }
+#undef nanosec_to_floating_sec
+#undef percent_of
 
 /* Create a json value representing this object, suitable for use
    in SARIF output.  */




[-- Attachment #2: timevar-including-json.patch --]
[-- Type: text/plain, Size: 8525 bytes --]

diff --git a/gcc/timevar.h b/gcc/timevar.h
index e359e9fa1fa5b4c230aa2711e58a8883824d9682..541216ee1dcaa6b3a4702fe6a2b171c51ffd6f27 100644
--- a/gcc/timevar.h
+++ b/gcc/timevar.h
@@ -46,7 +46,7 @@ namespace json { class value; }
 */
 
 /* This structure stores the various varieties of time that can be
-   measured.  Times are stored in seconds.  The time may be an
+   measured.  Times are stored in nanoseconds.  The time may be an
    absolute time or a time difference; in the former case, the time
    base is undefined, except that the difference between two times
    produces a valid time difference.  */
@@ -54,14 +54,13 @@ namespace json { class value; }
 struct timevar_time_def
 {
   /* User time in this process.  */
-  double user;
+  uint64_t user;
 
-  /* System time (if applicable for this host platform) in this
-     process.  */
-  double sys;
+  /* System time (if applicable for this host platform) in this process.  */
+  uint64_t sys;
 
   /* Wall clock time.  */
-  double wall;
+  uint64_t wall;
 
   /* Garbage collector memory.  */
   size_t ggc_mem;
diff --git a/gcc/timevar.cc b/gcc/timevar.cc
index 5368ff06ac966a1a69b7008fd23f33ff45464138..a63e5c6334e652c76fd91f673a6fdf6ff4b61867 100644
--- a/gcc/timevar.cc
+++ b/gcc/timevar.cc
@@ -43,7 +43,7 @@ struct tms
 # define RUSAGE_SELF 0
 #endif
 
-/* Calculation of scale factor to convert ticks to microseconds.
+/* Calculation of scale factor to convert ticks to seconds.
    We mustn't use CLOCKS_PER_SEC except with clock().  */
 #if HAVE_SYSCONF && defined _SC_CLK_TCK
 # define TICKS_PER_SECOND sysconf (_SC_CLK_TCK) /* POSIX 1003.1-1996 */
@@ -92,14 +92,15 @@ struct tms
    the underlying constants, and that can be very slow, so we have to
    precompute them.  Whose wonderful idea was it to make all those
    _constants_ variable at run time, anyway?  */
+#define NANOSEC_PER_SEC 1000000000
 #ifdef USE_TIMES
-static double ticks_to_msec;
-#define TICKS_TO_MSEC (1 / (double)TICKS_PER_SECOND)
+static uint64_t ticks_to_nanosec;
+#define TICKS_TO_NANOSEC (NANOSEC_PER_SEC / TICKS_PER_SECOND)
 #endif
 
 #ifdef USE_CLOCK
-static double clocks_to_msec;
-#define CLOCKS_TO_MSEC (1 / (double)CLOCKS_PER_SEC)
+static uint64_t clocks_to_nanosec;
+#define CLOCKS_TO_NANOSEC (NANOSEC_PER_SEC / CLOCKS_PER_SEC)
 #endif
 
 /* Non-NULL if timevars should be used.  In GCC, this happens with
@@ -243,18 +244,20 @@ get_time (struct timevar_time_def *now)
   {
 #ifdef USE_TIMES
     struct tms tms;
-    now->wall = times (&tms)  * ticks_to_msec;
-    now->user = tms.tms_utime * ticks_to_msec;
-    now->sys  = tms.tms_stime * ticks_to_msec;
+    now->wall = times (&tms)  * ticks_to_nanosec;
+    now->user = tms.tms_utime * ticks_to_nanosec;
+    now->sys  = tms.tms_stime * ticks_to_nanosec;
 #endif
 #ifdef USE_GETRUSAGE
     struct rusage rusage;
     getrusage (RUSAGE_SELF, &rusage);
-    now->user = rusage.ru_utime.tv_sec + rusage.ru_utime.tv_usec * 1e-6;
-    now->sys  = rusage.ru_stime.tv_sec + rusage.ru_stime.tv_usec * 1e-6;
+    now->user = rusage.ru_utime.tv_sec * NANOSEC_PER_SEC
+		+ rusage.ru_utime.tv_usec * 1000;
+    now->sys  = rusage.ru_stime.tv_sec * NANOSEC_PER_SEC
+		+ rusage.ru_stime.tv_usec * 1000;
 #endif
 #ifdef USE_CLOCK
-    now->user = clock () * clocks_to_msec;
+    now->user = clock () * clocks_to_nanosec;
 #endif
   }
 }
@@ -305,10 +308,10 @@ timer::timer () :
   /* Initialize configuration-specific state.
      Ideally this would be one-time initialization.  */
 #ifdef USE_TIMES
-  ticks_to_msec = TICKS_TO_MSEC;
+  ticks_to_nanosec = TICKS_TO_NANOSEC;
 #endif
 #ifdef USE_CLOCK
-  clocks_to_msec = CLOCKS_TO_MSEC;
+  clocks_to_nanosec = CLOCKS_TO_NANOSEC;
 #endif
 }
 
@@ -617,12 +620,11 @@ timer::validate_phases (FILE *fp) const
 {
   unsigned int /* timevar_id_t */ id;
   const timevar_time_def *total = &m_timevars[TV_TOTAL].elapsed;
-  double phase_user = 0.0;
-  double phase_sys = 0.0;
-  double phase_wall = 0.0;
+  uint64_t phase_user = 0;
+  uint64_t phase_sys = 0;
+  uint64_t phase_wall = 0;
   size_t phase_ggc_mem = 0;
   static char phase_prefix[] = "phase ";
-  const double tolerance = 1.000001;  /* One part in a million.  */
 
   for (id = 0; id < (unsigned int) TIMEVAR_LAST; ++id)
     {
@@ -641,26 +643,32 @@ timer::validate_phases (FILE *fp) const
 	}
     }
 
-  if (phase_user > total->user * tolerance
-      || phase_sys > total->sys * tolerance
-      || phase_wall > total->wall * tolerance
-      || phase_ggc_mem > total->ggc_mem * tolerance)
+  if (phase_user > total->user
+      || phase_sys > total->sys
+      || phase_wall > total->wall
+      || phase_ggc_mem > total->ggc_mem)
     {
 
       fprintf (fp, "Timing error: total of phase timers exceeds total time.\n");
       if (phase_user > total->user)
-	fprintf (fp, "user    %24.18e > %24.18e\n", phase_user, total->user);
+	fprintf (fp, "user    %13" PRIu64 " > %13" PRIu64 "\n",
+		 phase_user, total->user);
       if (phase_sys > total->sys)
-	fprintf (fp, "sys     %24.18e > %24.18e\n", phase_sys, total->sys);
+	fprintf (fp, "sys     %13" PRIu64 " > %13" PRIu64 "\n",
+		 phase_sys, total->sys);
       if (phase_wall > total->wall)
-	fprintf (fp, "wall    %24.18e > %24.18e\n", phase_wall, total->wall);
+	fprintf (fp, "wall    %13" PRIu64 " > %13" PRIu64 "\n",
+		 phase_wall, total->wall);
       if (phase_ggc_mem > total->ggc_mem)
-	fprintf (fp, "ggc_mem %24lu > %24lu\n", (unsigned long)phase_ggc_mem,
+	fprintf (fp, "ggc_mem %13lu > %13lu\n", (unsigned long)phase_ggc_mem,
 		 (unsigned long)total->ggc_mem);
       gcc_unreachable ();
     }
 }
 
+#define nanosec_to_floating_sec(NANO) ((double)(NANO) * 1e-9)
+#define percent_of(TOTAL, SUBTOTAL) \
+  ((TOTAL) == 0 ? 0 : ((double)SUBTOTAL / TOTAL) * 100)
 /* Helper function for timer::print.  */
 
 void
@@ -674,22 +682,22 @@ timer::print_row (FILE *fp,
 #ifdef HAVE_USER_TIME
   /* Print user-mode time for this process.  */
   fprintf (fp, "%7.2f (%3.0f%%)",
-	   elapsed.user,
-	   (total->user == 0 ? 0 : elapsed.user / total->user) * 100);
+	   nanosec_to_floating_sec (elapsed.user),
+	   percent_of (total->user, elapsed.user));
 #endif /* HAVE_USER_TIME */
 
 #ifdef HAVE_SYS_TIME
   /* Print system-mode time for this process.  */
   fprintf (fp, "%7.2f (%3.0f%%)",
-	   elapsed.sys,
-	   (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100);
+	   nanosec_to_floating_sec (elapsed.sys),
+	   percent_of (total->sys, elapsed.sys));
 #endif /* HAVE_SYS_TIME */
 
 #ifdef HAVE_WALL_TIME
   /* Print wall clock time elapsed.  */
   fprintf (fp, "%7.2f (%3.0f%%)",
-	   elapsed.wall,
-	   (total->wall == 0 ? 0 : elapsed.wall / total->wall) * 100);
+	   nanosec_to_floating_sec (elapsed.wall),
+	   percent_of (total->wall, elapsed.wall));
 #endif /* HAVE_WALL_TIME */
 
   /* Print the amount of ggc memory allocated.  */
@@ -707,7 +715,8 @@ timer::print_row (FILE *fp,
 bool
 timer::all_zero (const timevar_time_def &elapsed)
 {
-  const double tiny = 5e-3;
+  /* 5000000 nanosec == 5e-3 seconds.  */
+  uint64_t tiny = 5000000;
   return (elapsed.user < tiny
 	  && elapsed.sys < tiny
 	  && elapsed.wall < tiny
@@ -800,13 +809,13 @@ timer::print (FILE *fp)
   /* Print total time.  */
   fprintf (fp, " %-35s:", "TOTAL");
 #ifdef HAVE_USER_TIME
-  fprintf (fp, "%7.2f      ", total->user);
+  fprintf (fp, "%7.2f      ", nanosec_to_floating_sec (total->user));
 #endif
 #ifdef HAVE_SYS_TIME
-  fprintf (fp, "%8.2f      ", total->sys);
+  fprintf (fp, "%8.2f      ", nanosec_to_floating_sec (total->sys));
 #endif
 #ifdef HAVE_WALL_TIME
-  fprintf (fp, "%8.2f      ", total->wall);
+  fprintf (fp, "%8.2f      ", nanosec_to_floating_sec (total->wall));
 #endif
   fprintf (fp, PRsa (7) "\n", SIZE_AMOUNT (total->ggc_mem));
 
@@ -832,12 +841,16 @@ json::object *
 make_json_for_timevar_time_def (const timevar_time_def &ttd)
 {
   json::object *obj = new json::object ();
-  obj->set ("user", new json::float_number (ttd.user));
-  obj->set ("sys", new json::float_number (ttd.sys));
-  obj->set ("wall", new json::float_number (ttd.wall));
+  obj->set ("user",
+	    new json::float_number (nanosec_to_floating_sec (ttd.user)));
+  obj->set ("sys", new json::float_number (nanosec_to_floating_sec (ttd.sys)));
+  obj->set ("wall",
+	    new json::float_number (nanosec_to_floating_sec (ttd.wall)));
   obj->set ("ggc_mem", new json::integer_number (ttd.ggc_mem));
   return obj;
 }
+#undef nanosec_to_floating_sec
+#undef percent_of
 
 /* Create a json value representing this object, suitable for use
    in SARIF output.  */




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mid-end: Use integral time intervals in timevar.cc
  2023-08-04  9:41       ` Matthew Malcomson
@ 2023-08-04 10:18         ` Richard Biener
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Biener @ 2023-08-04 10:18 UTC (permalink / raw)
  To: Matthew Malcomson
  Cc: David Malcolm, gcc-patches, richard.sandiford, Xi Ruoyao, Andrew Pinski

On Fri, 4 Aug 2023, Matthew Malcomson wrote:

> Hopefully last update ...
> 
> > Specifically, please try compiling with
> >    -ftime-report -fdiagnostics-format=sarif-file
> > and have a look at the generated .sarif file, e.g. via
> >    python -m json.tool foo.c.sarif
> > which will pretty-print the JSON to stdout.
> 
> Rebasing onto the JSON output was quite simple -- I've inlined the only
> change in the patch below (to cast to floating point seconds before
> generating the json report).
> 
> I have manually checked the SARIF output as you suggested and all looks
> good (an in fact better because we no longer save some strange times
> like the below due to avoiding the floating point rounding).
> 	  "wall": -4.49516e-09,
> 
> 
> > 
> > The patch looks OK to me if it passes bootstrap / regtest and the
> > output of -ftime-report doesn't change (too much).
> > 
> > Thanks,
> > Richard.
> 
> Though I don't expect you were asking for this, confirmation below that
> the output doesn't change.  (Figured I may as well include that info
> since the rebase to include the JSON output that David had just added
> required re-sending an email anyway).
> 
> 
> 
> ```
> hw-a20-8:checking-theory [10:07:01] $ ${old_build}/gcc/xgcc -B${old_build}/gcc/    -fdiagnostics-plain-output    -Os  -w -S test-sum.c -o /dev/null   -ftime-report
> 
> Time variable                                   usr           sys          wall           GGC
>  phase setup                        :   0.01 ( 14%)   0.00 (  0%)   0.02 ( 18%)  3389k ( 74%)
>  phase parsing                      :   0.03 ( 43%)   0.02 ( 67%)   0.06 ( 55%)   982k ( 21%)
>  phase opt and generate             :   0.03 ( 43%)   0.01 ( 33%)   0.03 ( 27%)   215k (  5%)
>  callgraph functions expansion      :   0.02 ( 29%)   0.00 (  0%)   0.03 ( 27%)   162k (  4%)
>  callgraph ipa passes               :   0.01 ( 14%)   0.01 ( 33%)  -0.00 ( -0%)    38k (  1%)
>  CFG verifier                       :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)     0  (  0%)
>  preprocessing                      :   0.02 ( 29%)   0.02 ( 67%)   0.02 ( 18%)   272k (  6%)
>  lexical analysis                   :   0.01 ( 14%)   0.00 (  0%)   0.02 ( 18%)     0  (  0%)
>  parser (global)                    :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)   637k ( 14%)
>  parser function body               :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    18k (  0%)
>  tree CFG cleanup                   :   0.00 (  0%)   0.01 ( 33%)   0.00 (  0%)     0  (  0%)
>  tree STMT verifier                 :   0.01 ( 14%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
>  expand                             :   0.01 ( 14%)   0.00 (  0%)   0.00 (  0%)    12k (  0%)
>  loop init                          :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    12k (  0%)
>  initialize rtl                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    15k (  0%)
>  rest of compilation                :   0.01 ( 14%)   0.00 (  0%)   0.00 (  0%)  6920  (  0%)
>  TOTAL                              :   0.07          0.03          0.11         4587k
> Extra diagnostic checks enabled; compiler may run slowly.
> Configure with --enable-checking=release to disable checks.
> hw-a20-8:checking-theory [10:06:44] $ ${new_build}/gcc/xgcc -B${new_build}/gcc/    -fdiagnostics-plain-output    -Os  -w -S test-sum.c -o /dev/null   -ftime-report
> 
> Time variable                                   usr           sys          wall           GGC
>  phase setup                        :   0.01 ( 17%)   0.00 (  0%)   0.02 ( 18%)  3389k ( 74%)
>  phase parsing                      :   0.02 ( 33%)   0.03 ( 75%)   0.05 ( 45%)   982k ( 21%)
>  phase opt and generate             :   0.03 ( 50%)   0.01 ( 25%)   0.04 ( 36%)   215k (  5%)
>  callgraph construction             :   0.00 (  0%)   0.01 ( 25%)   0.00 (  0%)  1864  (  0%)
>  callgraph functions expansion      :   0.02 ( 33%)   0.00 (  0%)   0.03 ( 27%)   162k (  4%)
>  callgraph ipa passes               :   0.01 ( 17%)   0.00 (  0%)   0.01 (  9%)    38k (  1%)
>  ipa free lang data                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)     0  (  0%)
>  CFG verifier                       :   0.02 ( 33%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
>  preprocessing                      :   0.01 ( 17%)   0.03 ( 75%)   0.01 (  9%)   272k (  6%)
>  lexical analysis                   :   0.01 ( 17%)   0.00 (  0%)   0.02 ( 18%)     0  (  0%)
>  parser (global)                    :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)   637k ( 14%)
>  parser inl. func. body             :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    19k (  0%)
>  tree STMT verifier                 :   0.01 ( 17%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
>  expand                             :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    12k (  0%)
>  integrated RA                      :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    50k (  1%)
>  initialize rtl                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  9%)    15k (  0%)
>  TOTAL                              :   0.06          0.04          0.11         4587k
> Extra diagnostic checks enabled; compiler may run slowly.
> Configure with --enable-checking=release to disable checks.
> ```
> 
> 
> N.b. in the change below it's worth mentioning that the
> `nanosec_to_floating_sec` and `percent_of` macros were already defined.
> I just moved them to below the `make_json_for_timevar_time_def`
> function.

LGTM.

Thanks for doing this,
Richard.

> ###############          Changes to the previous patch.          ###############
> 
> 
> @@ -832,12 +841,16 @@ json::object *
>  make_json_for_timevar_time_def (const timevar_time_def &ttd)
>  {
>    json::object *obj = new json::object ();
> -  obj->set ("user", new json::float_number (ttd.user));
> -  obj->set ("sys", new json::float_number (ttd.sys));
> -  obj->set ("wall", new json::float_number (ttd.wall));
> +  obj->set ("user",
> +	    new json::float_number (nanosec_to_floating_sec (ttd.user)));
> +  obj->set ("sys", new json::float_number (nanosec_to_floating_sec (ttd.sys)));
> +  obj->set ("wall",
> +	    new json::float_number (nanosec_to_floating_sec (ttd.wall)));
>    obj->set ("ggc_mem", new json::integer_number (ttd.ggc_mem));
>    return obj;
>  }
> +#undef nanosec_to_floating_sec
> +#undef percent_of
>  
>  /* Create a json value representing this object, suitable for use
>     in SARIF output.  */
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-08-04 10:18 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-21 12:11 [PATCH] Reduce floating-point difficulties in timevar.cc Matthew Malcomson
2023-07-21 12:47 ` Richard Biener
2023-07-21 13:11   ` Matthew Malcomson
2023-07-21 13:41     ` Richard Biener
2023-07-21 13:45     ` Xi Ruoyao
2023-07-21 13:58       ` Alexander Monakov
2023-07-21 15:11         ` Xi Ruoyao
2023-07-21 15:56           ` Alexander Monakov
2023-07-21 15:24       ` Matthew Malcomson
2023-08-03 13:38   ` [PATCH] mid-end: Use integral time intervals " Matthew Malcomson
2023-08-03 14:09     ` David Malcolm
2023-08-03 14:54       ` Matthew Malcomson
2023-08-03 15:34         ` David Malcolm
2023-08-04  6:44     ` Richard Biener
2023-08-04  9:41       ` Matthew Malcomson
2023-08-04 10:18         ` Richard Biener
2023-07-21 12:49 ` [PATCH] Reduce floating-point difficulties " Xi Ruoyao
2023-07-21 17:41 ` Andrew Pinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).