public inbox for binutils@sourceware.org
* feedback on gprofng
From: mikedunlavey44 @ 2023-03-27 14:38 UTC (permalink / raw)
  To: binutils

I suggest you have another output from gprofng:

(I assume the sampling is on wall-clock time, so it has visibility into
I/O.)

Let the user choose a small number N, such as 10 or 20, and then select N
stacks at random (with source-line information) and display them, either as
a tree or in raw form. The point is that any performance problem consists of
activity that isn't necessary, and if that activity accounts for a fraction F
of the time, it will show up on roughly N*F of the samples. High precision of
measurement is not necessary, but precision of insight is.
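
To put numbers on this, here is a minimal sketch in Python. It is
illustrative only and not part of gprofng; the function name and the example
values of N and F are assumptions. It treats each of the N stacks as an
independent draw that lands in the unnecessary activity with probability F,
so the number of hits follows a binomial distribution.

```python
from math import comb

def prob_at_least(k: int, n: int, f: float) -> float:
    """Probability that an activity taking fraction f of the time
    shows up on at least k of n independently chosen stack samples."""
    return sum(comb(n, i) * f**i * (1 - f)**(n - i) for i in range(k, n + 1))

n, f = 20, 0.3                 # assumed example values for N and F
expected_hits = n * f          # the "NF" figure from the paragraph above
print(f"expected hits: {expected_hits:.1f}")
print(f"P(seen on at least 2 stacks): {prob_at_least(2, n, f):.3f}")
```

With these assumed values the activity is expected on about 6 of the 20
stacks, and the chance of seeing it on at least two of them is above 99%,
which is why a small N can be enough to spot it.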

 

If there are multiple threads, let each sample capture all running threads
at the same instant, so the user can see which threads are waiting for which
other threads at that point in time.
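
As a rough sketch of that idea (in Python, with a hypothetical record layout
and made-up sample data, not gprofng's actual data model), samples can be
grouped by timestamp so that each randomly chosen instant carries the stacks
of every thread recorded at that moment:

```python
import random
from collections import defaultdict
from typing import NamedTuple

class Sample(NamedTuple):
    time_ms: int     # when the sample was taken
    thread: str      # hypothetical thread label
    stack: tuple     # function names, outermost first

def pick_instants(samples, n, seed=None):
    """Group samples by timestamp, then choose up to n instants at random.
    Each chosen instant keeps the stacks of all threads sampled then."""
    by_time = defaultdict(list)
    for s in samples:
        by_time[s.time_ms].append(s)
    rng = random.Random(seed)
    chosen = rng.sample(sorted(by_time), min(n, len(by_time)))
    return {t: by_time[t] for t in sorted(chosen)}

# Made-up data: two threads sampled at three instants.
samples = [
    Sample(10, "T1", ("main", "worker", "read_block")),
    Sample(10, "T2", ("main", "worker", "mutex_lock")),
    Sample(20, "T1", ("main", "worker", "parse")),
    Sample(20, "T2", ("main", "worker", "mutex_lock")),
    Sample(30, "T1", ("main", "worker", "mutex_unlock")),
    Sample(30, "T2", ("main", "worker", "parse")),
]

for t, group in pick_instants(samples, n=2, seed=1).items():
    print(f"t={t} ms")
    for s in group:
        print(f"  {s.thread}: " + " -> ".join(s.stack))
```

Reading the threads side by side at one instant is what makes a wait visible:
in the made-up data, T2 sits in mutex_lock while T1 is still doing I/O.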

 

Let me know if this makes sense, or maybe you've already done it.

Thanks,
Mike Dunlavey

P.S. I've been advocating this for years on Stack Overflow. People who've
tried it agree that it works. I've also got a YouTube video about it.

* Re: feedback on gprofng
From: Ruud van der Pas @ 2023-03-27 19:04 UTC (permalink / raw)
  To: mikedunlavey44; +Cc: binutils

Hi Mike,

Thanks for your email and suggestion.

> I suggest you have another output from gprofng:

We actually use sampling. It is the cornerstone of our approach.

> (I assume the sampling is on wall-clock time, so it has visibility into
> I/O.)

Yes, it is, and we do indeed have visibility into I/O.

We actually just heard of a use case where gprofng shows that the (formatted)
I/O is the sequential bottleneck in a multithreaded application.

We record call stacks, also for individual threads, using sampling.

The user can control the sampling granularity, either symbolically (e.g.
"high" or "low") or through a sampling rate.

There are also filters to select call stacks, a window in time, threads,
etc.

We also just released a GUI that, among other things, has a timeline
where we show color-coded call stacks. Each function we see is assigned
a color, and we show these call stacks as a function of time. This is
also done for the threads in a multithreaded application.

In this timeline, it is really easy to identify gaps in the execution.
You literally see them, and by inspecting the call stacks you can find
out where the execution was when the gap occurred.

Kind regards, Ruud




* RE: feedback on gprofng
From: mikedunlavey44 @ 2023-03-30 12:23 UTC (permalink / raw)
  To: 'Ruud van der Pas'; +Cc: binutils

Thanks for responding, Ruud.
In my opinion, your profiler does things correctly: it samples the stack at
the line-of-code or instruction level, on wall-clock time.
My only suggestion is an output option: let the user see a small number of
such stacks, chosen at random, in raw form.
Here's an example:
https://softwareengineering.stackexchange.com/a/302345/2429
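
A minimal sketch of what such an output option could do, written in Python
purely for illustration: the function names, the "file:line" frame format
and the sample stacks are all made up, and none of this is gprofng's
interface. It picks N recorded stacks at random, prints them raw, and counts
on how many of the chosen stacks each source line appears.

```python
import random
from collections import Counter

def show_random_stacks(stacks, n, seed=None):
    """Print n randomly chosen stacks in raw form and return them."""
    rng = random.Random(seed)
    chosen = rng.sample(stacks, min(n, len(stacks)))
    for i, stack in enumerate(chosen, 1):
        print(f"--- stack {i} ---")
        for frame in stack:
            print("   ", frame)
    return chosen

def lines_by_frequency(chosen):
    """Count, for each source line, how many of the stacks it appears on."""
    counts = Counter()
    for stack in chosen:
        counts.update(set(stack))   # count a line once per stack
    return counts.most_common()

# Made-up recorded stacks, outermost frame first.
stacks = [
    ["main.c:12", "load.c:40", "parse.c:88"],
    ["main.c:12", "load.c:40", "parse.c:91"],
    ["main.c:12", "solve.c:7"],
    ["main.c:12", "load.c:40", "io.c:15"],
    ["main.c:12", "report.c:3"],
]

chosen = show_random_stacks(stacks, n=3, seed=0)
for line, hits in lines_by_frequency(chosen):
    print(f"{line}: on {hits} of {len(chosen)} stacks")
```

A line that appears on a large share of the chosen stacks is a candidate for
closer inspection; the raw stacks then show why it was being executed.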





* Re: feedback on gprofng
From: Ruud van der Pas @ 2023-03-30 18:05 UTC (permalink / raw)
  To: mikedunlavey44; +Cc: binutils

Hi Mike,

> If you have or can point to a realistic example in C or C++, preferably a
> batch program and preferably large, I will show you what I mean.
> I’m on Windows, but I can run the GNU compiler and debugger.

Thanks.

I'm afraid I don't have such an example, but how about submitting an RFE?

https://sourceware.org/bugzilla/ -> New -> binutils

Component: gprofng
Version: 2.40

In that way we can keep track of this request.

If you can add an example there, or add a link to an example, we can better
understand what would be required from our side.

It doesn't have to be a large example by the way, but the more specific
it is, the better it will be.

Alternatively, I can submit such an RFE, but if you have an example
that we can look at, and ideally one that I can include, that would be
really helpful.

Kind regards, Ruud
