public inbox for elfutils@sourceware.org
* eu-stacktrace: roadmap and discussion thread
@ 2023-05-08 12:33 Serhei Makarov
  2023-05-08 21:50 ` Christian Hergert
  2023-05-09 18:02 ` Milian Wolff
  0 siblings, 2 replies; 6+ messages in thread
From: Serhei Makarov @ 2023-05-08 12:33 UTC
  To: elfutils-devel; +Cc: fche, mark, chergert

Hello all,

I wanted to open up public discussion on a project I'm looking to
develop in elfutils, tentatively named eu-stacktrace. I've started to
write code on branch users/serhei/eu-stacktrace.

eu-stacktrace will be a utility to process a stream of raw stack
samples (such as those obtained from the Linux kernel's
PERF_SAMPLE_STACK facility) into a stream of stack traces (such as the
ones obtained from PERF_SAMPLE_CALLCHAIN), freeing various profiling
utilities from having to implement their own backtracing logic.

My initial goal is to make the tool work with a (slightly modified)
version of the sysprof profiler. If all goes well, I hope to produce a
demonstration of sysprof using elfutils eu-stacktrace and eh_frame
data to produce useful profiles on code compiled with
-fomit-frame-pointer. (I'm aware that profiling -fomit-frame-pointer
programs has been the topic of some fairly contentious recent
discussion, which I'm not looking to rehash; I'm just interested to
see whether I can add a viable technical solution to the
mix.) I'm cc:ing chergert and posting a link to this thread on
GNOME Discourse so that sysprof developers can keep track of the
discussion.

For the time being, eu-stacktrace is meant to be fed data from a
profiling tool via a pipe or fifo. We will see how well this idea
works as implementation proceeds.

The eventual goal is to work with various profiler data formats. After
sysprof, supporting perf's native data format is an obvious
prerequisite for merging the users/serhei/eu-stacktrace branch into
elfutils. Ideally, I would like for eu-stacktrace to also convert
between different profile data formats (e.g. taking sysprof data as
input and emitting perf data, and vice-versa), but this may be
out-of-scope given the amount of code that would need to be written to
handle profile data other than stack traces.

Usage instructions will be kept up-to-date in README.eu-stacktrace on
the topic branch:

- https://sourceware.org/cgit/elfutils/tree/README.eu-stacktrace?h=users/serhei/eu-stacktrace

All the best,
  Serhei Makarov

PS. More information follows.

* * *

My current roadmap for the prototype with sysprof is as follows:

# 1. Get build-ids of all executables as sysprof encounters them.

Build-id data can be obtained by coding sysprof to support
PERF_RECORD_MMAP2 rather than PERF_RECORD_MMAP. As far as I
understand, there are indications this would be a welcome patch for
the sysprof project.
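For reference, here is a minimal sketch of pulling the build-id out of a PERF_RECORD_MMAP2 record. It hand-mirrors the record layout documented in perf_event_open(2) rather than including <linux/perf_event.h>. It also assumes the event was opened with the `build_id` bit set in `perf_event_attr` (Linux 5.12 and later). Only then does the kernel set PERF_RECORD_MISC_MMAP_BUILD_ID in `header.misc` and fill the build-id variant of the union instead of the device/inode fields. The `sketch_*` types and `mmap2_build_id` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hand-written mirror of the perf record header and the fixed-size prefix
 * of PERF_RECORD_MMAP2, following the layout in perf_event_open(2). */
enum { SKETCH_PERF_RECORD_MMAP2 = 10 };
#define SKETCH_MISC_MMAP_BUILD_ID (1u << 14)

struct sketch_event_header {
    uint32_t type;
    uint16_t misc;
    uint16_t size;
};

struct sketch_mmap2 {
    struct sketch_event_header header;
    uint32_t pid, tid;
    uint64_t addr, len, pgoff;
    union {
        struct { uint32_t maj, min; uint64_t ino, ino_generation; } dev;
        struct {
            uint8_t  build_id_size;
            uint8_t  reserved_1;
            uint16_t reserved_2;
            uint8_t  build_id[20];
        } bid;
    } u;
    uint32_t prot, flags;
    /* char filename[] and the optional sample_id trailer follow. */
};

/* Copy the build-id of an MMAP2 record into out[] (at least 20 bytes).
 * Returns its length, or -1 if the record carries device/inode data
 * instead (build_id not requested, or a kernel without support). */
int mmap2_build_id(const struct sketch_mmap2 *rec, uint8_t *out)
{
    if (rec->header.type != SKETCH_PERF_RECORD_MMAP2
        || !(rec->header.misc & SKETCH_MISC_MMAP_BUILD_ID))
        return -1;
    size_t n = rec->u.bid.build_id_size;
    if (n > sizeof rec->u.bid.build_id)
        return -1;
    memcpy(out, rec->u.bid.build_id, n);
    return (int)n;
}
```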

# 2. Get stack samples with PERF_SAMPLE_STACK; pipe to eu-stacktrace.

Within sysprof, add an option to switch the perf data source to use
PERF_SAMPLE_STACK rather than PERF_SAMPLE_CALLCHAIN. The capture
writer will write the data to a pipe to be processed by eu-stacktrace;
thus the stack samples never hit the disk.
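As a rough illustration of the data-source switch, the sketch below fills in a `struct perf_event_attr` from <linux/perf_event.h> to request user stack copies instead of kernel callchains. The kernel's flag is spelled PERF_SAMPLE_STACK_USER. It is normally paired with PERF_SAMPLE_REGS_USER, because the unwinder needs the sampled registers (at least IP and SP) as a starting point. The register mask is architecture-specific (see <asm/perf_regs.h>), so it is left as a parameter. The 99 Hz CPU-clock event and the function name are arbitrary choices for this sketch, not sysprof's actual settings.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Fill attr for sampling raw user stacks rather than callchains.
 * Returns 0, or -1 if stack_bytes is unusable: the kernel rejects
 * sample_stack_user values that are not a multiple of 8. */
int configure_stack_sampling(struct perf_event_attr *attr,
                             uint64_t user_regs_mask, uint32_t stack_bytes)
{
    if (stack_bytes == 0 || stack_bytes % 8 != 0)
        return -1;
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->freq = 1;
    attr->sample_freq = 99;                      /* samples per second */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME
                      | PERF_SAMPLE_REGS_USER     /* registers to unwind from */
                      | PERF_SAMPLE_STACK_USER;   /* instead of PERF_SAMPLE_CALLCHAIN */
    attr->sample_regs_user = user_regs_mask;
    attr->sample_stack_user = stack_bytes;       /* bytes copied from user SP upward */
    attr->mmap2 = 1;      /* report executable mappings as PERF_RECORD_MMAP2 */
    attr->comm = 1;       /* report process names */
    attr->disabled = 1;   /* enable later with PERF_EVENT_IOC_ENABLE */
    return 0;
}
```

The resulting attr would then be passed to perf_event_open(2), through `syscall(SYS_perf_event_open, ...)` since glibc provides no wrapper.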

Within eu-stacktrace, I'm implementing the code to accept data in
sysprof format, as defined in the public header (e.g. sysprof-devel
package on Fedora provides
/usr/include/sysprof-4/sysprof-capture-types.h).
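On the reading side, most of the work is splitting the byte stream from the pipe back into capture frames. The sketch below makes a hypothetical assumption: every frame starts with a small header whose first field is the frame's total length in bytes. This is similar in spirit to the `len` field at the start of sysprof's capture frames, but consult sysprof-capture-types.h for the real layout and alignment rules. A read from a pipe can return a partial frame, so the code buffers bytes until a whole frame is available. `struct frame_header`, `frame_cb` and `read_frames` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical frame header: only the leading length field matters here. */
struct frame_header {
    uint16_t len;    /* total frame size in bytes, header included */
    uint16_t type;
};

typedef void (*frame_cb)(const struct frame_header *hdr,
                         const uint8_t *frame, void *user);

/* Read frames from fd until EOF, calling cb once per complete frame.
 * Returns 0 at clean EOF, -1 on read error, a malformed length, or EOF
 * in the middle of a frame. */
int read_frames(int fd, frame_cb cb, void *user)
{
    uint8_t buf[65536 + sizeof(struct frame_header)];
    size_t have = 0;
    for (;;) {
        ssize_t r = read(fd, buf + have, sizeof buf - have);
        if (r < 0)
            return -1;
        if (r == 0)
            return have == 0 ? 0 : -1;
        have += (size_t)r;
        size_t off = 0;
        while (have - off >= sizeof(struct frame_header)) {
            struct frame_header hdr;
            memcpy(&hdr, buf + off, sizeof hdr);   /* avoid unaligned access */
            if (hdr.len < sizeof hdr)
                return -1;
            if (have - off < hdr.len)
                break;                             /* wait for the rest */
            cb(&hdr, buf + off, user);
            off += hdr.len;
        }
        memmove(buf, buf + off, have - off);       /* keep the partial tail */
        have -= off;
    }
}
```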

# 3. Implement eh_frame / dwarf-via-debuginfod data retrieval in eu-stacktrace.

I am hoping that eh_frame data will be sufficient, but elfutils
includes support for retrieving data via debuginfod as a
fallback.
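When the binary is readable locally, locating the unwind tables is straightforward. The following sketch walks the section headers of an in-memory 64-bit ELF image using only the definitions in glibc's <elf.h>. Inside eu-stacktrace, elfutils' own libelf (elf_nextscn, gelf_getshdr, elf_strptr) would be the natural choice; it is avoided here only to keep the example dependency-free. The helper name is hypothetical. The sketch does not handle 32-bit or byte-swapped files, or the extended section numbering used by files with very many sections.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Find a section by name in a native-endian ELF64 image held in memory.
 * Returns a pointer to its section header, or NULL. */
const Elf64_Shdr *find_section(const unsigned char *img, size_t size,
                               const char *name)
{
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    if (size < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0
        || eh->e_ident[EI_CLASS] != ELFCLASS64
        || eh->e_shentsize != sizeof(Elf64_Shdr))
        return NULL;
    if (eh->e_shoff > size
        || (size - eh->e_shoff) / sizeof(Elf64_Shdr) < eh->e_shnum
        || eh->e_shstrndx >= eh->e_shnum)
        return NULL;
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);
    const Elf64_Shdr *strs = &sh[eh->e_shstrndx];   /* section name table */
    if (strs->sh_offset > size || strs->sh_size > size - strs->sh_offset)
        return NULL;
    const char *names = (const char *)img + strs->sh_offset;
    for (size_t i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_name >= strs->sh_size)
            continue;
        const char *s = names + sh[i].sh_name;
        /* Only compare names that are NUL-terminated inside the table. */
        if (memchr(s, '\0', strs->sh_size - sh[i].sh_name) && strcmp(s, name) == 0)
            return &sh[i];
    }
    return NULL;
}
```

Calling `find_section(img, size, ".eh_frame")` on a mapped or read-in executable yields the section's file offset and size.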

There are a number of use cases relating to executables inside
containers that sysprof handles with clever logic. If I want to match
the profile coverage of plain sysprof with sysprof+eu-stacktrace, some
contemplation is required as to whether I need to duplicate that
logic, or to leverage sysprof's codebase directly from eu-stacktrace.
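One widely used starting point is to open a container process's files through /proc/PID/root. Following that link resolves paths in that process's mount namespace, so a path taken from an MMAP2 record can be opened from the host without entering the namespace, subject to a ptrace access-mode permission check. This is an assumption about what a port of that logic could build on, not a description of sysprof's actual code. A minimal, hypothetical helper for building such paths:

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Build "/proc/<pid>/root<path>" so that `path`, as seen by process `pid`,
 * can be opened from outside its mount namespace. Returns 0, or -1 if
 * path is not absolute or the result does not fit in out. */
int proc_root_path(char *out, size_t outsz, pid_t pid, const char *path)
{
    if (path[0] != '/')
        return -1;
    int n = snprintf(out, outsz, "/proc/%ld/root%s", (long)pid, path);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

The mapped file can also have been replaced or deleted since it was mapped, so comparing its build-id against the recorded one is the safer check before trusting its .eh_frame.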

# 4. Implement and benchmark naive unwinding of all samples as they come in.

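Within eu-stacktrace, once we have the stack samples and the .eh_frame
data accessible, use them to unwind each stack sample and output the
resulting compact stack traces as callchain frames in sysprof's
currently-existing format. The resulting pipeline: sysprof's capture
writer sends stack samples through the pipe to eu-stacktrace, which
emits callchain frames in sysprof's format.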

Of course, it is possible that eu-stacktrace is so slow that an
unacceptable amount of data piles up in the pipe. This would be
guaranteed if we need to retrieve data from debuginfod.
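To make the unwinding step concrete, here is a deliberately simplified model of what a CFI-driven unwinder does with one sample. Real .eh_frame rules can compute the CFA from other registers and restore callee-saved registers. This sketch assumes a single hypothetical rule per PC range: "CFA = SP + offset, return address saved at CFA - 8". That is the shape typical x86-64 code compiled without frame pointers has at most instructions. The stopping condition also shows the limitation raised in step 6: the walk ends as soon as it needs memory beyond the copied stack. All names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified unwind rule covering pc in [lo, hi). */
struct unwind_rule {
    uint64_t lo, hi;
    uint64_t cfa_offset;   /* CFA = SP + cfa_offset; return address at CFA - 8 */
};

static const struct unwind_rule *
find_rule(const struct unwind_rule *rules, size_t nrules, uint64_t pc)
{
    for (size_t i = 0; i < nrules; i++)
        if (pc >= rules[i].lo && pc < rules[i].hi)
            return &rules[i];
    return NULL;
}

/* Read 8 bytes at addr from the stack copy that starts at stack_base. */
static int read_u64(const uint8_t *stack, size_t len, uint64_t stack_base,
                    uint64_t addr, uint64_t *out)
{
    if (addr < stack_base || addr - stack_base > len
        || len - (addr - stack_base) < 8)
        return -1;
    memcpy(out, stack + (addr - stack_base), sizeof *out);
    return 0;
}

/* Unwind one sample. ip and sp come from the sampled user registers; stack
 * is the copied user stack, which begins at address sp. Writes up to
 * max_pcs program counters, innermost first, and returns how many. */
size_t unwind_sample(const struct unwind_rule *rules, size_t nrules,
                     uint64_t ip, uint64_t sp,
                     const uint8_t *stack, size_t stack_len,
                     uint64_t *pcs, size_t max_pcs)
{
    const uint64_t stack_base = sp;
    uint64_t pc = ip;
    size_t n = 0;
    while (n < max_pcs) {
        pcs[n++] = pc;
        /* Caller frames hold return addresses, which point just past the
         * call instruction; look up pc - 1 to get the call site's rule. */
        uint64_t lookup = (n == 1) ? pc : pc - 1;
        const struct unwind_rule *r = find_rule(rules, nrules, lookup);
        uint64_t cfa, ra;
        if (r == NULL)
            break;                  /* no unwind info for this pc */
        cfa = sp + r->cfa_offset;
        if (read_u64(stack, stack_len, stack_base, cfa - 8, &ra) != 0)
            break;                  /* ran past the sampled stack */
        if (ra == 0 || cfa <= sp)
            break;                  /* end of stack, or no progress */
        sp = cfa;
        pc = ra;
    }
    return n;
}
```

In eu-stacktrace itself, libdwfl's frame-unwinding interfaces (dwfl_attach_state and dwfl_getthread_frames, with a memory-read callback served from the sample) would presumably take the place of this toy loop.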

# 5. If needed, scope out / implement async preparation of unwinder data.

If eu-stacktrace cannot handle all of the stack samples in real time,
there is a scheme that will allow us to reach good-enough profile
coverage (e.g. 90%+ on a long-enough run) by caching data structures
pertaining to a repeatedly-encountered code location and using a
JIT-style 'priming' scheme.

The overall idea: the first time we encounter a code location, we
would drop the sample and initiate whatever preparation procedure
(setting up data structures or retrieving data via debuginfod) is
needed to unwind from that code location successfully.

After the preparation procedure completes, we will be able to unwind
future samples based at that code location.
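The priming idea can be sketched as a small state table keyed by code location. This sketch leaves open what counts as a "code location" (for instance a build-id plus a module-relative PC range). It also leaves open how preparation actually runs, such as a worker thread or a debuginfod request. The fixed-size open-addressing table and all names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PRIME_SLOTS 1024u   /* hypothetical capacity; must be a power of two */

enum prime_state { PRIME_EMPTY = 0, PRIME_PENDING, PRIME_READY };

struct prime_cache {
    uint64_t key[PRIME_SLOTS];
    uint8_t state[PRIME_SLOTS];
    uint64_t samples_seen, samples_accepted;
};

/* Linear probing: return the slot holding key, or the empty slot where it
 * belongs; -1 if the table is full. */
static long prime_slot(const struct prime_cache *c, uint64_t key)
{
    size_t h = (size_t)(key * 0x9E3779B97F4A7C15ull);
    for (size_t i = 0; i < PRIME_SLOTS; i++) {
        size_t s = (h + i) & (PRIME_SLOTS - 1);
        if (c->state[s] == PRIME_EMPTY || c->key[s] == key)
            return (long)s;
    }
    return -1;
}

/* Called once per incoming sample. Returns true if the sample can be
 * unwound now. On the first sighting of a location, sets *start_prep so
 * the caller can kick off preparation; that sample is dropped. */
bool prime_check(struct prime_cache *c, uint64_t key, bool *start_prep)
{
    long s = prime_slot(c, key);
    c->samples_seen++;
    *start_prep = false;
    if (s < 0)
        return false;               /* table full: drop */
    if (c->state[s] == PRIME_READY) {
        c->samples_accepted++;
        return true;
    }
    if (c->state[s] == PRIME_EMPTY) {
        c->key[s] = key;
        c->state[s] = PRIME_PENDING;
        *start_prep = true;
    }
    return false;                   /* still being prepared */
}

/* Called when preparation for key has finished. */
void prime_ready(struct prime_cache *c, uint64_t key)
{
    long s = prime_slot(c, key);
    if (s >= 0) {
        c->key[s] = key;
        c->state[s] = PRIME_READY;
    }
}
```

With this bookkeeping, the success rate for the run so far is simply samples_accepted / samples_seen.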

Within sysprof, we could add code to display a percentage indicator of
how many samples in the profile were successfully converted to stack
traces. This could be provided by having eu-stacktrace export the
number to a procfs-style file which sysprof can poll and incorporate
into its live statistic UI that already displays a running total of
the number of samples. As the eu-stacktrace cache is primed with data,
the success rate will rise -- in my simulated scenarios, it routinely
reached 90%+ -- and the sysprof user can keep an eye on the indicator
and stop profiling once the percentage has reached a satisfyingly high
value.
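On the eu-stacktrace side, exporting the counter could be as simple as rewriting a small text file. The one-line "<unwound> <total>" format, the path and the function name are all hypothetical. The write-then-rename pattern ensures that a poller never reads a half-written update, because rename() replaces the target atomically within one POSIX filesystem.

```c
#include <stdint.h>
#include <stdio.h>

/* Write "<unwound> <total>\n" to path via a temporary file and rename().
 * Returns 0 on success, -1 on any failure. */
int write_unwind_stats(const char *path, uint64_t unwound, uint64_t total)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    FILE *f = fopen(tmp, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%llu %llu\n",
                     (unsigned long long)unwound, (unsigned long long)total) > 0;
    if (fclose(f) != 0)
        ok = 0;
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```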

I am sure the details will be complex and interesting to work out, but
I also hope this is not actually needed outside of unusual cases.

# 6. Implement support for stitching stack traces to always reach the root.

For a top-down profile visualization, it's not crucial to accurately
unwind 100% of the samples, but it is important that the
accurately-unwound samples reach the root of the stack. However,
PERF_SAMPLE_STACK only provides a fixed-size sample of the stack,
which may not include the root. This can be worked around with
per-thread caching of the last-known state of the entire stack.
Frank Ch. Eigler and I brainstormed around 5-6 possibilities
for how to maintain this cache.
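To illustrate the shape of the problem, here is about the simplest possible stitching heuristic. It is not necessarily one of the possibilities mentioned above. It keeps, per thread, the last trace known to reach the root. When a new trace is truncated, it finds the truncated trace's outermost PC in that cached trace and borrows the frames beyond it. Recursion (the same PC appearing more than once) and stale caches make this unreliable in general, which is why the cache design needs more thought. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* pcs[0..n) is a truncated trace, innermost frame first, with room for cap
 * entries; cached[0..ncached) is the thread's last complete trace.
 * Appends the cached frames lying outside the truncated trace's outermost
 * frame and returns the new length (n unchanged if there is no match). */
size_t stitch_trace(uint64_t *pcs, size_t n, size_t cap,
                    const uint64_t *cached, size_t ncached)
{
    if (n == 0)
        return 0;
    for (size_t i = 0; i < ncached; i++) {
        if (cached[i] != pcs[n - 1])
            continue;
        /* First (innermost) match wins; recursion can make this wrong. */
        for (size_t j = i + 1; j < ncached && n < cap; j++)
            pcs[n++] = cached[j];
        break;
    }
    return n;
}
```

A complete implementation would also refresh the cached trace whenever a sample does unwind all the way to the root.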

* * *

Based on the above staging, the required changes to sysprof would be
reduced to the following four:

1. Collect build-id data via PERF_RECORD_MMAP2 rather than PERF_RECORD_MMAP
2. Collect stack samples via PERF_SAMPLE_STACK rather than PERF_SAMPLE_CALLCHAIN
3. Output the sample frames to a pipe connected to eu-stacktrace
4. (If needed) poll a procfs-style file updated by eu-stacktrace to
   receive and display the percentage of successfully-unwound frames


* Re: eu-stacktrace: roadmap and discussion thread
  2023-05-08 12:33 eu-stacktrace: roadmap and discussion thread Serhei Makarov
@ 2023-05-08 21:50 ` Christian Hergert
  2023-05-11 19:27   ` Serhei Makarov
  2023-05-09 18:02 ` Milian Wolff
  1 sibling, 1 reply; 6+ messages in thread
From: Christian Hergert @ 2023-05-08 21:50 UTC
  To: Serhei Makarov, elfutils-devel; +Cc: fche, mark

First off, this all sounds great!

I'm not on the mailing list, so apologies if this takes extra effort to
show up there.

On 5/8/23 5:33 AM, Serhei Makarov wrote:
> eu-stacktrace will be a utility to process a stream of raw stack
> samples (such as those obtained from the Linux kernel's
> PERF_SAMPLE_STACK facility) into a stream of stack traces (such as the
> ones obtained from PERF_SAMPLE_CALLCHAIN), freeing various profiling
> utilities from having to implement their own backtracing logic.

From a consumption standpoint, it would be nice if Sysprof could get a
perf stream where the PERF_SAMPLE_STACK samples are transparently
converted to PERF_SAMPLE_CALLCHAIN. I don't think eu-stacktrace
necessarily needs to speak Sysprof's capture API.

Sysprof already contains `sysprofd` which can open the Perf event stream 
for us via d-bus/CAP_SYS_ADMIN/polkit integration. After Sysprof gets 
the perf FD back from sysprofd we could spawn eu-stacktrace giving it 
the FD to consume and another FD to write the translated/passthrough events.
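The FD hand-off described here could look roughly like this: the parent spawns the unwinder with one descriptor as its standard input and another as its standard output. This sketch assumes that eu-stacktrace would read events on stdin and write them to stdout, and the helper name is hypothetical. The caller should mark its own pipe ends close-on-exec. Otherwise the child inherits the write end of its own input pipe and never sees EOF.

```c
#include <spawn.h>
#include <sys/types.h>
#include <unistd.h>

extern char **environ;

/* Spawn argv[0] (searched in PATH) with in_fd as its stdin and out_fd as
 * its stdout. Returns the child's pid, or -1 on failure. */
pid_t spawn_filter(char *const argv[], int in_fd, int out_fd)
{
    posix_spawn_file_actions_t fa;
    pid_t pid;
    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;
    int rc = posix_spawn_file_actions_adddup2(&fa, in_fd, STDIN_FILENO);
    if (rc == 0)
        rc = posix_spawn_file_actions_adddup2(&fa, out_fd, STDOUT_FILENO);
    if (rc == 0)
        rc = posix_spawnp(&pid, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    return rc == 0 ? pid : -1;
}
```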

Sysprof can do offline symbolizing of frames which is somewhat important 
when trying to analyze profiles from an embedded device, a machine that 
is disk/network constrained, or end-user-system via bug reports. We can 
fairly trivially teach Sysprof to do symbolizing via debuginfod.

In the case you're describing, is it right that you may not be able to 
unwind stack frames without debuginfod because there was no way to 
locate the `.eh_frame` section for the binary?

The code to do the mount namespace conversion is quite terrible in 
Sysprof and even now I'm in the midst of cleaning it up. We have to
create both a "view" of the namespace as it appeared to the PID and a
way to convert that view into something that the mount namespace
analyzing the capture file might be able to open. Any of these may or
may not be in a rootless container (toolbox/podman/flatpak/etc.).

Whether or not this is something we can eventually contain inside of 
bubblewrap is another can of worms.

Thanks again for all your work on this, I'm very excited to see what we 
can come up with!

-- Christian



* Re: eu-stacktrace: roadmap and discussion thread
  2023-05-08 12:33 eu-stacktrace: roadmap and discussion thread Serhei Makarov
  2023-05-08 21:50 ` Christian Hergert
@ 2023-05-09 18:02 ` Milian Wolff
  2023-05-11 19:31   ` Serhei Makarov
  1 sibling, 1 reply; 6+ messages in thread
From: Milian Wolff @ 2023-05-09 18:02 UTC
  To: elfutils-devel, Serhei Makarov; +Cc: fche, mark, chergert


On Monday, 8 May 2023 14:33:57 CEST Serhei Makarov wrote:
> eu-stacktrace will be a utility to process a stream of raw stack
> samples (such as those obtained from the Linux kernel's
> PERF_SAMPLE_STACK facility) into a stream of stack traces (such as the
> ones obtained from PERF_SAMPLE_CALLCHAIN), freeing various profiling
> utilities from having to implement their own backtracing logic.

<snip>

Hey Serhei,

sounds like a fun project. If you want to see some prior art in that area, do 
have a look at perfparser [1], esp. [2], and [3]. We solved quite a few of the 
problems you might encounter in this area. Esp. for good performance, you'll 
need quite a bit of caching on the stacktrace side, such as [4] and [5].

[1]: https://codereview.qt-project.org/gitweb?p=qt-creator%2Fperfparser.git;a=summary
[2]: https://codereview.qt-project.org/gitweb?p=qt-creator/perfparser.git;a=blob;f=app/perfunwind.cpp;h=9c09740c756beabac43b45ed6f34bbcfc77e0860;hb=refs/heads/master
[3]: https://codereview.qt-project.org/gitweb?p=qt-creator/perfparser.git;a=blob;f=app/perfunwind.cpp;h=9c09740c756beabac43b45ed6f34bbcfc77e0860;hb=refs/heads/master
[4]: https://codereview.qt-project.org/gitweb?p=qt-creator/perfparser.git;a=blob;f=app/perfaddresscache.cpp;h=8ff26d09a71aff0343378ff062c8fb2fcf601c08;hb=refs/heads/master
[5]: https://codereview.qt-project.org/gitweb?p=qt-creator/perfparser.git;a=blob;f=app/perfdwarfdiecache.cpp;h=10e432e64cdce6c5e9e6a72a49a1cd73ddb9e7ce;hb=refs/heads/master

Good luck!
-- 
Milian Wolff
mail@milianw.de
http://milianw.de



* Re: eu-stacktrace: roadmap and discussion thread
  2023-05-08 21:50 ` Christian Hergert
@ 2023-05-11 19:27   ` Serhei Makarov
  2023-05-15 17:55     ` Christian Hergert
  0 siblings, 1 reply; 6+ messages in thread
From: Serhei Makarov @ 2023-05-11 19:27 UTC
  To: Christian Hergert, builder---; +Cc: Frank Ch. Eigler, Mark Wielaard

On Mon, May 8, 2023, at 5:50 PM, Christian Hergert wrote:
> First off, this all sounds great!
> ...
>
> From a consumption standpoint, it would be nice if Sysprof could get a 
> perf stream where the PERF_SAMPLE_STACK are transparently converted to 
> PERF_SAMPLE_CALLCHAIN. I don't think eu-stacktrace necessarily needs to 
> speak Sysprof's capture API.
>...
>
> Sysprof already contains `sysprofd` which can open the Perf event stream 
> for us via d-bus/CAP_SYS_ADMIN/polkit integration. After Sysprof gets 
> the perf FD back from sysprofd we could spawn eu-stacktrace giving it 
> the FD to consume and another FD to write the translated/passthrough events.
The idea with sysprofd in the quote above sounds intriguing; I think we
can experiment with it once we have some perf parser code of our own.

For now, I'm satisfied with the fact that the patch to enable Sysprof to pipe
data to eu-stacktrace is very small, and the parsing code I'm working on
for separating out Sysprof capture frames is also quite small. Both are
easy to adapt to any refactoring or even data format changes you
might happen to do.

> Sysprof can do offline symbolizing of frames which is somewhat important 
> when trying to analyze profiles from an embedded device, a machine that 
> is disk/network constrained, or end-user-system via bug reports. We can 
> fairly trivially teach Sysprof to do symbolizing via debuginfod.
Yep -- and that could be a separate patchset, since the current approach
I'm using changes nothing about the Sysprof symbolizing phase.

> In the case you're describing, is it right that you may not be able to 
> unwind stack frames without debuginfod because there was no way to 
> locate the `.eh_frame` section for the binary?
Yep -- for containers, I considered debuginfod as a possible fallback
if the .eh_frame data isn't accessible on the local filesystem.
Of course that doesn't work for non-packaged container programs,
unless the developers of those programs set up debuginfod.
In general the workflows for debuginfo on containers are
rather under-developed.
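For the debuginfod fallback, unwinding needs the executable itself rather than the separate debuginfo file. .eh_frame is an allocated section that stays in the stripped binary, while in typical distro packages the separate debuginfo file carries only a placeholder for it. In C the natural route is libdebuginfod's debuginfod_find_executable(), with servers configured through DEBUGINFOD_URLS. To show what is being requested, the sketch below instead builds the corresponding path from the debuginfod web API: /buildid/BUILDID/executable, with the build-id in lowercase hex. The helper name and the server URL used in the test are placeholders.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Build "<base>/buildid/<hex build-id>/executable" into out.
 * Returns 0, or -1 if the result does not fit. */
int debuginfod_exe_url(char *out, size_t outsz, const char *base,
                       const uint8_t *build_id, size_t id_len)
{
    static const char hex[] = "0123456789abcdef";
    int n = snprintf(out, outsz, "%s/buildid/", base);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    size_t pos = (size_t)n;
    for (size_t i = 0; i < id_len; i++) {
        if (pos + 2 >= outsz)
            return -1;
        out[pos++] = hex[build_id[i] >> 4];
        out[pos++] = hex[build_id[i] & 0xf];
    }
    n = snprintf(out + pos, outsz - pos, "/executable");
    return (n < 0 || (size_t)n >= outsz - pos) ? -1 : 0;
}
```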

Honestly this isn't a scenario to implement as a first priority,
but I do want to keep track of everything that's required
to have sysprof+eu-stacktrace+eh_frame+PERF_SAMPLE_STACK
maintain feature parity with sysprof+-fno-omit-fp+PERF_SAMPLE_CALLCHAIN.
Thus I am making sure to list such considerations.

> The code to do the mount namespace conversion is quite terrible in 
> Sysprof and even now I'm in the midst of cleaning it up. We have to both 
> create a "view" of the namespace as it was to the PID as well as a way 
> to convert that view into something the mount namespace analyzing the 
> capture file might be able to open. Any of these may or may not be in a 
> rootless container (toolbox/podman/flatpak/etc).
>
> Whether or not this is something we can eventually contain inside of 
> bubblewrap is another can of worms.
One possible solution:

For this type of situation, we could perhaps give eu-stacktrace an
optional dependency on sysprof-devel.

Right now, I rely on sysprof-capture-types.h but that's just the header
and therefore not a runtime dependency.

When the sysprof libraries are available, we could also use sysprof's
mount namespace conversion code (in whatever eventual form it takes)
to get .eh_frame data.


* Re: eu-stacktrace: roadmap and discussion thread
  2023-05-09 18:02 ` Milian Wolff
@ 2023-05-11 19:31   ` Serhei Makarov
  0 siblings, 0 replies; 6+ messages in thread
From: Serhei Makarov @ 2023-05-11 19:31 UTC
  To: Milian Wolff, builder---
  Cc: Frank Ch. Eigler, Mark Wielaard, Christian Hergert

On Tue, May 9, 2023, at 2:02 PM, Milian Wolff wrote:
> Hey Serhei,
>
> sounds like a fun project. If you want to see some prior art in that area, do 
> have a look at perfparser [1], esp. [2], and [3]. We solved quite a few of the 
> problems you might encounter in this area. Esp. for good performance, you'll 
> need quite a bit of caching on the stacktrace side, such as [4] and [5].
Excellent, I do look forward to finding out what kind of caching I'll need to implement,
and if your solutions work for our use case, then I think making them more
widely-available via elfutils would be a good outcome all round.

All the best,
     Serhei


* Re: eu-stacktrace: roadmap and discussion thread
  2023-05-11 19:27   ` Serhei Makarov
@ 2023-05-15 17:55     ` Christian Hergert
  0 siblings, 0 replies; 6+ messages in thread
From: Christian Hergert @ 2023-05-15 17:55 UTC
  To: Serhei Makarov, builder---; +Cc: Frank Ch. Eigler, Mark Wielaard

On 5/11/23 12:27 PM, Serhei Makarov wrote:
> When the sysprof libraries are available, we could also use sysprof's
> mount namespace conversion code (in whatever eventual form it takes)
> to get .eh_frame data.

Is it correct that we won't get a PERF_RECORD_MMAP2 for all the processes 
already running when the profiler starts?

If so, do you have a strategy for how we would go about extracting `build-id` to 
insert into the capture file and/or prime eu-stacktrace?

-- Christian
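One possible strategy is to synthesize the missing mapping events by reading /proc/PID/maps for each existing process. perf itself does this for processes that are already running when recording starts. Each mapped file's build-id can then be read from its ELF build-id note. Below is a sketch of the parsing half, with a hypothetical struct and function name.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

struct mapping {
    uint64_t start, end, offset;
    char perms[5];       /* e.g. "r-xp" */
    char path[4096];     /* empty for anonymous mappings */
};

/* Parse one line of /proc/PID/maps:
 *   start-end perms offset dev inode   path
 * Returns 0 on success, -1 on a malformed line. */
int parse_maps_line(const char *line, struct mapping *m)
{
    int n = -1;
    if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %*s %*s %n",
               &m->start, &m->end, m->perms, &m->offset, &n) != 4 || n < 0)
        return -1;
    snprintf(m->path, sizeof m->path, "%s", line + n);
    m->path[strcspn(m->path, "\n")] = '\0';   /* drop the trailing newline */
    return 0;
}
```

Only mappings whose perms contain 'x' matter for unwinding.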




Thread overview: 6+ messages
2023-05-08 12:33 eu-stacktrace: roadmap and discussion thread Serhei Makarov
2023-05-08 21:50 ` Christian Hergert
2023-05-11 19:27   ` Serhei Makarov
2023-05-15 17:55     ` Christian Hergert
2023-05-09 18:02 ` Milian Wolff
2023-05-11 19:31   ` Serhei Makarov
