* [Bug translator/24363] New: Slow pass 2 due to varuse_collecting_visitor::visit_embeddedcode()
@ 2019-03-19 22:18 agentzh at gmail dot com
From: agentzh at gmail dot com @ 2019-03-19 22:18 UTC
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=24363
Bug ID: 24363
Summary: Slow pass 2 due to varuse_collecting_visitor::visit_embeddedcode()
Product: systemtap
Version: unspecified
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: translator
Assignee: systemtap at sourceware dot org
Reporter: agentzh at gmail dot com
Target Milestone: ---
Created attachment 11688
--> https://sourceware.org/bugzilla/attachment.cgi?id=11688&action=edit
Sample stap script to reproduce the p2 slowness
Even though fche introduced tagged_p() and the find_string_memoized map last
month to optimize this, pass 2 is still slow when many embeddedcode nodes are
present in the parse tree (AST). Much of the time is spent in red-black tree
lookups and string comparisons.
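To make that cost concrete, here is a minimal C++ sketch of memoizing a "does this embedded code contain this tag?" query in an ordered map. The function name `find_string_memoized_sketch` and its (code, tag) key are assumptions for illustration, not systemtap's actual `find_string_memoized`. Because std::map is commonly implemented as a red-black tree, each lookup performs O(log n) key comparisons, and each comparison may walk long embedded-code strings.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: memoize "does `code` contain `tag`?" in an ordered map.
// Every lookup copies both strings into a key and then walks the tree,
// comparing std::pair<std::string, std::string> keys lexicographically.
bool find_string_memoized_sketch(const std::string& code, const std::string& tag)
{
  static std::map<std::pair<std::string, std::string>, bool> memo;
  auto key = std::make_pair(code, tag);   // copies both strings on every call
  auto it = memo.find(key);               // O(log n) string-pair comparisons
  if (it != memo.end())
    return it->second;
  bool found = code.find(tag) != std::string::npos;
  memo.emplace(std::move(key), found);
  return found;
}
```

A repeated call with the same arguments returns the cached answer, but it still pays for building the key and for the tree walk.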
My first attempt at addressing this, using boolean flags, gave very good
performance results, but fche still has concerns about it. That patch was
submitted to the mailing list: https://sourceware.org/ml/systemtap/2019-q1/msg00071.html
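The boolean-flag idea can be sketched as follows, assuming each embedded-code node scans its text once, at construction time, for a fixed set of tags. The struct `embedded_code_sketch`, its fields, and the choice of `/* pure */` and `/* unprivileged */` as the tags are illustrative assumptions, not the contents of the mailing-list patch.

```cpp
#include <string>
#include <utility>

// Hypothetical node type: the tag checks run once, in the constructor,
// and the answers are cached as plain bools.
struct embedded_code_sketch
{
  std::string code;
  bool pure;          // true if code contains "/* pure */"
  bool unprivileged;  // true if code contains "/* unprivileged */"

  explicit embedded_code_sketch(std::string c)
    : code(std::move(c)),  // initialized first (declaration order)
      pure(code.find("/* pure */") != std::string::npos),
      unprivileged(code.find("/* unprivileged */") != std::string::npos)
  {}
};

// A visitor-style query is now a single field load: no hashing,
// no tree walk, no string comparison, no temporary std::string.
inline bool is_pure(const embedded_code_sketch& e) { return e.pure; }
```

The trade-off in this sketch is that the set of tags must be known up front, and each new tag needs its own field.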
Using fche's suggested memo cache at the level of individual embedded* objects,
based on std::unordered_map, can yield a 55% reduction in pass-2 time, but that
is still far from the improvement of my original patch. The patch is here:
https://pastebin.com/vCbxcwdf According to the corresponding CPU flame graphs,
there is still considerable overhead in the hash table itself and in the
construction and destruction of std::string objects.
The result is not even on the same order of magnitude as my original patch. And
a global std::unordered_map would make performance 2x *worse*. That patch is
here: https://pastebin.com/HLPcmyLW According to its flame graph, the majority
of CPU time is now spent in the hash function for string pairs.
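The two std::unordered_map variants can be contrasted in a minimal sketch. All names here are hypothetical, and the global cache's (code, tag) key is an assumption based on the flame graph showing time in hashing string pairs; the actual patches at the pastebin links may differ. In the per-object cache only the tag is hashed, whereas a global key that includes the code string forces every lookup to copy and hash the whole embedded-code body.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// (1) Per-object memo cache, keyed by tag only.
struct embedded_code_memo_sketch
{
  std::string code;
  // mutable so that a const query can fill the cache
  mutable std::unordered_map<std::string, bool> tag_cache;

  explicit embedded_code_memo_sketch(std::string c) : code(std::move(c)) {}

  bool tagged_p(const std::string& tag) const
  {
    // Every call hashes `tag`; a string-literal argument also constructs
    // and destroys a temporary std::string.
    auto it = tag_cache.find(tag);
    if (it != tag_cache.end())
      return it->second;
    bool found = code.find(tag) != std::string::npos;
    tag_cache.emplace(tag, found);
    return found;
  }
};

// (2) Global memo cache keyed by (code, tag), assumed for illustration.
struct string_pair_hash
{
  std::size_t operator()(const std::pair<std::string, std::string>& p) const
  {
    // Hashing p.first reads the entire embedded-code body on every lookup.
    std::size_t h1 = std::hash<std::string>{}(p.first);
    std::size_t h2 = std::hash<std::string>{}(p.second);
    // boost::hash_combine-style mixing of the two hashes
    return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
  }
};

bool global_tagged_p_sketch(const std::string& code, const std::string& tag)
{
  static std::unordered_map<std::pair<std::string, std::string>, bool,
                            string_pair_hash> memo;
  auto key = std::make_pair(code, tag);  // copies both strings
  auto it = memo.find(key);
  if (it != memo.end())
    return it->second;
  bool found = code.find(tag) != std::string::npos;
  memo.emplace(std::move(key), found);
  return found;
}
```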
I've prepared an artificial .stp script that resembles my real .stp script and
attached it to this PR. Running it with the unpatched stap gives
Pass 2: analyzed script: 1 probe, 1015 functions, 1 embed, 999 globals using
373460virt/88080res/7724shr/80368data kb, in 7740usr/0sys/7742real ms.
whereas with my original boolean-flag patch, pass 2 takes only a small fraction
of that time:
Pass 2: analyzed script: 1 probe, 1015 functions, 1 embed, 999 globals using
373476virt/88248res/7888shr/80392data kb, in 860usr/0sys/858real ms.
Not on the same order of magnitude at all (7742 ms vs. 858 ms real time,
roughly a 9x reduction).
--
You are receiving this mail because:
You are the assignee for the bug.
* [Bug translator/24363] Slow pass 2 due to varuse_collecting_visitor::visit_embeddedcode()
From: agentzh at gmail dot com @ 2019-03-19 22:27 UTC
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=24363
--- Comment #1 from agentzh <agentzh at gmail dot com> ---
Got some flame graphs for the generated .stp sample script attached to my
previous comment:
Classic CPU flame graph:
http://openresty.org/misc/flamegraph/stap-unordered-map-generated-script-2019-03-19.svg
Reversed CPU flame graph:
http://openresty.org/misc/flamegraph/inv-stap-unordered-map-generated-script-2019-03-19.svg
The stap used here has the patch at https://pastebin.com/vCbxcwdf applied, and
you can see all the hash table overhead introduced by `std::unordered_map::find()`.
* [Bug translator/24363] Slow pass 2 due to varuse_collecting_visitor::visit_embeddedcode()
From: fche at redhat dot com @ 2019-03-22 15:23 UTC
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=24363
Frank Ch. Eigler <fche at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
CC| |fche at redhat dot com
Resolution|--- |FIXED
--- Comment #2 from Frank Ch. Eigler <fche at redhat dot com> ---
commit 7ab4d70f5
commit b9392d1e3
constitute an algorithmic fix that takes pass-2 runtime
from 22s (stap 4.0 release)
through 19s (git HEAD^^)
down to 0.3s (git HEAD)