public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug translator/24363] New: Slow pass 2 due to varuse_collecting_visitor::visit_embeddedcode()
@ 2019-03-19 22:18 agentzh at gmail dot com
  2019-03-19 22:27 ` [Bug translator/24363] " agentzh at gmail dot com
  2019-03-22 15:23 ` fche at redhat dot com
  0 siblings, 2 replies; 3+ messages in thread
From: agentzh at gmail dot com @ 2019-03-19 22:18 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=24363

            Bug ID: 24363
           Summary: Slow pass 2 due to
                    varuse_collecting_visitor::visit_embeddedcode()
           Product: systemtap
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: translator
          Assignee: systemtap at sourceware dot org
          Reporter: agentzh at gmail dot com
  Target Milestone: ---

Created attachment 11688
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11688&action=edit
Sample stap script to reproduce the p2 slowness

Even though fche introduced a tagged_p() and find_string_memoized map to
optimize this last month, it is still slow when a lot of embeddedcode nodes are
present in the parse tree (or AST). A lot of time is spent on the rbtree as
well as the string comparisons.

My first attempt of addressing this with boolean flags gave very good
performance result but fche still has his concerns. This patch was submitted to
the mailing list: https://sourceware.org/ml/systemtap/2019-q1/msg00071.html

Using fche's suggested embedded* object level memo cache based on
std::unordered_map can yield 55% p2 time reduction, but still far from the
improvement of my original patch. The patch is here:
https://pastebin.com/vCbxcwdf There's still quite some overhead involved with
the hash table itself and std::string objects' construction and destruction,
according to the corresponding CPU flame graphs..

Not even on the same order of magnitude. And a global std::unordered_map would
make the performance 2x *worse*. The patch is here:
https://pastebin.com/HLPcmyLW According to the flame graph, the majority of the
CPU time now spends on the hash function on string pairs.

I've prepared an artificial .stp script which resembles my real .stp script.
I've attached it to this PR. Running it using the unpatched stap gives

Pass 2: analyzed script: 1 probe, 1015 functions, 1 embed, 999 globals using
373460virt/88080res/7724shr/80368data kb, in 7740usr/0sys/7742real ms.

while using my original patch doing boolean flags, it only takes a tiny
fraction of time on P2:

Pass 2: analyzed script: 1 probe, 1015 functions, 1 embed, 999 globals using
373476virt/88248res/7888shr/80392data kb, in 860usr/0sys/858real ms.

Not on the same order of magnitude at all.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Bug translator/24363] Slow pass 2 due to varuse_collecting_visitor::visit_embeddedcode()
  2019-03-19 22:18 [Bug translator/24363] New: Slow pass 2 due to varuse_collecting_visitor::visit_embeddedcode() agentzh at gmail dot com
@ 2019-03-19 22:27 ` agentzh at gmail dot com
  2019-03-22 15:23 ` fche at redhat dot com
  1 sibling, 0 replies; 3+ messages in thread
From: agentzh at gmail dot com @ 2019-03-19 22:27 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=24363

--- Comment #1 from agentzh <agentzh at gmail dot com> ---
Got some flame graphs for the generated .stp sample script attached to my
previous comment:

Classic CPU flame graph:

http://openresty.org/misc/flamegraph/stap-unordered-map-generated-script-2019-03-19.svg

Reversed CPU flame graph:

http://openresty.org/misc/flamegraph/inv-stap-unordered-map-generated-script-2019-03-19.svg

The stap has got the patch here: https://pastebin.com/vCbxcwdf And you can see
all those hash table overhead introduced by `std::unordered_map::find()`.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Bug translator/24363] Slow pass 2 due to varuse_collecting_visitor::visit_embeddedcode()
  2019-03-19 22:18 [Bug translator/24363] New: Slow pass 2 due to varuse_collecting_visitor::visit_embeddedcode() agentzh at gmail dot com
  2019-03-19 22:27 ` [Bug translator/24363] " agentzh at gmail dot com
@ 2019-03-22 15:23 ` fche at redhat dot com
  1 sibling, 0 replies; 3+ messages in thread
From: fche at redhat dot com @ 2019-03-22 15:23 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=24363

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |fche at redhat dot com
         Resolution|---                         |FIXED

--- Comment #2 from Frank Ch. Eigler <fche at redhat dot com> ---
commit 7ab4d70f5
commit b9392d1e3

constitute an algorithmic fix that takes pass-2 runtime
from 22s (stap 4.0 release)
through 19s (git HEAD^^)
down to 0.3s (git HEAD)

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-03-22 15:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-19 22:18 [Bug translator/24363] New: Slow pass 2 due to varuse_collecting_visitor::visit_embeddedcode() agentzh at gmail dot com
2019-03-19 22:27 ` [Bug translator/24363] " agentzh at gmail dot com
2019-03-22 15:23 ` fche at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).