public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug gcov-profile/98257] New: Replace Donald B. Johnson's cycle enumeration with iterative loop finding
@ 2020-12-13  2:54 i at maskray dot me
  2020-12-14 18:20 ` [Bug gcov-profile/98257] " marxin at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: i at maskray dot me @ 2020-12-13  2:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98257

            Bug ID: 98257
           Summary: Replace Donald B. Johnson's cycle enumeration with
                    iterative loop finding
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: gcov-profile
          Assignee: unassigned at gcc dot gnu.org
          Reporter: i at maskray dot me
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

gcov used _J. C. Tiernan, An Efficient Search Algorithm to Find the Elementary
Circuits of a Graph, Comm ACM 1970_. The worst-case time bound is exponential
in the number of elementary circuits. It enumerated cycles (aka simple circuit,
aka elementary circuit) and performed cycle cancelling.

In 2016, the resolution to PR67992 switched to Donald B. Johnson's algorithm to
improve performance. The theoretical time complexity is $O((V+E)(c+1))$ where
$c$ is the number of cycles, which is exponential in the size of the graph.
(Boost attributed the algorithm to K. A. Hawick and H. A. James, and gcov
inherited this name. However, that paper did not improve Johnson's algorithm.)

Actually every step of cycle cancelling decreases the count of at lease one arc
to 0, so there is at most $O(E)$ cycles. The resolution to PR90380 skipped
non-positive arcs and decreased the time complexity to $O(V*E^2)$ (in theory it
could be $O(E^2)$ but the implementation has a linear scan).

This is all unnecessary. We can just iteratively find cycles (using the
classical tri-color DFS) and perform cycle cancelling. There are at most O(E)
cycles and the overall time complexity is O(E^2). 

(
We are processing a reducible flow graph (there is no intuitive cycle count for
an irreducible flow graph).
Every natural loop is identified by a back edge. By constructing a dominator
tree, finding back edges, identifying natural loops and clearing the arc
counters (we will compute incoming counts so we clear counters to prevent
duplicates), the time complexity can be decreased to $O(depthOfNestedLoops*E)$.
In practice, the semi-NCA algorithm (time complexity: $O(V^2)$, but considered
faster than the almost linear Lengauer-Tarjan's algorithm) is not difficult to
implement, but identifying natural loops is troublesome. So the method is not
useful.)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug gcov-profile/98257] Replace Donald B. Johnson's cycle enumeration with iterative loop finding
  2020-12-13  2:54 [Bug gcov-profile/98257] New: Replace Donald B. Johnson's cycle enumeration with iterative loop finding i at maskray dot me
@ 2020-12-14 18:20 ` marxin at gcc dot gnu.org
  2021-01-04 14:24 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: marxin at gcc dot gnu.org @ 2020-12-14 18:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98257

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-12-14
     Ever confirmed|0                           |1

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Thank you for the feedback.

I have 2 basic questions:
1) Are you willing to provide patches that will improve the algorithm?
2) Do you see a real-world program that suffers from the currently used
algorithm?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug gcov-profile/98257] Replace Donald B. Johnson's cycle enumeration with iterative loop finding
  2020-12-13  2:54 [Bug gcov-profile/98257] New: Replace Donald B. Johnson's cycle enumeration with iterative loop finding i at maskray dot me
  2020-12-14 18:20 ` [Bug gcov-profile/98257] " marxin at gcc dot gnu.org
@ 2021-01-04 14:24 ` rguenth at gcc dot gnu.org
  2021-10-06 14:40 ` jasonm at cadence dot com
  2021-10-07 14:16 ` marxin at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-04 14:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98257

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
There's also 'A New Algorithm for Identifying Loops in Decompilation' by
Tao Wei, Jian Mao, Wei Zou and You Chen of the Institute of Computer Science
and Technology of the Peking University with similar complexity which is used
by rev_post_order_and_mark_dfs_back_seme.  Note for irreducible regions there
can be multiple entries into a cycle and what edges are "back" depends on the
visitation order of the entries.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug gcov-profile/98257] Replace Donald B. Johnson's cycle enumeration with iterative loop finding
  2020-12-13  2:54 [Bug gcov-profile/98257] New: Replace Donald B. Johnson's cycle enumeration with iterative loop finding i at maskray dot me
  2020-12-14 18:20 ` [Bug gcov-profile/98257] " marxin at gcc dot gnu.org
  2021-01-04 14:24 ` rguenth at gcc dot gnu.org
@ 2021-10-06 14:40 ` jasonm at cadence dot com
  2021-10-07 14:16 ` marxin at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: jasonm at cadence dot com @ 2021-10-06 14:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98257

Jason McCampbell <jasonm at cadence dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jasonm at cadence dot com

--- Comment #3 from Jason McCampbell <jasonm at cadence dot com> ---
I am running into a performance issue that I believe is related to this ticket.
GCOV is taking 20+ minutes on a single file on a fast server. All of the time
is spent in recursive calls to 'circuit'. I also note that there appear to be a
lot of linear searches through the vectors, though I haven't looked to see the
size of the vectors, whether that's an issue or not.

The problematic file in our case uses Boost program options. There are enough
options defined that GCC warns that the variable tracking size limit was
exceeded. I can easily test proposed changes and will see if I can generate a
post-able test case that reproduces the issue.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug gcov-profile/98257] Replace Donald B. Johnson's cycle enumeration with iterative loop finding
  2020-12-13  2:54 [Bug gcov-profile/98257] New: Replace Donald B. Johnson's cycle enumeration with iterative loop finding i at maskray dot me
                   ` (2 preceding siblings ...)
  2021-10-06 14:40 ` jasonm at cadence dot com
@ 2021-10-07 14:16 ` marxin at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-10-07 14:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98257

--- Comment #4 from Martin Liška <marxin at gcc dot gnu.org> ---
Yes, please create a test-case I can possibly debug and fix.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-10-07 14:16 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-13  2:54 [Bug gcov-profile/98257] New: Replace Donald B. Johnson's cycle enumeration with iterative loop finding i at maskray dot me
2020-12-14 18:20 ` [Bug gcov-profile/98257] " marxin at gcc dot gnu.org
2021-01-04 14:24 ` rguenth at gcc dot gnu.org
2021-10-06 14:40 ` jasonm at cadence dot com
2021-10-07 14:16 ` marxin at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).