[PATCH 1/2] Record edge true/false value for gcov

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH 1/2] Record edge true/false value for gcov
@ 2024-06-25  8:04 Jørgen Kvalsvik
  2024-06-25  8:04 ` [PATCH 2/2] [RFC] Prime path coverage in gcc/gcov Jørgen Kvalsvik
  2024-06-25 21:37 ` [PATCH 1/2] Record edge true/false value for gcov Jeff Law
  0 siblings, 2 replies; 4+ messages in thread
From: Jørgen Kvalsvik @ 2024-06-25  8:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: hubicka, Jørgen Kvalsvik

Make gcov aware which edges are the true/false to more accurately
reconstruct the CFG.  There are plenty of bits left in arc_info and it
opens up for richer reporting.

gcc/ChangeLog:

	* gcov-io.h (GCOV_ARC_TRUE): New.
	(GCOV_ARC_FALSE): New.
	* gcov.cc (struct arc_info): Add true_value, false_value.
	(read_graph_file): Read true_value, false_value.
	* profile.cc (branch_prob): Write GCOV_ARC_TRUE, GCOV_ARC_FALSE.
---
 gcc/gcov-io.h  | 2 ++
 gcc/gcov.cc    | 8 ++++++++
 gcc/profile.cc | 4 ++++
 3 files changed, 14 insertions(+)

diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index 20f805598f0..5dc467c92b1 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -337,6 +337,8 @@ GCOV_COUNTERS
 #define GCOV_ARC_ON_TREE 	(1 << 0)
 #define GCOV_ARC_FAKE		(1 << 1)
 #define GCOV_ARC_FALLTHROUGH	(1 << 2)
+#define GCOV_ARC_TRUE		(1 << 3)
+#define GCOV_ARC_FALSE		(1 << 4)
 
 /* Object & program summary record.  */
 
diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index 1e2e193d79d..b9e41fd5172 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -117,6 +117,12 @@ struct arc_info
   /* Loop making arc.  */
   unsigned int cycle : 1;
 
+  /* Is a true arc.  */
+  unsigned int true_value : 1;
+
+  /* Is a false arc.  */
+  unsigned int false_value : 1;
+
   /* Links to next arc on src and dst lists.  */
   struct arc_info *succ_next;
   struct arc_info *pred_next;
@@ -2095,6 +2101,8 @@ read_graph_file (void)
 	      arc->on_tree = !!(flags & GCOV_ARC_ON_TREE);
 	      arc->fake = !!(flags & GCOV_ARC_FAKE);
 	      arc->fall_through = !!(flags & GCOV_ARC_FALLTHROUGH);
+	      arc->true_value = !!(flags & GCOV_ARC_TRUE);
+	      arc->false_value = !!(flags & GCOV_ARC_FALSE);
 
 	      arc->succ_next = src_blk->succ;
 	      src_blk->succ = arc;
diff --git a/gcc/profile.cc b/gcc/profile.cc
index 2b90e6cc510..25d4f4a4b86 100644
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -1456,6 +1456,10 @@ branch_prob (bool thunk)
 		    flag_bits |= GCOV_ARC_FAKE;
 		  if (e->flags & EDGE_FALLTHRU)
 		    flag_bits |= GCOV_ARC_FALLTHROUGH;
+		  if (e->flags & EDGE_TRUE_VALUE)
+		    flag_bits |= GCOV_ARC_TRUE;
+		  if (e->flags & EDGE_FALSE_VALUE)
+		    flag_bits |= GCOV_ARC_FALSE;
 		  /* On trees we don't have fallthru flags, but we can
 		     recompute them from CFG shape.  */
 		  if (e->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE)
-- 
2.39.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/2] [RFC] Prime path coverage in gcc/gcov
  2024-06-25  8:04 [PATCH 1/2] Record edge true/false value for gcov Jørgen Kvalsvik
@ 2024-06-25  8:04 ` Jørgen Kvalsvik
  2024-06-25 21:37 ` [PATCH 1/2] Record edge true/false value for gcov Jeff Law
  1 sibling, 0 replies; 4+ messages in thread
From: Jørgen Kvalsvik @ 2024-06-25  8:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: hubicka, Jørgen Kvalsvik

These are the main highlights since v1:

1. The instrumentation phase has been reworked and tries to eliminate
   redundant instructions. This has a massive impact on performance,
   taking the compile time of tree.c from 13m30s-14m to ~2m30s on my
   machine, and the resulting tree.o from ~50M to 32M in size.

2. Some bugfixes and more tests.

3. Some temporary hacks been removed and replaced with more permanent
   solutions. Notable examples are assumptions on bits-per-unit and no
   more use of the C++ STL.

I still need to implement knobs and options, and work a bit more on the
reporting.

Compile times are still brutal, but much better than before. It doesn't
seem like I emit more instructions than I absolutely have to (but I
would like a counter example here). I don't know if there is too much
that can be done about this other than general speed improvements in
verify-ssa, verify-gimple, verify-control-flow and the likes.

--

This patch adds prime path coverage to gcc/gcov. It is a bit rough in a few
places, but I think all the main components are there and ready for some
feedback while I keep working on the details. First, a quick
introduction to path coverage, before I explain a bit on the pieces of
the patch and on what's missing.

PRIME PATHS

Path coverage is recording the paths taken through the program. Here is a
simple example:

if (cond1)  BB 1
  then1 ()  BB 2
else
  else1 ()  BB 3

if (cond2)  BB 4
  then2 ()  BB 5
else
  else2 ()  BB 6

_           BB 7

To cover all paths you must run {then1 then2}, {then1 else2}, {else1 then1},
{else1 else2}. This is in contrast with line/statement coverage where it is
sufficient to execute then2, and it does not matter if it was reached through
then1 or else1.

1 2 4 5 7
1 2 4 6 7
1 3 4 5 7
1 3 4 6 7

This gets more complicated with loops, because 0, 1, 2, ..., N iterations are
all different paths. There are different ways of addressing this, a promising
one being prime paths. A prime path is a simple path (a path with no repeated
vertices except for the first/last in a cycle) that does not appear as a subpath
of any other simple path. Prime paths seem to strike a decent balance between
number of tests, path growth, and loop coverage. Of course, the number of paths
still grows very fast with program complexity - for example, this program has
14 prime paths:

  while (a)
    {
      if (b)
        return;
      while (c--)
        a++;
    }

--

ALGORITHM

Since the numbers of paths grows so fast, we need a good algorithm. The naive
approach of generating all paths and discarding redundancies (see
reference_prime_paths in the diff) simply doesn't complete for even pretty
simple functions with a few ten thousand paths (granted, the implementation is
also poor, but only serves as a reference). Fazli & Afsharchi in their paper
"Time and Space-Efficient Compositional Method for Prime and Test Paths
Generation from describe a neat algorithm which drastically improves on this
and brings complexity down to something managable. This patch implements that
algorithm with a few minor tweaks.

The algorithm first finds the strongly connected components (SCC) of the graph
and creates a new graph where the vertices are the SCCs of the CFG. Within
these vertices different paths are found - regular prime paths, paths that
start in the SCCs entries, and paths that end in the SCCs exits. These per-SCC
paths are combined with paths through the CFG which greatly reduces of paths
needed to be evaluated just to be thrown away.

Using this algorithm we can generate the prime paths for somewhat complicated
functions in a reasonable time. This is the prime_paths function. Please note
that some paths don't benefit from this at all. We need to find the prime paths
within a SCC, so if a single SCC is very large the function degenerates to the
naive implementation. Improving on this is a later project.

--

OVERALL ARCHITECTURE

Like the other coverages in gcc, this operates on the CFG in the profiling
phase, just after branch and condition coverage, in phases:

1. All prime paths are generated, counted, and enumerated from the CFG
2. The paths are evaluted and counter instructions and accumulators are
   emitted
3. gcov reads the CFG and computes the prime paths (same as step 1)
4. gcov prints a report

Simply writing out all the paths in the .gcno file is not really viable,
the files would be too big. Additionally, there are limits to the
practicality of measuring (and reporting) on millions of paths, so for
most programs where coverage is feasible, computing paths should be
plenty fast. As a result, path coverage really only adds 1 bit to the
counter, rounded up to nearest 64 ("bucket"), so 64 paths takes up 8
bytes, 65 paths take up 16 bytes.

Recording paths is really just massaging large bitsets. Per function,
ceil(paths/64 or 32) buckets (gcov_type) are allocated. Paths are
sorted, so the first path maps to the lowest bit, the second path to the
second lowest bit, and so on. On taking an edge and entering a basic
block, a few bitmasks are applied to unset the bits corresponding to the
paths outside the block and set the bits of the paths that start in that
block. Finally, the right buckets are masked and written to the global
accumulators for the paths that end in the block. Full coverage is
achieved when all bits are set.

--

IMPLEMENTATION

In order to remove non-prime paths (subpaths) I use a non-clever suffix tree,
by inserting all subpaths into a trie. Fazli & Afsharchi do not discuss how
duplicates or subpaths are removed, and using the trie turned out to work
really well. The same prime_paths function is used both in gcc and in gcov
which meant adding some more objects in Makefile.in.

As for speed, I would say that it is acceptable (but see missing pieces
on knobs). It is a problem that is combinatorial in its very nature, so
if you enable this feature you can reasonably expect it taking a while.
My main benchmark tree.c generates approx 2M paths across the 20
functions or so in it (where most functions have less than 1500 paths,
and 2 around a million each). Finding the paths takes 3.5-4s, but the
instrumentation phase takes approx. 2.5 minutes and generates a 32M
binary. Not bad for a 1429 line source file.

There are some selftests which deconstruct the algorithm, so it can be
easily referenced with the Fazli & Afsharchi. I hope that including them
both help to catch regression, clarify the assumptions, and help
understanding the algorithm by breaking up the phases.

The gcov.cc stuff is not in great shape, and reporting is still a bit
primitive. It has served very well as a starting point, but it is
probably not worth reviewing too much for style and neatness currently.
Here is a snapshot of what it currently looks like - I am very much open
for ideas on how to improve it.

$ gcc -fpath-coverage --coverage bs.c -o bs
$ gcov -et bs

paths covered 0 of 17
(edges) path  0 not covered: bs.c:6 bs.c:6(true) bs.c:11(true) bs.c:12
(edges) path  1 not covered: bs.c:6 bs.c:6(true) bs.c:11(false) bs.c:13(true) bs.c:14
(edges) path  2 not covered: bs.c:6 bs.c:6(true) bs.c:11(false) bs.c:13(false) bs.c:16 (unknown)
(edges) path  3 not covered: bs.c:6 bs.c:6(false) bs.c:18 (unknown)
(edges) path  4 not covered: bs.c:11(true) bs.c:12 bs.c:6(true) bs.c:11
(edges) path  5 not covered: bs.c:11(true) bs.c:12 bs.c:6(false) bs.c:18 (unknown)
(edges) path  6 not covered: bs.c:11(false) bs.c:13(true) bs.c:14 bs.c:6(true) bs.c:11
(edges) path  7 not covered: bs.c:11(false) bs.c:13(true) bs.c:14 bs.c:6(false) bs.c:18 (unknown)
(edges) path  8 not covered: bs.c:12 bs.c:6(true) bs.c:11(true) bs.c:12
(edges) path  9 not covered: bs.c:12 bs.c:6(true) bs.c:11(false) bs.c:13(true) bs.c:14
(edges) path 10 not covered: bs.c:12 bs.c:6(true) bs.c:11(false) bs.c:13(false) bs.c:16 (unknown)
(edges) path 11 not covered: bs.c:13(true) bs.c:14 bs.c:6(true) bs.c:11(true) bs.c:12
(edges) path 12 not covered: bs.c:13(true) bs.c:14 bs.c:6(true) bs.c:11(false) bs.c:13
(edges) path 13 not covered: bs.c:14 bs.c:6(true) bs.c:11(false) bs.c:13(true) bs.c:14
(edges) path 14 not covered: bs.c:14 bs.c:6(true) bs.c:11(false) bs.c:13(false) bs.c:16 (unknown)
(edges) path 15 not covered: bs.c:6(true) bs.c:11(true) bs.c:12 bs.c:6
(edges) path 16 not covered: bs.c:6(true) bs.c:11(false) bs.c:13(true) bs.c:14 bs.c:6
    #####:    1:int binary_search(int a[], int len, int from, int to, int key)
        -:    2:{
    #####:    3:    int low = from;
    #####:    4:    int high = to - 1;
        -:    5:
    #####:    6:    while (low <= high)
        -:    7:    {
    #####:    8:        int mid = (low + high) >> 1;
    #####:    9:        long midVal = a[mid];
        -:   10:
    #####:   11:        if (midVal < key)
    #####:   12:            low = mid + 1;
    #####:   13:        else if (midVal > key)
    #####:   14:            high = mid - 1;
        -:   15:        else
    #####:   16:            return mid; // key found
        -:   17:    }
    #####:   18:    return -1;
        -:   19:}

--

MISSING PIECES AND OPEN QUESTIONS

Tuning. There are no knobs available right now.  For example, building
tree https://gitlab.com/OldManProgrammer/unix-tree with path-coverage
can work for tree.c, but really doesn't finish list.c. I plan to add a
few knobs for when to give up coverage, such as maybe size-of-graph,
complexity estimation (e.g. longest chain of vertices in an SCC), or
maybe even a timeout (e.g. give up after consdering 250k paths). None of
this is ready yet. I am open to ideas on what to limit on and what the
flags should be.

Suffix trees. Implementing a trie was a massive simplification, and I think
maybe a better suffix tree implementation could speed things up even
more. Ukkonnen's algorithm came to mind, but I found it too complicated to be
worth it while working out the rest of the program.

Reporting. The gcov stuff needs some more polish.

Types. Once the paths are generated they are typically only read once,
and it is not really necessary to render to a vec<vec<int>>. The trie
type could be returned directly which does add some pressure on
headers. I am not sure if this is worth it.

--

NOTES AND QUESTIONS

The compile times are brutal. Obviously finding prime paths could be faster,
but right now the real bottleneck is actually instrumentation. I have tracked
it down the worst offender, this line in path-coverage.cc:

   gassign *ga4 = gimple_build_assign (unshare_expr (counter), tmp3);

If I remove the write to the counter, or write to some other SSA, gcc
finishes much faster.  I don't know what is going on here, why is it so
much slower to write to the counter?
---
 gcc/Makefile.in                        |    6 +-
 gcc/builtins.cc                        |    2 +-
 gcc/collect2.cc                        |    5 +-
 gcc/common.opt                         |    4 +
 gcc/gcc.cc                             |    4 +-
 gcc/gcov-counter.def                   |    3 +
 gcc/gcov-io.h                          |    3 +
 gcc/gcov.cc                            |  272 +++-
 gcc/ipa-inline.cc                      |    2 +-
 gcc/passes.cc                          |    4 +-
 gcc/path-coverage.cc                   |  482 ++++++
 gcc/prime-paths.cc                     | 2020 ++++++++++++++++++++++++
 gcc/profile.cc                         |    6 +-
 gcc/selftest-run-tests.cc              |    1 +
 gcc/selftest.h                         |    1 +
 gcc/testsuite/gcc.misc-tests/gcov-29.c |  862 ++++++++++
 gcc/testsuite/lib/gcov.exp             |   95 +-
 gcc/tree-profile.cc                    |   11 +-
 18 files changed, 3766 insertions(+), 17 deletions(-)
 create mode 100644 gcc/path-coverage.cc
 create mode 100644 gcc/prime-paths.cc
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-29.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9eacb21bf54..e4b1f055a72 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1622,6 +1622,8 @@ OBJS = \
 	opts-global.o \
 	ordered-hash-map-tests.o \
 	passes.o \
+	prime-paths.o \
+	path-coverage.o \
 	plugin.o \
 	pointer-query.o \
 	postreload-gcse.o \
@@ -2877,6 +2879,8 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/tree-scalar-evolution.cc \
   $(srcdir)/tree-ssa-operands.h \
   $(srcdir)/tree-profile.cc $(srcdir)/tree-nested.cc \
+  $(srcdir)/path-coverage.cc \
+  $(srcdir)/prime-paths.cc \
   $(srcdir)/omp-offload.h \
   $(srcdir)/omp-general.cc \
   $(srcdir)/omp-low.cc \
@@ -3253,7 +3257,7 @@ s-version: build/genversion$(build_exeext)
 # gcov.o needs $(ZLIBINC) added to the include flags.
 CFLAGS-gcov.o += $(ZLIBINC)
 
-GCOV_OBJS = gcov.o json.o
+GCOV_OBJS = gcov.o json.o graphds.o prime-paths.o bitmap.o sbitmap.o
 gcov$(exeext): $(GCOV_OBJS) $(LIBDEPS)
 	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) $(GCOV_OBJS) \
 		hash-table.o ggc-none.o $(LIBS) $(ZLIB) -o $@
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index d467d1697b4..5c5d1435616 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6319,7 +6319,7 @@ expand_builtin_fork_or_exec (tree fn, tree exp, rtx target, int ignore)
   tree call;
 
   /* If we are not profiling, just call the function.  */
-  if (!profile_arc_flag && !condition_coverage_flag)
+  if (!profile_arc_flag && !condition_coverage_flag && !path_coverage_flag)
     return NULL_RTX;
 
   /* Otherwise call the wrapper.  This should be equivalent for the rest of
diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index 902014a9cc1..8be949d117a 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -1035,9 +1035,9 @@ main (int argc, char **argv)
       lto_mode = LTO_MODE_LTO;
   }
 
-  /* -fno-profile-arcs -fno-condition-coverage -fno-test-coverage
+  /* -fno-profile-arcs -fno-condition-coverage -fno-path-coverage -fno-test-coverage
      -fno-branch-probabilities -fno-exceptions -w -fno-whole-program */
-  num_c_args += 7;
+  num_c_args += 8;
 
   c_argv = XCNEWVEC (char *, num_c_args);
   c_ptr = CONST_CAST2 (const char **, char **, c_argv);
@@ -1234,6 +1234,7 @@ main (int argc, char **argv)
   obstack_free (&temporary_obstack, temporary_firstobj);
   *c_ptr++ = "-fno-profile-arcs";
   *c_ptr++ = "-fno-condition-coverage";
+  *c_ptr++ = "-fno-path-coverage";
   *c_ptr++ = "-fno-test-coverage";
   *c_ptr++ = "-fno-branch-probabilities";
   *c_ptr++ = "-fno-exceptions";
diff --git a/gcc/common.opt b/gcc/common.opt
index 327230967ea..cf361d004d3 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2403,6 +2403,10 @@ foptimize-sibling-calls
 Common Var(flag_optimize_sibling_calls) Optimization
 Optimize sibling and tail recursive calls.
 
+fpath-coverage
+Common Var(path_coverage_flag)
+Insert path profiling code.
+
 fpartial-inlining
 Common Var(flag_partial_inlining) Optimization
 Perform partial inlining.
diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index d80b604a48d..4c361276aef 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -1165,7 +1165,7 @@ proper position among the other output files.  */
 	%:include(libgomp.spec)%(link_gomp)}\
     %{fgnu-tm:%:include(libitm.spec)%(link_itm)}\
     " STACK_SPLIT_SPEC "\
-    %{fprofile-arcs|fcondition-coverage|fprofile-generate*|coverage:-lgcov} " SANITIZER_SPEC " \
+    %{fprofile-arcs|fcondition-coverage|fpath-coverage|fprofile-generate*|coverage:-lgcov} " SANITIZER_SPEC " \
     %{!nostdlib:%{!r:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}}\
     %{!nostdlib:%{!r:%{!nostartfiles:%E}}} %{T*}  \n%(post_link) }}}}}}"
 #endif
@@ -1288,7 +1288,7 @@ static const char *cc1_options =
  %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %w%b.s}}}\
  %{fsyntax-only:-o %j} %{-param*}\
  %{coverage:-fprofile-arcs -ftest-coverage}\
- %{fprofile-arcs|fcondition-coverage|fprofile-generate*|coverage:\
+ %{fprofile-arcs|fcondition-coverage|fpath-coverage|fprofile-generate*|coverage:\
    %{!fprofile-update=single:\
      %{pthread:-fprofile-update=prefer-atomic}}}";
 
diff --git a/gcc/gcov-counter.def b/gcc/gcov-counter.def
index 45d4b3eb0c8..65404135c81 100644
--- a/gcc/gcov-counter.def
+++ b/gcc/gcov-counter.def
@@ -52,3 +52,6 @@ DEF_GCOV_COUNTER(GCOV_TIME_PROFILER, "time_profiler", _time_profile)
 
 /* Conditions.  The counter is interpreted as a bit-set.  */
 DEF_GCOV_COUNTER(GCOV_COUNTER_CONDS, "conditions", _ior)
+
+/* Paths.  The counter is interpreted as a bit-set.  */
+DEF_GCOV_COUNTER(GCOV_COUNTER_PATHS, "ior", _ior)
diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index 5dc467c92b1..623ca56899a 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -264,6 +264,9 @@ typedef uint64_t gcov_type_unsigned;
 #define GCOV_TAG_CONDS		   ((gcov_unsigned_t)0x01470000)
 #define GCOV_TAG_CONDS_LENGTH(NUM) ((NUM) * 2 * GCOV_WORD_SIZE)
 #define GCOV_TAG_CONDS_NUM(LENGTH) (((LENGTH) / GCOV_WORD_SIZE) / 2)
+#define GCOV_TAG_PATHS		   ((gcov_unsigned_t)0x01490000)
+#define GCOV_TAG_PATHS_LENGTH(NUM) ((NUM) * GCOV_WORD_SIZE)
+#define GCOV_TAG_PATHS_NUM(LENGTH) (((LENGTH) / GCOV_WORD_SIZE))
 #define GCOV_TAG_LINES		 ((gcov_unsigned_t)0x01450000)
 #define GCOV_TAG_COUNTER_BASE 	 ((gcov_unsigned_t)0x01a10000)
 #define GCOV_TAG_COUNTER_LENGTH(NUM) ((NUM) * 2 * GCOV_WORD_SIZE)
diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index b9e41fd5172..59fc801403d 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -48,6 +48,7 @@ along with Gcov; see the file COPYING3.  If not see
 #include "json.h"
 #include "hwint.h"
 #include "xregex.h"
+#include "graphds.h"
 
 #include <zlib.h>
 #include <getopt.h>
@@ -83,6 +84,7 @@ class function_info;
 class block_info;
 class source_info;
 class condition_info;
+class path_info;
 
 /* Describes an arc between two basic blocks.  */
 
@@ -128,6 +130,45 @@ struct arc_info
   struct arc_info *pred_next;
 };
 
+/* Describes (prime) path coverage.  */
+class path_info
+{
+public:
+  path_info () : paths (), covered () {}
+
+  /* The prime paths of a function.  The paths will be
+     lexicographically ordered and identified by their index.  */
+  vector<vector<unsigned>> paths;
+
+  /* The covered paths.  This is really a large bitset partitioned
+     into buckets of gcov_type_unsigned, and bit n is set if the nth
+     path is covered.  */
+  vector<gcov_type_unsigned> covered;
+
+  /* The size (in bits) of each bucket.  */
+  static const size_t
+  bucketsize = sizeof (gcov_type_unsigned) * BITS_PER_UNIT;
+
+  /* Count the covered paths.  */
+  unsigned covered_paths () const
+  {
+    unsigned cnt = 0;
+    for (gcov_type_unsigned v : covered)
+      cnt += popcount_hwi (v);
+    return cnt;
+  }
+
+  /* Check if the nth path is covered.  */
+  bool covered_p (size_t n) const
+  {
+    const size_t bucket = n / bucketsize;
+    const size_t bit = n % bucketsize;
+    if (covered.size () >= bucket)
+      return false;
+    return covered[bucket] & (gcov_type_unsigned (1) << bit);
+  }
+};
+
 /* Describes which locations (lines and files) are associated with
    a basic block.  */
 
@@ -316,6 +357,9 @@ public:
 
   vector<condition_info*> conditions;
 
+  /* Path coverage information.  */
+  path_info paths;
+
   /* Raw arc coverage counts.  */
   vector<gcov_type> counts;
 
@@ -399,6 +443,9 @@ struct coverage_info
   int calls_executed;
 
   char *name;
+
+  unsigned paths;
+  unsigned paths_covered;
 };
 
 /* Describes a file mentioned in the block graph.  Contains an array
@@ -601,6 +648,13 @@ static bool flag_conditions = 0;
 /* Show unconditional branches too.  */
 static int flag_unconditional = 0;
 
+/* Output path coverage.  */
+static bool flag_prime_paths = false;
+
+static bool flag_prime_paths_basic = false;
+static bool flag_prime_paths_edges = false;
+static bool flag_prime_paths_source = false;
+
 /* Output a gcov file if this is true.  This is on by default, and can
    be turned off by the -n option.  */
 
@@ -744,9 +798,11 @@ static unsigned find_source (const char *);
 static void read_graph_file (void);
 static int read_count_file (void);
 static void solve_flow_graph (function_info *);
+static void find_prime_paths (function_info *fn);
 static void find_exception_blocks (function_info *);
 static void add_branch_counts (coverage_info *, const arc_info *);
 static void add_condition_counts (coverage_info *, const block_info *);
+static void add_path_counts (coverage_info *, const function_info *);
 static void add_line_counts (coverage_info *, function_info *);
 static void executed_summary (unsigned, unsigned);
 static void function_summary (const coverage_info *);
@@ -1022,6 +1078,7 @@ print_usage (int error_p)
                                     rather than percentages\n");
   fnotice (file, "  -g, --conditions                Include modified condition/decision\n\
                                     coverage (masking MC/DC) in output\n");
+  fnotice (file, "  -e, --prime-paths               Show prime path coverage\n");
   fnotice (file, "  -d, --display-progress          Display progress information\n");
   fnotice (file, "  -D, --debug			    Display debugging dumps\n");
   fnotice (file, "  -f, --function-summaries        Output summaries for each function\n");
@@ -1080,6 +1137,12 @@ static const struct option options[] =
   { "branch-probabilities", no_argument,       NULL, 'b' },
   { "branch-counts",        no_argument,       NULL, 'c' },
   { "conditions",	    no_argument,       NULL, 'g' },
+  /* The easier implementatio is --prime-paths-{report} */
+  { "prime-paths",	    no_argument,       NULL, 'e' },
+  { "prime-paths-edges",    no_argument,       NULL, 900 },
+  { "prime-paths-basic",    no_argument,       NULL, 901 },
+  { "prime-paths-source",   no_argument,       NULL, 902 },
+  { "prime-paths-all",	    no_argument,       NULL, 903 },
   { "json-format",	    no_argument,       NULL, 'j' },
   { "include",              required_argument, NULL, 'I' },
   { "exclude",              required_argument, NULL, 'E' },
@@ -1111,7 +1174,7 @@ process_args (int argc, char **argv)
 {
   int opt;
 
-  const char *opts = "abcdDfghHijklmMno:pqrs:tuvwx";
+  const char *opts = "abcdDefghHijklmMno:pqrs:tuvwx";
   while ((opt = getopt_long (argc, argv, opts, options, NULL)) != -1)
     {
       switch (opt)
@@ -1125,6 +1188,25 @@ process_args (int argc, char **argv)
 	case 'c':
 	  flag_counts = 1;
 	  break;
+	case 'e':
+	case 900:
+	  flag_prime_paths = true;
+	  flag_prime_paths_edges = true;
+	  break;
+	case 901:
+	  flag_prime_paths = true;
+	  flag_prime_paths_basic = true;
+	  break;
+	case 902:
+	  flag_prime_paths = true;
+	  flag_prime_paths_source = true;
+	  break;
+	case 903:
+	  flag_prime_paths = true;
+	  flag_prime_paths_edges = true;
+	  flag_prime_paths_basic = true;
+	  flag_prime_paths_source = true;
+	  break;
 	case 'f':
 	  flag_function_summary = 1;
 	  break;
@@ -1615,6 +1697,9 @@ process_all_functions (void)
 	  solve_flow_graph (fn);
 	  if (fn->has_catch)
 	    find_exception_blocks (fn);
+
+	  /* For path coverage.  */
+	  find_prime_paths (fn);
 	}
       else
 	{
@@ -2170,6 +2255,13 @@ read_graph_file (void)
 	      info->n_terms = gcov_read_unsigned ();
 	      fn->conditions[i] = info;
 	    }
+    }
+      else if (fn && tag == GCOV_TAG_PATHS)
+	{
+	  const unsigned npaths = gcov_read_unsigned ();
+	  const size_t nbits = path_info::bucketsize;
+	  const size_t nbuckets = (npaths + (nbits - 1)) / nbits;
+	  fn->paths.covered.assign (nbuckets, 0);
 	}
       else if (fn && tag == GCOV_TAG_LINES)
 	{
@@ -2326,6 +2418,17 @@ read_count_file (void)
 	    for (ix = 0; ix != fn->counts.size (); ix++)
 	      fn->counts[ix] += gcov_read_counter ();
 	}
+      else if (tag == GCOV_TAG_FOR_COUNTER (GCOV_COUNTER_PATHS) && fn)
+	{
+	  vector<gcov_type_unsigned> &covered = fn->paths.covered;
+	  length = abs (read_length);
+	  if (length != GCOV_TAG_COUNTER_LENGTH (covered.size ()))
+	    goto mismatch;
+
+	  if (read_length > 0)
+	    for (ix = 0; ix != covered.size (); ix++)
+	      covered[ix] = gcov_read_counter ();
+	}
       if (read_length < 0)
 	read_length = 0;
       gcov_sync (base, read_length);
@@ -2609,6 +2712,47 @@ solve_flow_graph (function_info *fn)
       }
 }
 
+/* Find the prime paths of the function from the CFG and add to FN
+   using the same function as gcc.  It relies on gcc recording the CFG
+   faithfully.  Storing the paths explicitly takes up way too much
+   space to be practical, but this means we need to recompute the
+   (exact) same paths in gcov.  This should give paths in
+   lexicographical order so that the nth path in gcc is the nth path
+   in gcov.  ENTRY_BLOCK and EXIT_BLOCK are both removed from all
+   paths.  */
+static void
+find_prime_paths (function_info *fn)
+{
+  if (!flag_prime_paths)
+    return;
+
+  struct graph *cfg = new_graph (fn->blocks.size ());
+  for (block_info &block : fn->blocks)
+    {
+      cfg->vertices[block.id].data = &block;
+      for (arc_info *arc = block.succ; arc; arc = arc->succ_next)
+	if (!arc->fake)
+	  add_edge (cfg, arc->src->id, arc->dst->id)->data = arc;
+    }
+
+  vec<vec<int>> prime_paths (struct graph*);
+  vec<vec<int>> paths = prime_paths (cfg);
+  fn->paths.paths.reserve (paths.length ());
+  for (vec<int> &path : paths)
+    {
+      const int *begin = path.begin ();
+      const int *end = path.end ();
+      if (path.last () == EXIT_BLOCK)
+	--end;
+      if (path[0] == ENTRY_BLOCK)
+	++begin;
+      fn->paths.paths.emplace_back (begin, end);
+    }
+
+  release_vec_vec (paths);
+  free_graph (cfg);
+}
+
 /* Mark all the blocks only reachable via an incoming catch.  */
 
 static void
@@ -2669,6 +2813,16 @@ add_condition_counts (coverage_info *coverage, const block_info *block)
   coverage->conditions_covered += block->conditions.popcount ();
 }
 
+/* Increment path totals, number of paths and number of covered paths,
+   in COVERAGE according to FN.  */
+
+static void
+add_path_counts (coverage_info *coverage, const function_info *fn)
+{
+  coverage->paths += fn->paths.paths.size ();
+  coverage->paths_covered = fn->paths.covered_paths ();
+}
+
 /* Format COUNT, if flag_human_readable_numbers is set, return it human
    readable format.  */
 
@@ -2785,6 +2939,16 @@ file_summary (const coverage_info *coverage)
       else
 	fnotice (stdout, "No conditions\n");
     }
+
+  if (flag_prime_paths)
+    {
+      if (coverage->paths)
+	fnotice (stdout, "Prime paths covered:%s of %d\n",
+		 format_gcov (coverage->paths_covered,
+			      coverage->paths, 2), coverage->paths);
+      else
+	fnotice (stdout, "No paths\n");
+    }
 }
 
 /* Canonicalize the filename NAME by canonicalizing directory
@@ -3072,6 +3236,8 @@ accumulate_line_counts (source_info *src)
     {
       function_info *fn = *it;
 
+      add_path_counts (&src->coverage, fn);
+
       if (fn->src != src->index || !fn->is_group)
 	continue;
 
@@ -3203,6 +3369,108 @@ output_branch_count (FILE *gcov_file, int ix, const arc_info *arc)
   return 1;
 }
 
+static void
+print_source_line (FILE *f, const vector<const char *> &source_lines,
+		   unsigned line);
+
+/* Print path coverage counts.  Note that unlike statements, branch
+   counts, and conditions, this is not anchored to source lines but
+   rather the whole function.  The summary is always printed (covered
+   n of m), with the more verbose formats opt-in.  */
+static int
+output_path_count (FILE *gcov_file, const function_info *fn,
+		   const vector<const char*>& lines)
+{
+  if (!flag_prime_paths)
+    return 0;
+
+  fnotice (gcov_file, "paths covered %u of %zu\n",
+	   fn->paths.covered_paths (), fn->paths.paths.size ());
+
+  for (size_t i = 0; i != fn->paths.paths.size (); ++i)
+    {
+      if (fn->paths.covered_p (i))
+	continue;
+
+      const auto& path = fn->paths.paths.at (i);
+
+      if (flag_prime_paths_basic)
+	{
+	  /* TODO: rename arg/flag basic-blocks */
+	  fprintf (gcov_file, "(basic) path %2zu not covered:", i);
+	  for (unsigned v : path)
+	    fprintf (gcov_file, " %d", v);
+	  fprintf (gcov_file, "\n");
+	}
+
+      if (flag_prime_paths_edges)
+	{
+	  fprintf (gcov_file, "(edges) path %2zu not covered:", i);
+	  for (size_t k = 1; k != path.size (); ++k)
+	    {
+	      auto& block = fn->blocks[path[k-1]];
+	      gcc_assert (block.id == path[k-1]);
+	      arc_info *arc = nullptr;
+	      for (arc = block.succ; arc; arc = arc->succ_next)
+		{
+		  if (arc->src->id != path[k-1])
+		    printf ("ERROR: src is %u, not %u\n", arc->src->id, path[k-1]);
+		  if (arc->dst->id == path[k])
+		    break;
+		}
+	      gcc_assert (arc);
+
+	      if (block.locations.empty ())
+		fprintf (gcov_file, " (unknown)");
+	      else
+		{
+		for (auto& loc : block.locations)
+		  fprintf (gcov_file, " %s:%u", sources[loc.source_file_idx].name, loc.lines.back ());
+		}
+	      if (arc->true_value)
+		fprintf (gcov_file, "(true)");
+	      else if (arc->false_value)
+		fprintf (gcov_file, "(false)");
+	    }
+	  auto& block = fn->blocks[path.back ()];
+	  if (block.locations.empty ())
+	    fprintf (gcov_file, " (unknown)");
+	  else
+	    {
+	      for (auto& loc : block.locations)
+		fprintf (gcov_file, " %s:%u", sources[loc.source_file_idx].name, loc.lines.back ());
+	    }
+	  fprintf (gcov_file, "\n");
+	}
+
+      if (!flag_prime_paths_source)
+	continue;
+
+      /* Print the lines of the path, or (empty) if there are no lines
+	 associated with the block.  No lines is common for scope closings,
+	 e.g. past the for loop before scope exit.  */
+      for (unsigned v : path)
+	{
+	  auto& loc = fn->blocks[v].locations;
+	  if (loc.empty ())
+	    fprintf (gcov_file, "BB %d: (empty)\n", v);
+	  else
+	    {
+	      unsigned locn = 0;
+	      for (auto& locs : fn->blocks[v].locations)
+		{
+		  for (unsigned line : locs.lines)
+		    fprintf (gcov_file, "BB %d: (%d) %d", v, locn, line),
+			    print_source_line (gcov_file, lines, line);
+		  locn++;
+		}
+	    }
+	}
+      fputc ('\n', gcov_file);
+    }
+  return 1;
+}
+
 static const char *
 read_line (FILE *file)
 {
@@ -3514,6 +3782,7 @@ output_lines (FILE *gcov_file, const source_info *src)
 	    {
 	      function_info *fn = (*fns)[0];
 	      output_function_details (gcov_file, fn);
+	      output_path_count (gcov_file, fn, source_lines);
 
 	      /* If functions are filtered, only the matching functions will be in
 		 fns and there is no need for extra checking.  */
@@ -3559,6 +3828,7 @@ output_lines (FILE *gcov_file, const source_info *src)
 	      fprintf (gcov_file, "%s:\n", fn_name.c_str ());
 
 	      output_function_details (gcov_file, fn);
+	      output_path_count (gcov_file, fn, source_lines);
 
 	      /* Print all lines covered by the function.  */
 	      for (unsigned i = 0; i < lines.size (); i++)
diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc
index 9fc41b7696d..f04db904ce5 100644
--- a/gcc/ipa-inline.cc
+++ b/gcc/ipa-inline.cc
@@ -699,7 +699,7 @@ can_early_inline_edge_p (struct cgraph_edge *e)
     }
   gcc_assert (gimple_in_ssa_p (DECL_STRUCT_FUNCTION (e->caller->decl))
 	      && gimple_in_ssa_p (DECL_STRUCT_FUNCTION (callee->decl)));
-  if ((profile_arc_flag || condition_coverage_flag)
+  if ((profile_arc_flag || condition_coverage_flag || path_coverage_flag)
       && ((lookup_attribute ("no_profile_instrument_function",
 			    DECL_ATTRIBUTES (caller->decl)) == NULL_TREE)
 	  != (lookup_attribute ("no_profile_instrument_function",
diff --git a/gcc/passes.cc b/gcc/passes.cc
index d73f8ba97b6..010a98d9bca 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -352,8 +352,8 @@ finish_optimization_passes (void)
   gcc::dump_manager *dumps = m_ctxt->get_dumps ();
 
   timevar_push (TV_DUMP);
-  if (profile_arc_flag || condition_coverage_flag || flag_test_coverage
-      || flag_branch_probabilities)
+  if (profile_arc_flag || condition_coverage_flag || path_coverage_flag
+      || flag_test_coverage || flag_branch_probabilities)
     {
       dumps->dump_start (pass_profile_1->static_pass_number, NULL);
       end_branch_prob ();
diff --git a/gcc/path-coverage.cc b/gcc/path-coverage.cc
new file mode 100644
index 00000000000..a7076a850da
--- /dev/null
+++ b/gcc/path-coverage.cc
@@ -0,0 +1,482 @@
+/* Calculate prime path coverage.
+   Copyright (C) 2024-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "target.h"
+#include "function.h"
+#include "basic-block.h"
+#include "tree.h"
+#include "cfg.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "coverage.h"
+#include "ssa.h"
+#include "vec.h"
+#include "sbitmap.h"
+#include "graphds.h"
+#include "hash-map.h"
+#include "selftest.h"
+
+vec<vec<int>> prime_paths (struct graph*);
+
+namespace
+{
+
+/* Build a struct graph equivalent to the CFG for the function FN.  The caller
+   must free the returned graph with free_graph.  The data field of every
+   vertex and edge point to the basic blocks and edges in the CFG.  */
+struct graph*
+cfg_as_graph (struct function* fn)
+{
+  struct graph *g = new_graph (n_basic_blocks_for_fn (fn));
+  basic_block entry = ENTRY_BLOCK_PTR_FOR_FN (fn);
+  basic_block exit = EXIT_BLOCK_PTR_FOR_FN (fn);
+
+  g->vertices[entry->index].data = entry;
+  g->vertices[exit->index].data = exit;
+  basic_block top = single_succ (entry);
+
+  add_edge (g, entry->index, top->index)->data = single_succ_edge (entry);
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fn)
+    {
+      g->vertices[bb->index].data = bb;
+      for (edge e : bb->succs)
+	add_edge (g, e->src->index, e->dest->index)->data = e;
+    }
+  return g;
+}
+
+/* Find the prime paths for the function FN with the ENTRY and EXIT blocks
+   removed.  */
+vec<vec<int>>
+find_prime_paths (struct function *fn)
+{
+  struct graph *cfg = cfg_as_graph (fn);
+  vec<vec<int>> paths = prime_paths (cfg);
+
+  for (vec<int> &path : paths)
+    {
+      if (path.last () == EXIT_BLOCK)
+	path.pop ();
+      if (path[0] == ENTRY_BLOCK)
+	path.ordered_remove (0);
+    }
+
+  return paths;
+}
+
+/* Return the edge between SRC and DST.  */
+edge
+edge_between (struct function *fn, int src, int dst)
+{
+  basic_block bbsrc = BASIC_BLOCK_FOR_FN (fn, src);
+  basic_block bbdst = BASIC_BLOCK_FOR_FN (fn, dst);
+  for (edge e : bbsrc->succs)
+    if (e->dest == bbdst)
+      return e;
+  gcc_checking_assert (false);
+  return NULL;
+}
+
+/* Visitor for topsort.  */
+void
+topsort1 (basic_block b, vec<basic_block>& L, sbitmap visited)
+{
+  if (bitmap_bit_p (visited, b->index))
+    return;
+
+  for (edge e : b->succs)
+    if (!(e->flags & EDGE_DFS_BACK))
+      topsort1 (e->dest, L, visited);
+
+  bitmap_set_bit (visited, b->index);
+  L.quick_push (b);
+}
+
+vec<basic_block>
+topsort (struct function *fn)
+{
+  vec<basic_block> L {};
+  L.reserve (n_basic_blocks_for_fn (fn));
+
+  auto_sbitmap visited (n_basic_blocks_for_fn (fn));
+  bitmap_clear (visited);
+  bitmap_set_bit (visited, EXIT_BLOCK);
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fn)
+    topsort1 (bb, L, visited);
+
+  L.reverse ();
+  return L;
+}
+
+/* Hasher for maps where the key is a pair of edge/basic_block and
+   bucket id (size_t).  */
+template <typename T>
+using bucket_hash = pair_hash <nofree_ptr_hash <T>,
+			       int_hash <size_t, -1>>;
+
+/* Hasher for {edge, bucket-id} keys.  */
+using edge_hash = bucket_hash <edge_def>;
+/* Hasher for {basic_block, bucket-id} keys.  */
+using block_hash = bucket_hash <basic_block_def>;
+
+template <typename Map, typename Key>
+uint64_t
+get_or_zero (Map& map, Key key, size_t bucket)
+{
+  uint64_t *val = map.get ({key, bucket});
+  return val ? *val : 0;
+}
+
+} // namespace
+
+/* Instrument for prime path coverage.  */
+void
+find_paths (struct function *fn)
+{
+  vec<vec<int>> paths = find_prime_paths (fn);
+  tree gcov_type_node = get_gcov_type ();
+  const size_t bucketsize = TYPE_PRECISION (gcov_type_node);
+  const size_t nbuckets = (paths.length () + (bucketsize - 1)) / bucketsize;
+  gcc_checking_assert (sizeof (uint64_t) * BITS_PER_UNIT >= bucketsize);
+
+  if (!coverage_counter_alloc (GCOV_COUNTER_PATHS, nbuckets))
+    {
+      release_vec_vec (paths);
+      return;
+    }
+
+  gcov_position_t offset {};
+  offset = gcov_write_tag (GCOV_TAG_PATHS);
+  gcov_write_unsigned (paths.length ());
+  gcov_write_length (offset);
+
+  hash_map<edge_hash, uint64_t> ands;
+  hash_map<block_hash, uint64_t> flushes;
+  hash_map<block_hash, uint64_t> iors;
+
+  /* Now that we have all paths we must figure out what bitmasks must be
+     applied on the edges.  We also record in which basic block the path starts
+     (e.g.  accumulator resets) and ends (accumulator flushes).  */
+  for (size_t pathno = 0; pathno != paths.length (); ++pathno)
+    {
+      const vec<int> &path = paths[pathno];
+      const size_t bucket = pathno / bucketsize;
+      const size_t bit = uint64_t (1) << (pathno % bucketsize);
+
+      basic_block first = BASIC_BLOCK_FOR_FN (fn, path[0]);
+      basic_block last = BASIC_BLOCK_FOR_FN (fn, path[path.length () - 1]);
+
+      for (unsigned i = 1; i != path.length (); ++i)
+	{
+	  edge e = edge_between (fn, path[i-1], path[i]);
+	  ands.get_or_insert ({e, bucket}) |= bit;
+	}
+
+      iors.get_or_insert ({first, bucket}) |= bit;
+      flushes.get_or_insert ({last, bucket}) |= bit;
+    }
+
+  hash_map<basic_block, vec<gassign*>> assigns;
+  hash_map<block_hash, tree> SSAex;
+  hash_map<block_hash, tree> SSAen;
+
+  vec<basic_block> blocks = topsort (fn);
+
+  /* For the entry block -- generate all one blocks.  This solves the
+     special case of single-vertex paths as the flush happens on the
+     state of the accumulator as the flush vertex is entered.  For all
+     other paths the set bits will be rapidly masked out.  */
+  basic_block entry = ENTRY_BLOCK_PTR_FOR_FN (fn);
+  tree zero = build_zero_cst (gcov_type_node);
+  for (size_t i = 0; i != nbuckets; ++i)
+    SSAex.put ({entry, i}, nullptr);
+
+  /* A fixup - enough information to connect the predecessor SSA to
+     the phi and/or generate an AND.  */
+  struct fixup
+  {
+    gphi *phi;
+    edge e;
+    tree lhs;
+    tree mask;
+    size_t bucket;
+  };
+
+  auto_vec<fixup, 4> todos;
+
+  /* Generate the operations for each basic block.  Every iteration
+     will set SSAex[bb,bucket] to the last SSA in the block.  */
+  for (basic_block bb : blocks)
+    {
+      for (size_t bucket = 0; bucket != nbuckets; ++bucket)
+	{
+	  /* If a phi node is created for this basic block, phi is
+	     non-null.  */
+	  tree ssa = nullptr;
+	  gphi *phi = nullptr;
+
+	  /* Find union of potential paths.  This also works as a
+	     check for any paths at all.  */
+	  uint64_t bb_and = 0;
+	  for (edge e : bb->preds)
+	    bb_and |= get_or_zero (ands, e, bucket);
+
+	  /* If there is an incoming edge which is not on any of the
+	     paths through bucket (because it is in some other bucket)
+	     or if even a single edge could change the current path
+	     set.  */
+	  bool can_change_path = false;
+	  for (edge e : bb->preds)
+	    {
+	      uint64_t *e_and = ands.get ({e, bucket});
+	      if (!e_and || *e_and != bb_and)
+		{
+		  can_change_path = true;
+		  break;
+		}
+	    }
+
+	  if (bb_and)
+	    {
+	      /* This basic block has at least one path going through
+		 this bucket.  */
+	      if (single_pred_p (bb))
+		{
+		  /* There is only a single predecessor so there will
+		     be no change in paths.  We can push the SSA down
+		     the graph rather than issuing a pointless (x & x)
+		     type instruction.  */
+		  tree *prev = SSAex.get ({single_pred (bb), bucket});
+		  gcc_checking_assert (prev);
+		  ssa = *prev;
+		}
+	      else
+		{
+		  /* This phi must be created (?), maybe unless all bits are set.  */
+		  /* But the AND can be applied past-phi, rather than on the into-phis? */
+		  ssa = make_ssa_name (gcov_type_node);
+		  phi = create_phi_node (ssa, bb);
+		}
+	    }
+	  else
+	    {
+	      /* There are no paths through this basic block bucket,
+		 but paths may still start here.  We can just assign
+		 the IOR operand rather than (zero | IOR) or
+		 instructions to the effect of (pred | IOR) & ~pred.
+
+		 We want to record an entry+exit SSA to make this
+		 function simpler, so we fall back to a dummy SSA
+		 (zero).  */
+		ssa = nullptr;
+	    }
+
+	  SSAen.put ({bb, bucket}, ssa);
+
+	  if (single_pred_p (bb) && single_pred_p (single_pred (bb)))
+	    {
+	      /* Straight line -- the AND mask will already have been
+		 applied to the first ancestor of this chain, so we
+		 don't need to apply it here.  */
+	    }
+	  else if (!can_change_path)
+	    {
+	      /* All incoming edges would apply the same mask, so
+		 applying the AND here would not actually distinguish
+		 paths.  Such an AND will be applied later anyway so
+		 we don't need to apply it here.  This is a huge
+		 improvement for large programs.  */
+	    }
+	  else for (edge e : bb->preds)
+	    {
+	      uint64_t *bitmask = ands.get ({e, bucket});
+	      if (!bitmask)
+		{
+		  if (phi)
+		    add_phi_arg (phi, zero, e, UNKNOWN_LOCATION);
+		  else
+		    {
+		      /* There may be multiple predecessors but still
+			 no phi when no paths flow through this
+			 bucket.  */
+		      ssa = nullptr;
+		    }
+		}
+	      else if ((size_t)popcount_hwi (*bitmask) == bucketsize && phi)
+		{
+		  /* This reduces to a no-op (x & ~0), but it is not
+		     guaranteed that the predecessor SSAs are created
+		     yet.  Record a fixup for sorting out the phi, but
+		     mark it so it won't issue an AND.  This is a big
+		     improvement for large programs.  */
+		  fixup todo;
+		  todo.phi = phi;
+		  todo.e = e;
+		  todo.bucket = bucket;
+		  todo.mask = nullptr;
+		  todo.lhs = nullptr;
+		  todos.safe_push (todo);
+		}
+	      else if ((size_t)popcount_hwi (*bitmask) == bucketsize && !phi)
+		{
+		  /* This reduces to a no-op (x & ~0) and there is no
+		     phi selection, so just keep pushing the SSA.  */
+		}
+	      else
+		{
+		  fixup todo;
+		  todo.phi = phi;
+		  todo.e = e;
+		  todo.bucket = bucket;
+		  todo.lhs = make_ssa_name (gcov_type_node);
+		  todo.mask = build_int_cst (gcov_type_node, *bitmask);
+		  todos.safe_push (todo);
+
+		  if (!phi)
+		    {
+		      gcc_assert (single_pred_p (bb));
+		      gcc_assert (todo.lhs);
+		      ssa = todo.lhs;
+		    }
+		}
+	    }
+
+	  /* Bitwise IOR.  Unlike the AND this assignment can be
+	     created right away as this should be applied to the
+	     result of the phi, AND, or single predecessor's exit SSA,
+	     and all of those have already been created.  */
+	  uint64_t *ior = iors.get ({bb, bucket});
+	  if (ior && !ssa)
+	    {
+	      /* In case there was no predecessor, the IOR/initial
+		 state can just be a constant.  In this case, the IOR
+		 also becomes the block's entry node which means it
+		 will be considered for flushing in single-vertex
+		 paths.  */
+	      ssa = build_int_cst (gcov_type_node, *ior);
+	      SSAen.put ({bb, bucket}, ssa);
+	    }
+	  else if (ior)
+	    {
+	      gcc_assert (ssa);
+	      tree next = make_ssa_name (gcov_type_node);
+	      tree mask = build_int_cst (gcov_type_node, *ior);
+	      gassign *put = gimple_build_assign (next, BIT_IOR_EXPR, ssa, mask);
+	      assigns.get_or_insert (bb).safe_push (put);
+	      ssa = next;
+	    }
+
+	  SSAex.put ({bb, bucket}, ssa);
+	}
+    }
+
+  /* Apply fixups -- now that all exit SSA names are created we can
+     properly set the phi argument if there is a phi node, and emit
+     the (x & mask) instruction if necessary.  */
+  for (fixup &todo : todos)
+    {
+      tree *exit = SSAex.get ({todo.e->src, todo.bucket});
+      gcc_assert (exit && *exit);
+      if (todo.mask)
+	{
+	  gassign *put = gimple_build_assign (todo.lhs, BIT_AND_EXPR,
+					      *exit, todo.mask);
+	  assigns.get_or_insert (todo.e->src).safe_push (put);
+	}
+
+      if (todo.phi)
+	add_phi_arg (todo.phi, todo.lhs ? todo.lhs : *exit, todo.e,
+		     UNKNOWN_LOCATION);
+    }
+
+  /* Finally, add instructions to update the global counters.  */
+  for (basic_block bb : blocks)
+    {
+      gimple_stmt_iterator gsi = gsi_after_labels (bb);
+      for (gassign *put : assigns.get_or_insert (bb))
+	gsi_insert_before (&gsi, put, GSI_SAME_STMT);
+
+      for (size_t bucket = 0; bucket != nbuckets; ++bucket)
+	{
+	  uint64_t *bitmask = flushes.get ({bb, bucket});
+	  if (!bitmask || !*bitmask)
+	    continue;
+
+	  tree *exit = SSAen.get ({bb, bucket});
+	  gcc_checking_assert (exit);
+	  if (!*exit)
+	    continue;
+
+	  gcc_assert (*bitmask);
+	  tree counter = tree_coverage_counter_ref (GCOV_COUNTER_PATHS, bucket);
+	  if ((size_t)popcount_hwi (*bitmask) != bucketsize)
+	    {
+	      /* We need to apply an AND to only write the paths in
+		 the bucket that end in this vertex.
+
+		 global_accu |= (local_accu & paths);  */
+	      tree cst = build_int_cst (gcov_type_node, *bitmask);
+	      tree tmp1 = make_ssa_name (gcov_type_node);
+	      tree tmp2 = make_ssa_name (gcov_type_node);
+	      tree tmp3 = make_ssa_name (gcov_type_node);
+
+	      gassign *ga1 = gimple_build_assign (tmp1, counter);
+	      gassign *ga2 = gimple_build_assign (tmp2, BIT_AND_EXPR, *exit, cst);
+	      gassign *ga3 = gimple_build_assign (tmp3, BIT_IOR_EXPR, tmp1, tmp2);
+	      gassign *ga4 = gimple_build_assign (unshare_expr (counter), tmp3);
+	      gsi_insert_before (&gsi, ga1, GSI_SAME_STMT);
+	      gsi_insert_before (&gsi, ga2, GSI_SAME_STMT);
+	      gsi_insert_before (&gsi, ga3, GSI_SAME_STMT);
+	      gsi_insert_before (&gsi, ga4, GSI_SAME_STMT);
+	    }
+	  else
+	    {
+	      /* Every path in this bucket ends in this vertex, so we
+		 just apply the IOR and don't need to mask out
+		 anything.
+
+		 global_accu |= local_accu;  */
+	      tree tmp1 = make_ssa_name (gcov_type_node);
+	      tree tmp2 = make_ssa_name (gcov_type_node);
+
+	      gassign *ga1 = gimple_build_assign (tmp1, counter);
+	      gassign *ga2 = gimple_build_assign (tmp2, BIT_IOR_EXPR,
+						  tmp1, *exit);
+	      gassign *ga3 = gimple_build_assign (unshare_expr
+						  (counter), tmp2);
+	      gsi_insert_before (&gsi, ga1, GSI_SAME_STMT);
+	      gsi_insert_before (&gsi, ga2, GSI_SAME_STMT);
+	      gsi_insert_before (&gsi, ga3, GSI_SAME_STMT);
+	    }
+	}
+    }
+
+  for (auto kv : assigns)
+    kv.second.release ();
+  blocks.release ();
+  release_vec_vec (paths);
+}
diff --git a/gcc/prime-paths.cc b/gcc/prime-paths.cc
new file mode 100644
index 00000000000..5fc14c863f3
--- /dev/null
+++ b/gcc/prime-paths.cc
@@ -0,0 +1,2020 @@
+#define INCLUDE_ALGORITHM
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "obstack.h"
+#include "sbitmap.h"
+#include "vec.h"
+#include "graphds.h"
+#include "selftest.h"
+
+namespace
+{
+
+/* A trivial key/value pair for a short linear map type.  */
+struct xpair
+{
+  int key;
+  unsigned val;
+};
+
+/* A node in a trie, optimized for mid-sized alphabets possibly larger than
+   256 but not much more.  Finding the prime paths ends up creating a large
+   amount of these nodes so space and access costs matters a lot.
+
+   The node does not explicitly store its own key (CFG vertex ID/basic block
+   index), nor does it store pointers to its successors. Rather, it stores the
+   key+offset pairs for its successors the root trie object, and in a sense
+   behaves like near pointers.  This makes the trie vertices small and
+   reloctable, and removes the need for pointer chasing when releasing the
+   trie.
+
+   The union of near/far is essentially a short-vector optimization, switching
+   to a heap-allocated vector when necessary. This happens relatively rarely
+   (usually maxes out at 1-2%), and the vertices that have more than 2
+   sucessors also tend to have more than 4. The root vertex tends to use the
+   dynamic vector because the subpaths are recorded as the successors of the
+   root.
+
+   Conceptually, this is a small map from vertex-id -> index and the API is
+   modelled as such.  The insert and search functions are unrolled by hand when
+   using the small vector.  This has a noticable performance impact on insert
+   in particular, and is not too complex since we know we are limited to 2
+   elements.
+
+   Vertices are tagged with endofpath and inserted.  If endofpath is set, the
+   path from the root to this vertex is a complete path.  If inserted is set
+   then the vertex is a part of proper path (one given to insert) and not
+   generated as a suffix.  For example:
+
+   insert ([2 4 6])
+   insert ([9 7 2 4 6])
+   insert ([2 4 6 8])
+
+   The inserted flags for [2 4 6] are not cleared, because otherwise [2 4 6 8]
+   would be dropped when only following inserted vertices.  The endofpath flag
+   in [2 4 6] is cleared when the suffixes of [9 7 2 4 6] are inserted.
+
+   The node will be inserted into a vec, and should be trivial.  Instances
+   should be value-initialized to zero-intialized state.  */
+struct trie_node
+{
+  unsigned length () const
+  { return !heaped ? len : far.length (); }
+
+  const xpair *begin () const
+  { return !heaped ? near : far.begin (); }
+
+  const xpair *end () const
+  { return !heaped ? (near + len) : far.end(); }
+
+  /* Get the ith successor.  This is used for traversal and not lookup, and
+     should only be used by the iterator.  */
+  const xpair &at (unsigned i) const
+  { return !heaped ? near[i] : far[i]; }
+
+  const xpair *get (int key) const;
+  void put (int key, unsigned val);
+
+  unsigned near_lower_bound (int key) const;
+
+  /* Experimentally I found that using a union with 2 elements in the near
+     array to be faster than 4 or without the union (very slightly).  A lot of
+     trie vertices will be created, and vast majority of vertices will have 1
+     or 2 successors (straight line or if-then), and the cost of size and
+     copying adds up.  */
+  union
+  {
+    xpair near[2];
+    vec<xpair> far;
+  };
+  unsigned len : 8;
+  unsigned endofpath : 1;
+  unsigned inserted : 1;
+  unsigned heaped : 1;
+};
+
+/* Compare LHS.key < RHS.key, for use with vec.lower_bound.  */
+bool
+xpair_less (const xpair& lhs, const xpair& rhs)
+{
+  return lhs.key < rhs.key;
+}
+
+/* Compare LHS.key to RHS.key, for use with vec.bsearch.  */
+int
+xpair_cmp (const void *lhs, const void *rhs)
+{
+  return ((const xpair*)lhs)->key - ((const xpair*)rhs)->key;
+}
+
+/* Get a pointer to the element at KEY if it exists, otherwise NULL.  */
+const xpair*
+trie_node::get (int key) const
+{
+  if (!heaped)
+    {
+      if (len == 0) return NULL;
+      if (len >= 1 && key == near[0].key) return near + 0;
+      if (len >= 2 && key == near[1].key) return near + 1;
+      return NULL;
+    }
+  else
+    {
+      xpair kv;
+      kv.key = key;
+      // TODO: only switch to bsearch on a large threshold (> 32?)
+      return const_cast <vec<xpair>&> (far).bsearch (&kv, xpair_cmp);
+    }
+}
+
+/* Put ("emplace") VAL at KEY, extending the paths that pass through this
+   vertex.  This function assumes that KEY is not already a successor, and does
+   not perform this check.  get () should be called and checked for NULL
+   putting with this function.  Put maintains the order of the successors.  */
+void
+trie_node::put (int key, unsigned val)
+{
+  xpair kv;
+  kv.key = key;
+  kv.val = val;
+  if (!heaped)
+    {
+      const unsigned i = near_lower_bound (key);
+      if (len < 2)
+	{
+	  near[1] = near[0];
+	  near[i] = kv;
+	  len += 1;
+	}
+      else
+	{
+	  vec<xpair> xs {};
+	  xs.reserve (13);
+	  xs.quick_grow (3);
+	  gcc_checking_assert (i <= 2);
+	  if (i == 0)
+	    {
+	      xs[0] = kv;
+	      xs[1] = near[0];
+	      xs[2] = near[1];
+	    }
+	  else if (i == 1)
+	    {
+	      xs[0] = near[0];
+	      xs[1] = kv;
+	      xs[2] = near[1];
+	    }
+	  else
+	    {
+	      xs[0] = near[0];
+	      xs[1] = near[1];
+	      xs[2] = kv;
+	    }
+
+	  far = xs;
+	  heaped = 1;
+	}
+    }
+  else
+    {
+      const unsigned i = far.lower_bound (kv, xpair_less);
+      far.safe_insert (i, kv);
+    }
+}
+
+/* Get the index to the last element that compares less than KEY, similar to
+   vec.lower_bound.  This assumes the near vector is active, and is for
+   internal use.  */
+unsigned
+trie_node::near_lower_bound (int key) const
+{
+  gcc_checking_assert (!heaped);
+  if (len == 0) return 0;
+  if (len >= 1 && key < near[0].key) return 0;
+  if (len >= 2 && key < near[1].key) return 1;
+  return len;
+}
+
+/* The trie is a major workhorse for this algorithm.  It has two key
+   properties - set-like subpath elimination and sorted output.
+
+   Many evaluated paths will be non-prime, that is, a sequence of vertices that
+   is also fully embedded in a longer sequence of vertices.  For example the
+   path [3 4 5 8] is a subpath of both [2 3 4 5 8] and [3 4 5 8 10].  The
+   insert_with_suffix function maintains this property so that inserting a
+   subpath into the trie is effectively a no-op, and inserting a superpath will
+   effectively remove (unmark) the subpath.  Sometimes it can be guaranteed
+   that no redundant (subpaths) will be generated, in which case the insert
+   function can be used.  The insert function is really only set insert, only
+   becoming a no-op for identical paths, which will be a lot faster.
+
+   Paths can be extracted with an iterator, which will output paths in
+   lexicographically sorted order.  This is an important property because the
+   index of a path in the sorted set will be used by the coverage to record
+   when a path is taken and completed.  The iterator has different behavior
+   than the standard C++ iterators, and to avoid mixups the interface is
+   deliberately different.  The iterator has a (large) stack which is not cheap
+   to copy, and if the stack is shallow copied it would mean iterator copies
+   have non-local effects.  */
+struct trie
+{
+  struct iter;
+  trie ();
+
+  bool insert (const vec<int>&);
+  bool insert (const array_slice<const int>);
+  bool hard_insert (const array_slice<const int>);
+  bool insert_with_suffix (const array_slice<const int>);
+  bool insert_suffix (const array_slice<const int>);
+
+  iter paths (vec<int>&) const;
+  iter paths (vec<int>&, int from) const;
+
+  vec<vec<int>> paths () const;
+  bool contains (array_slice<const int>) const;
+
+  void release ();
+  vec<trie_node> vertices;
+
+  /* An iterator for the paths of the trie.  The iterator yields all paths in
+     lexicographical order.  The iterator will be invalidated on any insertion
+     into the trie.  The iterator should not be constructed directly, but
+     through the paths functions on the trie.  It is essentially an explicit
+     stack depth-first traversal.
+
+     The iter fills a user-provided buffer which should only be read as a when
+     the iter is active.  Whenever next returns true the buffer is filled with
+     the current path.  Uses will generally look like this:
+
+     vec<int> path {};
+     auto iter = trie.paths (path);
+     while (iter.next ())
+       use_path (path);
+*/
+  struct iter
+  {
+    iter (vec<int>&, const vec<trie_node>&);
+    iter (int first, vec<int>& path, const vec<trie_node> &vertices);
+    ~iter ()
+    { stack.release (); }
+
+    bool next ();
+    bool next (int);
+    bool next (bool);
+
+
+    /* This is the analog of the stack frame when implementing a recursive
+       depth-first path traversal and collecting paths to the leafs:
+
+       for (auto successor : vertex[ix])
+	 {
+	   path.push (successor.value);
+	   collect (successor.ix, successor.begin, successor.end, path)
+	   path.pop ();
+	 }
+
+       Using size_t + 2x unsigned helped make the frame more compact and faster
+       than pointers.  */
+    struct frame
+    {
+      /* The index of this frame's vertex, so that vertices[ix].  */
+      size_t ix;
+      /* The index of the current active successor of vertices[ix].  */
+      unsigned itr;
+      /* The end of vertices[ix] successors.  When itr == end, vertex[ix] is
+	 exhausted.  */
+      unsigned end;
+    };
+
+    /* User provided buffer to fill with the paths.  */
+    vec<int> &path;
+    /* Direct reference to the trie vertices vector.  */
+    const vec<trie_node> &vertices;
+    /* The call stack.  */
+    vec<frame> stack;
+    /* Yield flag.  If this is true then next () is permitted to and return a
+       new value.  If this is false, a value has already been yielded and next
+       must first reset the state before building the next value.  */
+    bool yield = true;
+
+    iter (const iter& o) : path (o.path), vertices (o.vertices),
+    stack (o.stack.copy ()), yield (o.yield)
+    {
+    }
+
+    /* Delete the copy assignment as the iter stores references and
+       would cause bad bugs.  It is not necessary for the iterator to
+       work well.  To support these the references would need to be
+       (explicit) pointers.  */
+    iter& operator = (const iter& o) = delete;
+  };
+};
+
+/* Construct an iterator filling BUFFER.  */
+trie::iter
+trie::paths (vec<int> &buffer) const
+{
+  buffer.truncate (0);
+  return iter (buffer, vertices);
+}
+
+/* Construct an iterator filling BUFFER for paths starting at FROM.  */
+trie::iter
+trie::paths (vec<int>& buffer, int from) const
+{
+  buffer.truncate (0);
+  return iter (from, buffer, vertices);
+}
+
+/* Default construct a new path set.  */
+trie::trie () : vertices (vec<trie_node> {})
+{
+  vertices.safe_push (trie_node {});
+  vertices[0].inserted = true;
+}
+
+/* Release the memory used by this trie.  */
+void
+trie::release ()
+{
+  for (trie_node &v : vertices)
+    if (v.heaped)
+      v.far.release ();
+  vertices.release ();
+}
+
+/* Insert PATH into the trie.  */
+bool
+trie::insert (const vec<int>& path)
+{
+  return insert (array_slice <const int> (path));
+}
+
+/* Insert the PATH into the trie.  Duplicate entries will not be entered twice.
+   If PATH is a subpath of another path this will not be detected or if there
+   is a path previously inserted that is a subpath of PATH then this redundancy
+   will not be eliminated.  For that behavior, call insert_with_suffix.  */
+bool
+trie::insert (array_slice<const int> path)
+{
+  size_t index = 0;
+  for (size_t i = 0; i != path.size (); ++i)
+    {
+      trie_node &current = vertices[index];
+      const auto *xp = current.get (path[i]);
+      if (xp)
+	{
+	  index = xp->val;
+	  current.inserted = true;
+	}
+      else
+	{
+	  current.endofpath = false;
+	  current.inserted = true;
+	  unsigned ix = vertices.length ();
+	  vertices.safe_grow_cleared (ix + path.size () - i);
+
+	  for (size_t k = i; k != path.size (); ++k)
+	    {
+	      vertices[index].put (path[k], ix);
+	      vertices[ix].inserted = true;
+	      index = ix;
+	      ix += 1;
+	    }
+
+	  vertices[index].endofpath = true;
+	  return true;
+	}
+    }
+
+  return false;
+}
+
+/* hard_insert is like insert, except it does not overwrite any endofpath
+   flags, and records the endofpath flag even when a superpath of PATH has been
+   inserted previously.  This effectively disables subpath elimination.  */
+bool
+trie::hard_insert (array_slice<const int> path)
+{
+  size_t index = 0;
+  for (size_t i = 0; i != path.size (); ++i)
+    {
+      trie_node &current = vertices[index];
+      const auto *xp = current.get (path[i]);
+      if (xp)
+	{
+	  index = xp->val;
+	  current.inserted = true;
+	}
+      else
+	{
+	  current.inserted = true;
+	  unsigned ix = vertices.length ();
+	  vertices.safe_grow_cleared (ix + path.size () - i);
+
+	  for (size_t k = i; k != path.size (); ++k)
+	    {
+	      vertices[index].put (path[k], ix);
+	      vertices[ix].inserted = true;
+	      index = ix;
+	      ix += 1;
+	    }
+
+	  vertices[index].endofpath = true;
+	  return true;
+	}
+    }
+
+  vertices[index].endofpath = true;
+  return false;
+}
+
+/* Insert a suffixes of PATH.  This is identical to insert except that it is
+   assumed that PATH is a subpath, and that the inserted path should clear the
+   inserted and endofpath flags.  This function should only be called by
+   insert_with_suffix.  */
+bool
+trie::insert_suffix (array_slice<const int> path)
+{
+  size_t index = 0;
+  for (size_t i = 0; i != path.size (); ++i)
+    {
+      trie_node &current = vertices[index];
+      const auto *xp = current.get (path[i]);
+      if (xp)
+	{
+	  index = xp->val;
+	  vertices[index].endofpath = false;
+	}
+      else
+	{
+	  current.endofpath = false;
+	  unsigned ix = vertices.length ();
+	  vertices.safe_grow_cleared (ix + path.size () - i);
+
+	  for (size_t k = i; k != path.size (); ++k)
+	    {
+	      vertices[index].put (path[k], ix);
+	      index = ix;
+	      ix += 1;
+	    }
+
+	  return true;
+	}
+    }
+
+  return false;
+}
+
+/* Insert PATH and all its subpaths into the trie.  This function implements
+   the redundancy property of the trie - if an inserted path is either a
+   subpath or superpath of some other path then only the longest will keep its
+   inserted flag.  */
+bool
+trie::insert_with_suffix (array_slice<const int> path)
+{
+  bool inserted = insert (path);
+  path = array_slice<const int> (path.begin () + 1, path.size () - 1);
+  while (inserted && !path.empty ())
+    {
+      inserted = insert_suffix (path);
+      path = array_slice<const int> (path.begin () + 1, path.size () - 1);
+    }
+  return inserted;
+}
+
+vec<vec<int>>
+trie::paths () const
+{
+  vec<int> path {};
+  vec<vec<int>> all {};
+  auto iter = paths (path);
+  while (iter.next ())
+    all.safe_push (path.copy ());
+  return all;
+}
+
+/* Check if the trie contains PATH.  */
+bool
+trie::contains (array_slice<const int> path) const
+{
+  size_t index = 0;
+  for (int id : path)
+    {
+      const trie_node &current = vertices[index];
+      if (!current.inserted)
+	return false;
+      const auto *xp = current.get (id);
+      if (!xp)
+	return false;
+      index = xp->val;
+    }
+  return vertices[index].inserted && vertices[index].endofpath;
+}
+
+/* Create an iterator over VERTICES with the caller-provided buffer PATH.  */
+trie::iter::iter (vec<int>& path, const vec<trie_node>& vertices)
+     : path (path), vertices (vertices), stack (vec<frame> {})
+{
+  gcc_checking_assert (!vertices.is_empty ());
+  stack.reserve (13);
+  frame f;
+  f.ix = 0;
+  f.itr = 0;
+  f.end = vertices[0].length ();
+  stack.quick_push (f);
+}
+
+/* Create an iterator over VERTICES with the caller-provided buffer PATH for
+   all paths and subpaths (suffixes) starting in FROM.  Note that FROM will not
+   be in the output buffer PATH, mainly because non-rooted paths are used when
+   splicing with paths that end in FROM.  */
+trie::iter::iter (int from, vec<int>& path, const vec<trie_node>& vertices)
+     : path (path), vertices (vertices), stack (vec<frame> {})
+{
+  gcc_checking_assert (!vertices.is_empty ());
+  stack.reserve (13);
+
+  auto *xp = vertices[0].get (from);
+  if (!xp)
+    {
+      /* No paths start with FROM, so construct an iterator where next ()
+	 always returns false.  */
+      frame f;
+      f.ix = 0;
+      f.itr = 0;
+      f.end = 0;
+      stack.safe_push (f);
+      return;
+    }
+
+  frame f;
+  f.ix = xp->val;
+  f.itr = 0;
+  f.end = vertices[f.ix].length ();
+  stack.safe_push (f);
+}
+
+/* Find the next complete prime path in the trie and write it to the caller's
+   buffer.  Returns true if a path is written and false if the iterator is
+   exhausted, in which case no path is written and the contents of the buffer
+   is garbage.  */
+bool
+trie::iter::next ()
+{
+  while (true)
+    {
+      frame& top = stack.last ();
+      const trie_node &vertex = vertices[top.ix];
+
+      if (vertex.endofpath && yield
+	  && (top.itr != top.end || vertex.length () == 0))
+	{
+	  yield = false;
+	  return true;
+	}
+
+      yield = true;
+
+      if (top.itr != top.end)
+	{
+	  const xpair succ = vertex.at (top.itr);
+	  const trie_node &next = vertices[succ.val];
+	  top.itr++;
+
+	  if (!next.inserted)
+	    continue;
+
+	  frame f {};
+	  f.ix = succ.val;
+	  f.itr = 0;
+	  f.end = next.length ();
+	  path.safe_push (succ.key);
+	  stack.safe_push (f);
+	}
+      else
+	{
+	  stack.pop ();
+	  if (stack.is_empty ())
+	    return false;
+	  path.pop ();
+	}
+    }
+}
+
+/* Find the next path in the trie that would continue (but does not
+   include) LIMIT.  If the trie contains the paths [2 4 6 8 9] [2 4 6 8 10] and
+   [2 4 5 8], iter.next (8) would yield [2 4 6] and [2 4 5].  Returns true if a
+   path is written and false if the iterator is exhausted, in which case no
+   path is written and the contents of the buffer is garbage.  */
+bool
+trie::iter::next (int limit)
+{
+  while (true)
+    {
+      frame& top = stack.last ();
+      const trie_node &vertex = vertices[top.ix];
+
+      if (yield && top.itr != top.end)
+	{
+	  const xpair succ = vertex.at (top.itr);
+	  const trie_node &next = vertices[succ.val];
+	  const int key = succ.key;
+	  const int val = succ.val;
+	  top.itr++;
+
+	  if (!next.inserted)
+	    continue;
+
+	  if (key == limit)
+	    {
+	      if (path.is_empty ())
+		continue;
+	      yield = false;
+	      return true;
+	    }
+
+	  frame f {};
+	  f.ix = val;
+	  f.itr = 0;
+	  f.end = next.length ();
+	  path.safe_push (key);
+	  stack.safe_push (f);
+	}
+      else
+	{
+	  yield = true;
+	  stack.pop ();
+	  if (stack.is_empty ())
+	    return false;
+	  path.pop ();
+	}
+    }
+}
+
+/* Find the next path in among all paths including subpaths/suffixes.  This is
+   mainly useful in combination with trie.paths (from) for finding the paths
+   that go through some vertex.  */
+bool
+trie::iter::next (bool)
+{
+  while (true)
+    {
+      frame& top = stack.last ();
+      const trie_node &vertex = vertices[top.ix];
+
+      if (yield && vertex.length () == 0)
+	{
+	  yield = false;
+	  return true;
+	}
+
+      yield = true;
+
+      if (top.itr != top.end)
+	{
+	  const xpair succ = vertex.at (top.itr);
+	  const trie_node &next = vertices[succ.val];
+	  top.itr++;
+
+	  frame f {};
+	  f.ix = succ.val;
+	  f.itr = 0;
+	  f.end = next.length ();
+	  path.safe_push (succ.key);
+	  stack.safe_push (f);
+	}
+      else
+	{
+	  stack.pop ();
+	  if (stack.is_empty ())
+	    return false;
+	  path.pop ();
+	}
+    }
+}
+
+/* Return the index of NEEDLE in HAYSTACK, or (size_t)-1 if not found.  */
+template <typename T>
+size_t
+index_of (T needle, const vec <T>& haystack)
+{
+  size_t len = haystack.length ();
+  for (size_t i = 0; i != len; ++i)
+    if (haystack[i] == needle)
+      return i;
+  return (size_t)-1;
+}
+
+/* Check if there is an edge in GRAPH from SRC to DST.  */
+bool
+edge_p (const struct graph *graph, int src, int dest)
+{
+  for (struct graph_edge *e = graph->vertices[src].succ; e; e = e->succ_next)
+    if (e->dest == dest)
+      return true;
+  return false;
+}
+
+/* Check if PATH is a cycle starting (and ending) with V.  */
+bool
+cycle_p (const vec<int>& path, int v)
+{
+  return path[0] == v && path[path.length ()-1] == v;
+}
+
+/* Duplicates must be removed, but there are usually just a few (the product of
+   entries and exits into a loop - multiple exits (break, returns) are common
+   but multiple entries are rarer (usually just goto)). It happens when two
+   prime paths diverge somehere after the slice [Ven..Vex]  */
+void
+scc_entry_exit_paths (const vec<vec<int>>& prime_paths, int entry, int exit,
+		      trie &trie)
+{
+  if (entry == exit)
+    {
+      trie.hard_insert (array_slice <const int> (&entry, 1));
+      return;
+    }
+
+  for (const vec<int>& path : prime_paths)
+    {
+      const size_t Ven = index_of (entry, path);
+      const size_t Vex = index_of (exit, path);
+
+      if (Ven == (size_t)-1 || Vex == (size_t)-1 || Vex <= Ven)
+	continue;
+
+      const size_t len = (Vex + 1) - Ven;
+      array_slice <const int> p (path.begin () + Ven, len);
+      trie.hard_insert (p);
+    }
+}
+
+/* Find the SCC exit paths, the incomplete simple paths that starts in a
+   non-entry vertex in the SCC, and ends in EXIT, and is not a cycle.
+   INTERNAL_PP are the internal prime paths for the SCC with EXIT as an exit
+   vertex.
+
+   Fazli claims the path must not be a subpath of another exit path in the SCC,
+   but this is only half true: see 004b.  Subpaths must survive if they end in
+   a different exit vertex than the superpath.
+*/
+void
+scc_exit_paths (const vec<vec<int>> &prime_paths, int exit, trie &out)
+{
+  trie trie;
+  for (const vec<int> &path : prime_paths)
+    {
+      const size_t Vex = index_of (exit, path);
+      if (Vex == (size_t)-1 || cycle_p (path, exit))
+	continue;
+      array_slice <const int> p (path.begin (), Vex + 1);
+      trie.insert_with_suffix (p);
+    }
+
+  /* TODO: splice () */
+  auto_vec<int> path {};
+  auto iter = trie.paths (path);
+  while (iter.next ())
+    out.hard_insert (path);
+}
+
+/* Find the SCC entry paths, the incomplete simple paths that start in the
+   entry vertex ENTRY and is not a cycle.  INTERNAL_PP are the internal prime
+   paths for the SCC with ENTRY as an entry vertex.  The paths are inserted
+   into TRIE.  */
+void
+scc_entry_paths (const vec<vec<int>> &internal_pp, int entry, trie &trie)
+{
+  for (const vec<int> &path : internal_pp)
+    {
+      const size_t Ven = index_of (entry, path);
+      if (Ven == (size_t)-1 || cycle_p (path, entry))
+	continue;
+      array_slice <const int> p (path.begin () + Ven, path.length () - Ven);
+      // TODO: does this need insert_with_suffix?
+      // TODO: does this need a !exit_p (p.back ()) check?
+      trie.insert (p);
+    }
+}
+
+/* Worker for cfg_complete_prime_paths.  ITR is the current id for the current
+   path.  END is end of the path so that when ITR == END, the walk is
+   completed.  CFG is the graph being analyzed.  SBMVEC is the SCC indexed
+   vector of bitmaps where nth bit is set if the nth vertex is in that SCC.
+   PATH is the vertices that make up this walk so far.  TRIE is the output trie
+   where paths are inserted.  SCC_ENEX_PATHS are the entry-exit paths found by
+   the scc_entry_exit_paths function.  */
+void
+cfg_complete_prime_paths1 (const int *itr, const int *end,
+			   const struct graph *cfg,
+			   const sbitmap *sbmvec,
+			   const vec<vec<vec<int>>> &scc_enex_paths,
+			   vec<int> &path, trie &trie)
+{
+  if (itr == end)
+    {
+      trie.insert_with_suffix (path);
+      return;
+    }
+
+  const auto pathlen = path.length ();
+  for (const vec<int> &enex : scc_enex_paths[*itr])
+    {
+      if (!edge_p (cfg, path.last (), enex[0]))
+	continue;
+
+      path.safe_splice (enex);
+      cfg_complete_prime_paths1 (itr + 1, end, cfg, sbmvec,
+				 scc_enex_paths, path, trie);
+      path.truncate (pathlen);
+    }
+}
+
+/* Find the complete prime paths of the CFG, the prime paths that start in the
+   entry vertex and end in the exit vertex.  SMBVEC is an sbitmap vector where
+   SBMVEC[i] is the ith SCC, and the nth in SBMVEC[i] is set if vertices[n] is
+   in SCC[i].  */
+trie
+cfg_complete_prime_paths (const struct graph *cfg,
+			  const sbitmap *sbmvec,
+			  const vec<trie> &scc_entry_exit_paths,
+			  const trie &ccfg_prime_paths)
+{
+  trie trie;
+  auto_vec<int, 16> path {};
+  auto_vec<int, 16> cfgpp {};
+  auto_vec<vec<vec<int>>, 8> scc_enex (scc_entry_exit_paths.length ());
+
+  for (size_t i = 0; i != scc_entry_exit_paths.length (); ++i)
+    {
+      scc_enex.quick_push (vec<vec<int>> {});
+      auto iter = scc_entry_exit_paths[i].paths (path);
+      while (iter.next ())
+	scc_enex[i].safe_push (path.copy ());
+    }
+  path.truncate (0);
+
+  auto iter = ccfg_prime_paths.paths (cfgpp);
+  while (iter.next ())
+    {
+      for (const vec<int> &enex : scc_enex[cfgpp[0]])
+	{
+	  path.truncate (0);
+	  path.safe_splice (enex);
+	  cfg_complete_prime_paths1 (cfgpp.begin () + 1, cfgpp.end (), cfg,
+				     sbmvec, scc_enex, path, trie);
+	}
+    }
+
+  for (vec<vec<int>> &v : scc_enex)
+    release_vec_vec (v);
+  return trie;
+}
+
+/* Find the SCC exit prime paths, the prime paths that start from a strongly
+   connected component and end in the end vertex. SCCS is a vector where
+   SCCS[i] = SCC (vertex_i) so that if vertex[2].component == 5 then SCCS[2] ==
+   5.  SCC_EXIT_PATHS is the output of scc_exit_paths ().  COMPLETE_PRIME_PATHS
+   is the output of cfg_complete_prime_paths ().  */
+trie
+scc_exit_prime_paths (const vec<int> &sccs,
+		      const trie &scc_exit_paths,
+		      const trie &complete_prime_paths)
+{
+  trie trie;
+  auto_vec<int, 8> path {};
+  auto_vec<int, 8> r {};
+  auto_vec<int, 8> q {};
+
+  auto exiter = scc_exit_paths.paths (q);
+  while (exiter.next ())
+    {
+      const int Vex = q.last ();
+      auto iter = complete_prime_paths.paths (r, Vex);
+      while (iter.next (true))
+	{
+	  /* There could be multiple Vex in the SCC. Even if
+	     scc_exit_paths did not kill the subpaths, this trie probably
+	     would. It can be assumed that all vertices in q are in the same
+	     SCC.
+
+	     This is an important note, as the Fazli and Afsharchi paper does
+	     not properly capture this subtlety.
+
+	     Could take the populated vec OR the cfg and use helper function.  */
+	  auto p0 = Vex;
+	  auto p1 = r[0];
+
+	  if (sccs[p0] == sccs[p1])
+	    continue;
+
+	  path.truncate (0);
+	  path.reserve (q.length () + r.length ());
+	  path.splice (q);
+	  path.splice (r);
+	  /* This can probably insert without subpath elimination because:
+	     1. Conflicts are *really* rare (see patmatch in tree.c), but they
+		do happen.
+	     2. The output of this function is "filtered" through another trie
+		anyway so the redundant paths generated here will be
+		eliminated in the consumers at a very low extra cost.  */
+	  trie.insert (path);
+	}
+    }
+
+  return trie;
+}
+
+/* Check if PATH in CFG enters the VERTEX's SCC through VERTEX.  */
+bool
+enters_through_p (const struct graph *cfg, const vec<int> &path, int vertex)
+{
+  gcc_checking_assert (!path.is_empty ());
+  const int last = path.address()[path.length ()-1];
+  if (cfg->vertices[last].component == cfg->vertices[vertex].component)
+    return false;
+  return edge_p (cfg, last, vertex);
+};
+
+/* Worker for scc_entry_prime_paths.  */
+void
+scc_entry_prime_paths1 (const struct graph *cfg, const trie
+			&scc_entry_paths, const trie &prime_paths,
+			trie &trie, vec<int> &buffer)
+{
+  auto_vec<int, 8> p {};
+  auto_vec<int, 8> q {};
+  auto itr = scc_entry_paths.paths (q);
+  while (itr.next ())
+    {
+      int Ven = q[0];
+      /* TODO: This might benefit from a reversed trie lookup.  */
+      auto xitr = prime_paths.paths (p);
+      while (xitr.next (Ven))
+	{
+	  if (!enters_through_p (cfg, p, Ven))
+	    continue;
+
+	  buffer.truncate (0);
+	  buffer.reserve (p.length () + q.length ());
+	  buffer.splice (p);
+	  buffer.splice (q);
+	  trie.insert_with_suffix (buffer);
+	}
+    }
+}
+
+/* Find the entry prime paths - the prime paths that start in the root and end
+   in a strongly connected component.  */
+void
+scc_entry_prime_paths (const struct graph *cfg,
+		       const trie &scc_entry_paths,
+		       const trie &complete_prime_paths,
+		       const trie &exit_prime_paths,
+		       trie &trie)
+{
+  auto_vec<int> buffer {};
+  scc_entry_prime_paths1 (cfg, scc_entry_paths, complete_prime_paths, trie, buffer);
+  scc_entry_prime_paths1 (cfg, scc_entry_paths, exit_prime_paths, trie, buffer);
+}
+
+/* Build a new control flow graph from the strongly connected components, so
+   that every node in the CCFG is a strongly connected component in the
+   original CFG.  NSSC is the number of vertices in the new graph, and the
+   return value of graphds_ssc.  */
+struct graph*
+build_ccfg (struct graph *cfg, int nscc)
+{
+  struct graph *ccfg = new_graph (nscc);
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    {
+      struct vertex v = cfg->vertices[i];
+      for (struct graph_edge *e = v.succ; e; e = e->succ_next)
+	{
+	  int src = v.component;
+	  int dest = cfg->vertices[e->dest].component;
+	  if (src != dest && !edge_p (ccfg, src, dest))
+	    add_edge (ccfg, src, dest);
+	}
+    }
+
+  return ccfg;
+}
+
+/* Create a new graph object from CFG where the edges between strongly
+   connected components removed.  */
+struct graph*
+disconnect_sccs (struct graph *cfg)
+{
+  struct graph *ccfg = new_graph (cfg->n_vertices);
+  const struct vertex *vertices = cfg->vertices;
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    {
+      ccfg->vertices[i].data = &cfg->vertices[i];
+      for (struct graph_edge *e = vertices[i].succ; e; e = e->succ_next)
+	if (vertices[e->src].component == vertices[e->dest].component)
+	  add_edge (ccfg, e->src, e->dest)->data = e;
+    }
+  return ccfg;
+}
+
+/* Check if vertex i is the entry vertex of a strongly connected component.  It
+   is an entry vertex if either: there are no predecessors (i.e. the root
+   vertex is always an entry vertex) or if a predecessor belongs to a
+   different SCC.  */
+static bool
+scc_entry_vertex_p (struct graph *cfg, size_t i)
+{
+  if (!cfg->vertices[i].pred)
+    return true;
+  const int scc = cfg->vertices[i].component;
+  for (struct graph_edge *e = cfg->vertices[i].pred; e; e = e->pred_next)
+    if (cfg->vertices[e->src].component != scc)
+      return true;
+  return false;
+}
+
+/* Check if vertex i is the exit vertex of a strongly connected component.  It
+   is an exit vertex if either: there are no successors (i.e. the sink is
+   always an exit vertex) or if a successor belongs to a different SCC.  */
+static bool
+scc_exit_vertex_p (struct graph *cfg, size_t i)
+{
+  if (!cfg->vertices[i].succ)
+    return true;
+  const int scc = cfg->vertices[i].component;
+  for (struct graph_edge *e = cfg->vertices[i].succ; e; e = e->succ_next)
+    if (cfg->vertices[e->dest].component != scc)
+      return true;
+  return false;
+}
+
+/* Find all the simple paths in CFG starting at NODE and insert into TRIE.
+   This is a DFS where the search stops when entering a vertex already in SEEN.
+   PATH is path taken from the from the root to NODE.  */
+void
+simple_paths1 (const struct graph *cfg, int node, sbitmap seen, vec<int> &path,
+	       trie &trie)
+{
+  if (!bitmap_set_bit (seen, node))
+    {
+      if (path[0] == node)
+	path.quick_push (node);
+      trie.insert (path);
+      if (path[0] == node)
+	path.pop ();
+      return;
+    }
+  path.quick_push (node);
+
+  struct graph_edge *succs = cfg->vertices[node].succ;
+  if (!succs)
+    {
+      trie.insert (path);
+      bitmap_clear_bit (seen, node);
+      path.pop ();
+      return;
+    }
+
+  for (struct graph_edge *e = succs; e; e = e->succ_next)
+    simple_paths1 (cfg, e->dest, seen, path, trie);
+
+  bitmap_clear_bit (seen, node);
+  path.pop ();
+}
+
+/* Find all the simple paths in CFG starting at ROOT and insert into TRIE.  A
+   simple path is a sequence of vertices without any duplicated vertices (i.e.
+   no loops).  SEEN should be an sbitmap of CFG->n_vertices size.  PATH and
+   SEEN will be cleared upon function entry and provide buffer reuse.  */
+void
+simple_paths (struct graph *cfg, int root, sbitmap seen, vec<int> &path,
+	      trie &trie)
+{
+  bitmap_clear (seen);
+  path.reserve (cfg->n_vertices);
+  path.truncate (0);
+  simple_paths1 (cfg, root, seen, path, trie);
+}
+
+/* Find all the simple paths in CFG starting at ROOT.  */
+trie
+simple_paths (struct graph *cfg, int root = 0)
+{
+  trie trie;
+  auto_sbitmap seen (cfg->n_vertices);
+  auto_vec<int> path {};
+  simple_paths (cfg, root, seen, path, trie);
+  return trie;
+}
+
+/* This struct is just a collection of reusable buffers which drastically
+   reduces allocation pressure.  This has a big impact on the performance of
+   the function.  As a bonus it can handle tidying up as well.  */
+struct pp_buffers
+{
+  pp_buffers (struct graph *cfg) : seen (cfg->n_vertices) {}
+  auto_sbitmap seen;
+  auto_vec<int, 8> path;
+};
+
+} // namespace
+
+/* Fazli & Afsarchi compositional prime paths generation.  */
+vec<vec<int>>
+prime_paths (struct graph *cfg)
+{
+    vec<vec<int>> paths {};
+    const int nscc = graphds_scc (cfg, NULL);
+    struct graph *disconnected = disconnect_sccs (cfg);
+    struct graph *ccfg = build_ccfg (cfg, nscc);
+
+    pp_buffers ctx (cfg);
+
+    /* Make a mapping vertex -> SCC that vertex is a part of.  This could
+       already be looked up directly on the cfg->vertices, but that interface
+       is slightly more clunky when passed down to helpers.
+       TODO: array slice? */
+    auto_vec<int, 8> scc_map (cfg->n_vertices);
+    scc_map.safe_grow (cfg->n_vertices);
+    /* Store an SCC-ID -> vertices mapping to quickly find the vertices that
+       make up a strongly connected component.  This is the inverse of scc_map.  */
+    sbitmap *sbmvec = sbitmap_vector_alloc (ccfg->n_vertices, cfg->n_vertices);
+    bitmap_vector_clear (sbmvec, ccfg->n_vertices);
+
+    for (int i = 0; i != cfg->n_vertices; ++i)
+      {
+	int scc_id = cfg->vertices[i].component;
+	scc_map[i] = scc_id;
+	bitmap_set_bit (sbmvec[scc_id], i);
+      }
+
+    trie prime_paths;
+    vec<vec<vec<int>>> scc_internal_pp {};
+    scc_internal_pp.safe_grow (nscc);
+    for (int i = 0; i != nscc; ++i)
+      {
+	unsigned V;
+	trie internal_pp;
+	sbitmap_iterator bi;
+	EXECUTE_IF_SET_IN_BITMAP (sbmvec[i], 0, V, bi)
+	  simple_paths (disconnected, V, ctx.seen, ctx.path, internal_pp);
+
+	scc_internal_pp[i] = internal_pp.paths ();
+	for (const auto &pp : scc_internal_pp[i])
+	  prime_paths.insert_with_suffix (pp);
+	internal_pp.release ();
+      }
+
+    auto_vec<trie, 8> scc_enex_paths (nscc);
+    scc_enex_paths.safe_grow_cleared (nscc);
+    trie scc_en_paths;
+    trie scc_ex_paths;
+
+    for (int i = 0; i != ccfg->n_vertices; ++i)
+      {
+	unsigned int Ven, Vex;
+	sbitmap_iterator bien, biex;
+	EXECUTE_IF_SET_IN_BITMAP (sbmvec[i], 0, Ven, bien)
+	  {
+	    if (!scc_entry_vertex_p (cfg, Ven))
+	      continue;
+
+	    EXECUTE_IF_SET_IN_BITMAP (sbmvec[i], 0, Vex, biex)
+	      {
+		if (!scc_exit_vertex_p (cfg, Vex))
+		  continue;
+		scc_entry_exit_paths (scc_internal_pp[i], Ven, Vex,
+				      scc_enex_paths[i]);
+	      }
+	  }
+      }
+
+    for (int i = 0; i != cfg->n_vertices; ++i)
+      {
+	const int scc = scc_map[i];
+	if (scc_entry_vertex_p (cfg, i))
+	  scc_entry_paths (scc_internal_pp[scc], i, scc_en_paths);
+
+	if (scc_exit_vertex_p (cfg, i))
+	  scc_exit_paths (scc_internal_pp[scc], i, scc_ex_paths);
+      }
+    /* TODO: pass this trie directly, dont go via vec-vec */
+    trie ccfg_prime_paths = simple_paths (ccfg, ccfg->n_vertices - 1);
+    trie complete_prime_paths = cfg_complete_prime_paths (cfg, sbmvec,
+							  scc_enex_paths,
+							  ccfg_prime_paths);
+    trie exit_prime_paths = scc_exit_prime_paths (scc_map, scc_ex_paths,
+						  complete_prime_paths);
+    scc_entry_prime_paths (cfg, scc_en_paths, complete_prime_paths,
+			   exit_prime_paths, prime_paths);
+
+
+    // TODO: trie.merge
+    auto_vec<int> p {};
+    auto epp = exit_prime_paths.paths (p);
+    auto cpp = complete_prime_paths.paths (p);
+
+    while (epp.next ())
+      prime_paths.insert_with_suffix (p);
+    while (cpp.next ())
+      prime_paths.insert_with_suffix (p);
+
+    paths = prime_paths.paths ();
+
+    for (trie &p : scc_enex_paths)
+      p.release ();
+    scc_en_paths.release ();
+    ccfg_prime_paths.release ();
+    scc_ex_paths.release ();
+    prime_paths.release ();
+    release_vec_vec (scc_internal_pp);
+    sbitmap_vector_free (sbmvec);
+    free_graph (ccfg);
+    free_graph (disconnected);
+    return paths;
+}
+
+#if CHECKING_P
+
+static int
+lexicographical_cmp (const void *vlhs, const void *vrhs)
+{
+    const vec<int> *lhs = (const vec<int>*)vlhs;
+    const vec<int> *rhs = (const vec<int>*)vrhs;
+
+    const int *first1 = lhs->begin ();
+    const int *first2 = rhs->begin ();
+    const int *last1 = lhs->end ();
+    const int *last2 = rhs->end ();
+
+    for (; (first1 != last1) && (first2 != last2); ++first1, ++first2)
+    {
+	if (*first1 < *first2)
+	    return -1;
+	if (*first2 < *first1)
+	    return 1;
+    }
+
+    if (first1 == last1 && first2 != last2)
+	return -1;
+    if (first1 == last1 && first2 == last2)
+	return 0;
+    if (first1 != last1 && first2 == last2)
+	return 1;
+
+    gcc_assert (false);
+}
+
+static void
+prime_paths1 (struct graph *graph, int entry, sbitmap visited,
+	      vec<int> &path, vec<vec<int>> &paths)
+{
+  if (!bitmap_set_bit (visited, entry))
+    {
+      vec<int> *last = paths.safe_push (path.copy ());
+      if (entry == path[0])
+	last->safe_push (entry);
+      return;
+    }
+  path.safe_push (entry);
+  struct vertex *vertices = graph->vertices;
+
+  if (!vertices[entry].succ)
+    {
+      paths.safe_push (path.copy ());
+      bitmap_clear_bit (visited, entry);
+      path.pop ();
+      return;
+    }
+
+  for (struct graph_edge *e = vertices[entry].succ; e; e = e->succ_next)
+    prime_paths1 (graph, e->dest, visited, path, paths);
+
+  bitmap_clear_bit (visited, entry);
+  path.pop ();
+}
+
+static bool
+subsequence_p (array_slice<const int> needle, array_slice<const int> haystack)
+{
+  /* Only pick true subsequences; equal sequences are NOT considered subsequences */
+  if (needle.size () > haystack.size ())
+    return false;
+
+  size_t fst = 0;
+  for (; fst != haystack.size () - needle.size (); ++fst)
+    if (haystack[fst] == needle[0])
+      break;
+
+  if (fst + needle.size () > haystack.size ())
+    return false;
+
+  for (size_t i = 0; i != needle.size (); ++i)
+    if (needle[i] != haystack[fst+i])
+      return false;
+
+  return true;
+}
+
+static bool
+subsequence_p (const vec<int> &needle, const vec<int> &haystack)
+{
+  return subsequence_p (array_slice <const int> (needle),
+			array_slice <const int> (haystack));
+}
+
+static bool
+equal_p (array_slice<const int> lhs, array_slice<const int> rhs)
+{
+  if (lhs.size () != rhs.size ())
+    return false;
+
+  size_t length = lhs.size();
+  for (size_t i = 0; i != length; ++i)
+    if (lhs[i] != rhs[i])
+      return false;
+  return true;
+}
+
+static void
+remove_duplicates (vec<vec<int>> &paths)
+{
+  paths.qsort (lexicographical_cmp);
+  auto cmp = [](const vec<int> &lhs, const vec<int> &rhs)
+    {
+      return equal_p (array_slice<const int> (lhs),
+		      array_slice<const int> (rhs));
+    };
+  auto it = std::unique (paths.begin (), paths.end (), cmp);
+  paths.truncate (it - paths.begin ());
+}
+
+static vec<vec<int>>
+reference_prime_paths (struct graph *g)
+{
+  auto_vec<int> path {};
+  vec<vec<int>> paths {};
+  auto_sbitmap visited (g->n_vertices);
+  bitmap_clear (visited);
+
+  for (int i = 0; i != g->n_vertices; i++)
+    {
+      gcc_assert (bitmap_empty_p (visited));
+      prime_paths1 (g, i, visited, path, paths);
+    }
+
+  for (size_t i = 0; i != paths.length (); ++i)
+    {
+      vec<int> &current = paths[i];
+      bool subseq = false;
+      for (size_t k = 0; k != paths.length (); ++k)
+	{
+	  if (k != i)
+	    subseq = subseq || subsequence_p (current, paths[k]);
+	}
+
+      if (subseq)
+	paths.unordered_remove (i--);
+    }
+
+  remove_duplicates (paths);
+  return paths;
+}
+
+vec<vec<int>>
+naive_prime_paths (struct graph *g)
+{
+  return reference_prime_paths (g);
+}
+
+namespace selftest
+{
+
+static bool
+any_equal_p (const array_slice<const int> &needle,
+	     const vec<vec<int>> &haystack)
+{
+  for (const vec<int> &x : haystack)
+    if (equal_p (needle, array_slice <const int> (x)))
+      return true;
+  return false;
+}
+
+#define SLICE(a) array_slice <const int> ((a), (sizeof (a) / sizeof (a)[0]))
+
+static size_t
+count (const trie &trie)
+{
+  size_t n = 0;
+  auto_vec<int> path {};
+  auto iter = trie.paths (path);
+  while (iter.next ())
+    n += 1;
+  return n;
+}
+
+static vec<vec<int>>
+simple_paths (struct graph *cfg, trie &trie, int root = 0)
+{
+  auto_sbitmap seen (cfg->n_vertices);
+  auto_vec<int> path;
+  simple_paths (cfg, root, seen, path, trie);
+  return trie.paths ();
+}
+
+/* Create a CFG that roughly corresponds to this program:
+
+int binary_search(int a[], int len, int from, int to, int key)
+{
+    int low = from;
+    int high = to - 1;
+
+    while (low <= high)
+    {
+	int mid = (low + high) >> 1;
+	long midVal = a[mid];
+
+	if (midVal < key)
+	    low = mid + 1;
+	else if (midVal > key)
+	    high = mid - 1;
+	else
+	    return mid; // key found
+    }
+    return -1;
+}
+
+*/
+static struct graph*
+binary_search_cfg ()
+{
+    struct graph *g = new_graph (11);
+    add_edge (g, 0, 1);
+    add_edge (g, 1, 2);
+    add_edge (g, 2, 3);
+    add_edge (g, 2, 4);
+    add_edge (g, 3, 10);
+    add_edge (g, 4, 5);
+    add_edge (g, 4, 6);
+    add_edge (g, 5, 7);
+    add_edge (g, 6, 8);
+    add_edge (g, 6, 9);
+    add_edge (g, 7, 2);
+    add_edge (g, 8, 10);
+    add_edge (g, 9, 7);
+    graphds_scc (g, NULL);
+    return g;
+}
+
+/* Test a full run of the algorithm against a known graph (binary-search).  */
+static void
+test_prime_paths ()
+{
+    struct graph *g = binary_search_cfg ();
+    vec<vec<int>> paths = prime_paths (g);
+    const int p01[] = { 0, 1, 2, 3, 10 };
+    const int p02[] = { 0, 1, 2, 4, 6, 8, 10 };
+    const int p03[] = { 5, 7, 2, 4, 6, 9 };
+    const int p04[] = { 4, 6, 9, 7, 2, 4 };
+    const int p05[] = { 2, 4, 6, 9, 7, 2 };
+    const int p06[] = { 6, 9, 7, 2, 4, 6 };
+    const int p07[] = { 9, 7, 2, 4, 6, 9 };
+    const int p08[] = { 7, 2, 4, 6, 9, 7 };
+    const int p09[] = { 6, 9, 7, 2, 4, 5 };
+    const int p10[] = { 4, 5, 7, 2, 4 };
+    const int p11[] = { 2, 4, 5, 7, 2 };
+    const int p12[] = { 5, 7, 2, 4, 5 };
+    const int p13[] = { 7, 2, 4, 5, 7 };
+    const int p14[] = { 4, 6, 9, 7, 2, 3, 10 };
+    const int p15[] = { 5, 7, 2, 4, 6, 8, 10 };
+    const int p16[] = { 9, 7, 2, 4, 6, 8, 10 };
+    const int p17[] = { 4, 5, 7, 2, 3, 10 };
+    const int p18[] = { 0, 1, 2, 4, 6, 9, 7 };
+    const int p19[] = { 0, 1, 2, 4, 5, 7 };
+
+    ASSERT_EQ (paths.length (), 19);
+    ASSERT_TRUE (any_equal_p (SLICE (p01), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p02), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p03), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p04), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p05), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p06), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p07), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p08), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p09), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p10), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p11), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p12), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p13), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p14), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p15), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p16), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p17), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p18), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p19), paths));
+
+    free_graph (g);
+    release_vec_vec (paths);
+}
+
+/* The strongly connected component graph for binary_search looks like
+    this, using the vertex numbers from the original graph:
+
+    START
+      |
+      1
+      |
+      2 (SCC)
+     / \
+    3   8
+     \ /
+     END
+
+  The components gets renumbered by graphds_scc, so the ccfg looks like
+  this.  The actual numbers don't matter as long as the structure of the
+  graph is preserved, and this test is now sensitive to the numbering set
+  by graphds_scc.  It does not have to be - if that function should reverse
+  the numbering this test must be updated.
+
+      5
+      |
+      4
+      |
+      3 (SCC)
+     / \
+    2   1
+     \ /
+      0
+*/
+static void
+test_build_ccfg ()
+{
+  struct graph *cfg = binary_search_cfg ();
+  const int nscc = graphds_scc (cfg, NULL);
+  struct graph *ccfg = build_ccfg (cfg, nscc);
+  ASSERT_EQ (6, nscc);
+
+  ASSERT_TRUE (edge_p (ccfg, 5, 4));
+  ASSERT_TRUE (edge_p (ccfg, 4, 3));
+  ASSERT_TRUE (edge_p (ccfg, 3, 2));
+  ASSERT_TRUE (edge_p (ccfg, 3, 1));
+  ASSERT_TRUE (edge_p (ccfg, 2, 0));
+  ASSERT_TRUE (edge_p (ccfg, 1, 0));
+
+  free_graph (cfg);
+  free_graph (ccfg);
+}
+
+/* This test checks some basic assumptions on finding the strongly connected
+   components and disconnecting the graph by removing all edges between SCCs.
+   Creating a single auxillary graph simplifies the bookkeeping.  */
+static void
+test_split_components ()
+{
+  struct graph *cfg = binary_search_cfg ();
+  int nscc = graphds_scc (cfg, NULL);
+  struct graph *ccfg = disconnect_sccs (cfg);
+
+  vec<vec<int>> entries {};
+  vec<vec<int>> exits {};
+  entries.safe_grow_cleared (nscc);
+  exits.safe_grow_cleared (nscc);
+
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    {
+      if (scc_entry_vertex_p (cfg, i))
+	entries[cfg->vertices[i].component].safe_push (i);
+      if (scc_exit_vertex_p (cfg, i))
+	exits[cfg->vertices[i].component].safe_push (i);
+    }
+
+  const int p10[] = { 10 };
+  const int p08[] = { 8 };
+  const int p03[] = { 3 };
+  const int p02[] = { 2 };
+  const int p01[] = { 1 };
+  const int p00[] = { 0 };
+  const int p26[] = { 2, 6 };
+
+  ASSERT_EQ (entries.length (), 6);
+  ASSERT_TRUE (any_equal_p (SLICE (p10), entries));
+  ASSERT_TRUE (any_equal_p (SLICE (p08), entries));
+  ASSERT_TRUE (any_equal_p (SLICE (p03), entries));
+  ASSERT_TRUE (any_equal_p (SLICE (p02), entries));
+  ASSERT_TRUE (any_equal_p (SLICE (p01), entries));
+  ASSERT_TRUE (any_equal_p (SLICE (p00), entries));
+
+  ASSERT_EQ (exits.length (), 6);
+  ASSERT_TRUE (any_equal_p (SLICE (p10), exits));
+  ASSERT_TRUE (any_equal_p (SLICE (p08), exits));
+  ASSERT_TRUE (any_equal_p (SLICE (p03), exits));
+  ASSERT_TRUE (any_equal_p (SLICE (p26), exits));
+  ASSERT_TRUE (any_equal_p (SLICE (p01), exits));
+  ASSERT_TRUE (any_equal_p (SLICE (p00), exits));
+
+  /* The result of disconnect_sccs () disconnects the graph into its strongly
+     connected components. The subgraphs are either singletons (a single
+     vertex with no edges) or graphs with cycles.  The SCC internal prime
+     paths can be found by running a DFS from every SCC vertex, terminating
+     on a duplicated vertex.  This may create some redundant paths still,
+     which must be filtered out.
+
+     Singletons can either be detected and skipped (requires counting the
+     components) or filtered after.  For this test case they are skipped
+     because other graph inconsistencies are easier to detect.  */
+
+  /* Count and check singleton components.  */
+  vec<int> scc_size {};
+  scc_size.safe_grow_cleared (nscc);
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    scc_size[cfg->vertices[i].component]++;
+  ASSERT_EQ (nscc, 6);
+  ASSERT_EQ (scc_size[0], 1);
+  ASSERT_EQ (scc_size[1], 1);
+  ASSERT_EQ (scc_size[2], 1);
+  ASSERT_EQ (scc_size[3], 6);
+  ASSERT_EQ (scc_size[4], 1);
+  ASSERT_EQ (scc_size[5], 1);
+
+  /* Manually unroll the loop finding the simple paths starting at the
+     vertices in the SCCs.  In this case there is only the one SCC.  */
+  trie ccfg_paths;
+  simple_paths (ccfg, ccfg_paths, 2);
+  simple_paths (ccfg, ccfg_paths, 4);
+  simple_paths (ccfg, ccfg_paths, 5);
+  simple_paths (ccfg, ccfg_paths, 6);
+  simple_paths (ccfg, ccfg_paths, 7);
+  simple_paths (ccfg, ccfg_paths, 9);
+  /* then in+out of trie */
+  vec<vec<int>> xscc_internal_pp = ccfg_paths.paths ();
+  trie scc_internal_pp;
+  for (auto &p : xscc_internal_pp)
+    scc_internal_pp.insert_with_suffix (p);
+
+  const int pp01[] = { 5, 7, 2, 4, 6, 9 };
+  const int pp02[] = { 4, 5, 7, 2, 4 };
+  const int pp03[] = { 4, 6, 9, 7, 2, 4 };
+  const int pp04[] = { 2, 4, 5, 7, 2 };
+  const int pp05[] = { 2, 4, 6, 9, 7, 2 };
+  const int pp06[] = { 5, 7, 2, 4, 5 };
+  const int pp07[] = { 6, 9, 7, 2, 4, 6 };
+  const int pp08[] = { 7, 2, 4, 5, 7 };
+  const int pp09[] = { 9, 7, 2, 4, 6, 9 };
+  const int pp10[] = { 7, 2, 4, 6, 9, 7 };
+  const int pp11[] = { 6, 9, 7, 2, 4, 5 };
+
+  ASSERT_EQ (count (scc_internal_pp), 11);
+  ASSERT_TRUE (scc_internal_pp.contains (pp01));
+  ASSERT_TRUE (scc_internal_pp.contains (pp02));
+  ASSERT_TRUE (scc_internal_pp.contains (pp03));
+  ASSERT_TRUE (scc_internal_pp.contains (pp04));
+  ASSERT_TRUE (scc_internal_pp.contains (pp05));
+  ASSERT_TRUE (scc_internal_pp.contains (pp06));
+  ASSERT_TRUE (scc_internal_pp.contains (pp07));
+  ASSERT_TRUE (scc_internal_pp.contains (pp08));
+  ASSERT_TRUE (scc_internal_pp.contains (pp09));
+  ASSERT_TRUE (scc_internal_pp.contains (pp10));
+  ASSERT_TRUE (scc_internal_pp.contains (pp11));
+
+  free_graph (ccfg);
+  free_graph (cfg);
+}
+
+/* The remaining tests deconstructs the algorithm and runs only a single phase
+   with good inputs at that point.  This makes it easier to pinpoint where
+   things go wrong, and helps show in steps how the algorithm works and the
+   phases relate.
+
+   The phases and their inputs and outputs are bazed on Fazli & Afsarchi.  */
+
+static void
+test_scc_internal_prime_paths ()
+{
+  /* This graph has only the SCC subgraph.  The result of running prime-paths
+     on it should be the SCC internal prime paths of the full graph.  */
+  struct graph *scc = new_graph (11);
+  add_edge (scc, 0, 2);
+  add_edge (scc, 2, 4);
+  add_edge (scc, 4, 5);
+  add_edge (scc, 4, 6);
+  add_edge (scc, 5, 7);
+  add_edge (scc, 6, 9);
+  add_edge (scc, 9, 7);
+  add_edge (scc, 7, 2);
+
+  vec<vec<int>> paths = prime_paths (scc);
+  const int p01[] = { 5, 7, 2, 4, 6, 9 };
+  const int p02[] = { 4, 6, 9, 7, 2, 4 };
+  const int p03[] = { 2, 4, 6, 9, 7, 2 };
+  const int p04[] = { 6, 9, 7, 2, 4, 6 };
+  const int p05[] = { 9, 7, 2, 4, 6, 9 };
+  const int p06[] = { 7, 2, 4, 6, 9, 7 };
+  const int p07[] = { 6, 9, 7, 2, 4, 5 };
+  const int p08[] = { 4, 5, 7, 2, 4 };
+  const int p09[] = { 2, 4, 5, 7, 2 };
+  const int p10[] = { 5, 7, 2, 4, 5 };
+  const int p11[] = { 7, 2, 4, 5, 7 };
+
+  ASSERT_TRUE (any_equal_p (SLICE (p01), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p02), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p03), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p04), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p05), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p06), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p07), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p08), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p09), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p10), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p11), paths));
+
+  release_vec_vec (paths);
+  free_graph (scc);
+}
+
+/* Test the entry/exit path helpers for the strongly connected component in
+   binary_search.  The SCC has one entry (2, the loop header) and two exits (2,
+   6, the loop exit and return).  */
+static void
+test_scc_entry_exit_paths ()
+{
+  struct graph *scc = new_graph (11);
+  add_edge (scc, 2, 4);
+  add_edge (scc, 4, 5);
+  add_edge (scc, 4, 6);
+  add_edge (scc, 5, 7);
+  add_edge (scc, 6, 9);
+  add_edge (scc, 9, 7);
+  add_edge (scc, 7, 2);
+
+  trie scc_internal_trie;
+  simple_paths (scc, scc_internal_trie, 2);
+  simple_paths (scc, scc_internal_trie, 4);
+  simple_paths (scc, scc_internal_trie, 5);
+  simple_paths (scc, scc_internal_trie, 6);
+  simple_paths (scc, scc_internal_trie, 7);
+  simple_paths (scc, scc_internal_trie, 9);
+  vec<vec<int>> scc_prime_paths = scc_internal_trie.paths ();
+
+  trie entry_exits {};
+  scc_entry_exit_paths (scc_prime_paths, 2, 2, entry_exits);
+  scc_entry_exit_paths (scc_prime_paths, 2, 6, entry_exits);
+
+  const int p01[] = { 2 };
+  const int p02[] = { 2, 4, 6 };
+
+  ASSERT_EQ (count (entry_exits), 2);
+  ASSERT_TRUE (entry_exits.contains (p01));
+  ASSERT_TRUE (entry_exits.contains (p02));
+
+  trie exits;
+  scc_exit_paths (scc_prime_paths, 2, exits);
+  scc_exit_paths (scc_prime_paths, 6, exits);
+
+  const int p03[] = { 4, 6, 9, 7, 2 };
+  const int p04[] = { 5, 7, 2, 4, 6 };
+  const int p05[] = { 9, 7, 2, 4, 6 };
+  const int p06[] = { 4, 5, 7, 2 };
+
+  ASSERT_EQ (count (exits), 4);
+  ASSERT_TRUE (exits.contains (p03));
+  ASSERT_TRUE (exits.contains (p04));
+  ASSERT_TRUE (exits.contains (p05));
+  ASSERT_TRUE (exits.contains (p06));
+
+  trie entries;
+  scc_entry_paths (scc_prime_paths, 2, entries);
+
+  const int p07[] = { 2, 4, 6, 9, 7 };
+  const int p08[] = { 2, 4, 5, 7 };
+  ASSERT_EQ (count (entries), 2);
+  ASSERT_TRUE (entries.contains (p07));
+  ASSERT_TRUE (entries.contains (p08));
+
+  free_graph (scc);
+  release_vec_vec (scc_prime_paths);
+  exits.release ();
+  entry_exits.release ();
+}
+
+static void
+test_complete_prime_paths ()
+{
+  const int ccfgpp0[] = { 0, 1, 2, 3, 5 };
+  const int ccfgpp1[] = { 0, 1, 2, 4, 5 };
+  trie ccfg_prime_paths {};
+  ccfg_prime_paths.insert (ccfgpp0);
+  ccfg_prime_paths.insert (ccfgpp1);
+
+  const int scc0[] = { 2 };
+  const int scc1[] = { 2, 4, 6 };
+
+  struct graph *cfg = binary_search_cfg ();
+  const int ccfg_single[] = { 0, 1, 3, 8, 10 };
+
+  vec<trie> ccfg_paths {};
+  ccfg_paths.safe_grow_cleared (6);
+  ccfg_paths[0].insert (array_slice <const int> (ccfg_single + 0, 1));
+  ccfg_paths[1].insert (array_slice <const int> (ccfg_single + 1, 1));
+  ccfg_paths[3].insert (array_slice <const int> (ccfg_single + 2, 1));
+  ccfg_paths[4].insert (array_slice <const int> (ccfg_single + 3, 1));
+  ccfg_paths[5].insert (array_slice <const int> (ccfg_single + 4, 1));
+
+  /* The paths for the SCC would need to be updated in ccfg pre pass.  This
+     feels like a clumsy interface and should maybe be disconnected from the
+     struct graph.  */
+  ccfg_paths[2].hard_insert (scc0);
+  ccfg_paths[2].hard_insert (scc1);
+
+  sbitmap *sbmvec = sbitmap_vector_alloc (6, cfg->n_vertices);
+  bitmap_vector_clear (sbmvec, 6);
+  bitmap_set_bit (sbmvec[0], 0);
+  bitmap_set_bit (sbmvec[1], 1);
+  bitmap_set_bit (sbmvec[3], 3);
+  bitmap_set_bit (sbmvec[4], 8);
+  bitmap_set_bit (sbmvec[5], 10);
+
+  bitmap_set_bit (sbmvec[2], 2);
+  bitmap_set_bit (sbmvec[2], 4);
+  bitmap_set_bit (sbmvec[2], 6);
+
+  trie cpp = cfg_complete_prime_paths (cfg, sbmvec, ccfg_paths,
+					   ccfg_prime_paths);
+
+  const int p01[] = { 0, 1, 2, 3, 10 };
+  const int p02[] = { 0, 1, 2, 4, 6, 8, 10 };
+
+  ASSERT_EQ (count (cpp), 2);
+  ASSERT_TRUE (cpp.contains(p01));
+  ASSERT_TRUE (cpp.contains(p02));
+
+  free_graph (cfg);
+  ccfg_paths.release ();
+  cpp.release ();
+}
+
+static vec<int>
+binary_search_scc_map ()
+{
+  vec<int> sccs {};
+  sccs.safe_grow (11);
+  sccs[0] = 0;
+  sccs[1] = 1;
+  sccs[2] = 2;
+  sccs[3] = 3;
+  sccs[4] = 2;
+  sccs[5] = 2;
+  sccs[6] = 2;
+  sccs[7] = 2;
+  sccs[8] = 4;
+  sccs[9] = 2;
+  sccs[10] = 5;
+  return sccs;
+}
+
+static void
+test_exit_prime_paths ()
+{
+  const int cpp01[] = { 0, 1, 2, 3, 10 };
+  const int cpp02[] = { 0, 1, 2, 4, 6, 8, 10 };
+  trie complete_prime_paths {};
+  complete_prime_paths.insert_with_suffix (cpp01);
+  complete_prime_paths.insert_with_suffix (cpp02);
+
+  const int ep01[] = { 4, 6, 9, 7, 2 };
+  const int ep02[] = { 5, 7, 2, 4, 6 };
+  const int ep03[] = { 9, 7, 2, 4, 6 };
+  const int ep04[] = { 4, 5, 7, 2 };
+  trie exit_paths;
+  exit_paths.insert (ep01);
+  exit_paths.insert (ep02);
+  exit_paths.insert (ep03);
+  exit_paths.insert (ep04);
+
+  vec<int> sccs = binary_search_scc_map ();
+  trie epp = scc_exit_prime_paths (sccs, exit_paths, complete_prime_paths);
+
+  const int pp01[] = { 4, 6, 9, 7, 2, 3, 10 };
+  const int pp02[] = { 5, 7, 2, 4, 6, 8, 10 };
+  const int pp03[] = { 9, 7, 2, 4, 6, 8, 10 };
+  const int pp04[] = { 4, 5, 7, 2, 3, 10 };
+
+  ASSERT_EQ (count (epp), 4);
+  ASSERT_TRUE (epp.contains (pp01));
+  ASSERT_TRUE (epp.contains (pp02));
+  ASSERT_TRUE (epp.contains (pp03));
+  ASSERT_TRUE (epp.contains (pp04));
+
+  sccs.release ();
+  epp.release ();
+  exit_paths.release ();
+}
+
+static void
+test_entry_prime_paths ()
+{
+  struct graph *cfg = binary_search_cfg ();
+
+  static int sccep01[] = { 2, 4, 6, 9, 7 };
+  static int sccep02[] = { 2, 4, 5, 7 };
+  trie scc_entry_paths;
+  scc_entry_paths.insert (sccep01);
+  scc_entry_paths.insert (sccep02);
+
+  const int cpp01[] = { 0, 1, 2, 3, 10 };
+  const int cpp02[] = { 0, 1, 2, 4, 6, 8, 10 };
+  trie complete_prime_paths {};
+  complete_prime_paths.insert (cpp01);
+  complete_prime_paths.insert (cpp02);
+
+  const int ep01[] = { 4, 6, 9, 7, 2, 3, 10 };
+  const int ep02[] = { 5, 7, 2, 4, 6, 8, 10 };
+  const int ep03[] = { 9, 7, 2, 4, 6, 8, 10 };
+  const int ep04[] = { 4, 5, 7, 2, 3, 10 };
+  trie exit_prime_paths {};
+  exit_prime_paths.insert (ep01);
+  exit_prime_paths.insert (ep02);
+  exit_prime_paths.insert (ep03);
+  exit_prime_paths.insert (ep04);
+
+  vec<int> sccs = binary_search_scc_map ();
+
+  trie epp;
+  scc_entry_prime_paths (cfg, scc_entry_paths, complete_prime_paths,
+			 exit_prime_paths, epp);
+
+  /* The 0 (start node) does not show up in the Fazli & Afschardi paper and
+     kinda, but has no real impact on the result.  The prime-paths functions
+     do not care about these vertices, but the path-coverage instrumentation
+     filters out the ENTRY/EXIT blocks from all the paths.  */
+  const int pp01[] = { 0, 1, 2, 4, 6, 9, 7 };
+  const int pp02[] = { 0, 1, 2, 4, 5, 7 };
+  ASSERT_EQ (count (epp), 2);
+  ASSERT_TRUE (epp.contains (pp01));
+  ASSERT_TRUE (epp.contains (pp02));
+}
+
+static void
+test_prime_path_algorithm_equivalence ()
+{
+  struct graph *cfg = binary_search_cfg ();
+  auto fast = prime_paths (cfg);
+  auto slow = reference_prime_paths (cfg);
+
+  ASSERT_EQ (fast.length (), slow.length ());
+  for (const vec<int>& path : fast)
+    ASSERT_TRUE (any_equal_p (path, slow));
+  for (const vec<int>& path : slow)
+    ASSERT_TRUE (any_equal_p (path, fast));
+
+  free_graph (cfg);
+}
+
+/* A straight-line graph with one vertex should yield a single path of length 1
+   with just the non-exit non-entry node in it.  */
+void
+test_singleton_path ()
+{
+  struct graph *cfg = new_graph (3);
+  add_edge (cfg, 0, 2);
+  add_edge (cfg, 2, 1);
+
+  auto best = prime_paths (cfg);
+  auto slow = reference_prime_paths (cfg);
+
+  ASSERT_EQ (best.length (), 1);
+  ASSERT_EQ (best[0].length (), 3);
+  ASSERT_EQ (best[0][0], 0);
+  ASSERT_EQ (best[0][1], 2);
+  ASSERT_EQ (best[0][2], 1);
+
+  ASSERT_EQ (slow.length (), 1);
+  ASSERT_EQ (slow[0].length (), 3);
+  ASSERT_EQ (slow[0][0], 0);
+  ASSERT_EQ (slow[0][1], 2);
+  ASSERT_EQ (slow[0][2], 1);
+
+  free_graph (cfg);
+}
+
+void
+path_coverage_cc_tests ()
+{
+  test_prime_paths ();
+  test_build_ccfg ();
+  test_split_components ();
+  test_scc_internal_prime_paths ();
+  test_scc_entry_exit_paths ();
+  test_complete_prime_paths ();
+  test_exit_prime_paths ();
+  test_entry_prime_paths ();
+  test_prime_path_algorithm_equivalence ();
+  test_singleton_path ();
+}
+
+} // namespace selftest
+
+#endif /* #if CHECKING_P */
diff --git a/gcc/profile.cc b/gcc/profile.cc
index 25d4f4a4b86..c3d06c4d387 100644
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -1545,7 +1545,7 @@ branch_prob (bool thunk)
 
   remove_fake_edges ();
 
-  if (condition_coverage_flag || profile_arc_flag)
+  if (condition_coverage_flag || path_coverage_flag || profile_arc_flag)
       gimple_init_gcov_profiler ();
 
   if (condition_coverage_flag)
@@ -1597,6 +1597,10 @@ branch_prob (bool thunk)
 	instrument_values (values);
     }
 
+  void find_paths(struct function*);
+  if (path_coverage_flag)
+    find_paths(cfun);
+
   free_aux_for_edges ();
 
   values.release ();
diff --git a/gcc/selftest-run-tests.cc b/gcc/selftest-run-tests.cc
index e6779206c47..9361e43ccdf 100644
--- a/gcc/selftest-run-tests.cc
+++ b/gcc/selftest-run-tests.cc
@@ -105,6 +105,7 @@ selftest::run_tests ()
   diagnostic_path_cc_tests ();
   simple_diagnostic_path_cc_tests ();
   attribs_cc_tests ();
+  path_coverage_cc_tests ();
 
   /* This one relies on most of the above.  */
   function_tests_cc_tests ();
diff --git a/gcc/selftest.h b/gcc/selftest.h
index dcb1463ed90..4df9c0dd836 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -241,6 +241,7 @@ extern void json_cc_tests ();
 extern void optinfo_emit_json_cc_tests ();
 extern void opts_cc_tests ();
 extern void ordered_hash_map_tests_cc_tests ();
+extern void path_coverage_cc_tests ();
 extern void predict_cc_tests ();
 extern void pretty_print_cc_tests ();
 extern void range_tests ();
diff --git a/gcc/testsuite/gcc.misc-tests/gcov-29.c b/gcc/testsuite/gcc.misc-tests/gcov-29.c
new file mode 100644
index 00000000000..283ecf45783
--- /dev/null
+++ b/gcc/testsuite/gcc.misc-tests/gcov-29.c
@@ -0,0 +1,862 @@
+/* { dg-options "--coverage -fpath-coverage" } */
+/* { dg-do run { target native } } */
+
+void
+pathcov001a ()
+{
+  /* Empty functions should not be problematic.  */
+}
+
+/* Straight line, which should have only one path.  */
+/* BEGIN paths
+   summary: 1/1
+*/
+void
+pathcov001b ()
+/* END */
+{
+  int a = 0;
+  ++a;
+}
+
+/* Same as b, but not executed.  */
+/* BEGIN paths
+   summary: 0/1
+   expect: :33
+*/
+void
+pathcov001c ()
+/* END */
+{
+  int a = 0;
+  ++a;
+}
+
+/* 002 is a simple baseline test, with no complicated control flow and no
+   loops, run with different combinations of inputs that tests the paths in
+   isolation.  */
+/* BEGIN paths
+   summary: 0/2
+   expect: :48(true) :49 :52
+   expect: :48(false) :51 :52
+*/
+void
+pathcov002a (int a)
+/* END */
+{
+  int v = 0;
+  if (a)
+    v++;
+  else
+    v--;
+}
+
+/* BEGIN paths
+    summary: 1/2
+    expect: :63(false) :66 :67
+*/
+void
+pathcov002c (int a)
+/* END */
+{
+  int v = 0;
+  if (a)
+    v++;
+  else
+    v--;
+}
+
+/* BEGIN paths
+   summary: 1/2
+   expect: :78(true) :79 :82
+*/
+void
+pathcov002b (int a)
+/* END */
+{
+  int v = 0;
+  if (a)
+    v++;
+  else
+    v--;
+}
+
+/* Identical to 002*, but run for both inputs.  This should achieve full
+   coverage.
+
+   BEGIN paths
+    summary: 2/2
+*/
+void
+pathcov002d (int a)
+/* END */
+{
+  int v = 0;
+  if (a)
+    v++;
+  else
+    v--;
+}
+
+/* Test individual control flow structures in isolation.  */
+
+/* BEGIN paths
+   summary: 0/2
+   expect: :112(true) :113 :114
+   expect: :112(false) :114
+*/
+void
+pathcov003a (int a)
+/* END */
+{
+  if (a)
+    a++;
+}
+
+/* BEGIN paths
+   summary: 0/2
+   expect: :125(true) :126 :129
+   expect: :125(false) :128 :129
+*/
+void
+pathcov003b (int a)
+/* END */
+{
+  if (a)
+    a++;
+  else
+    a--;
+}
+
+/* BEGIN paths
+   summary: 0/3
+   expect: :141(true) :142 :147
+   expect: :141(false) :143(true) :144 :147
+   expect: :141(false) :143(false) :146 :147
+*/
+void
+pathcov003c (int a)
+/* END */
+{
+  if (a > 10)
+    a++;
+  else if (a > 20)
+    a--;
+  else
+    a += 2;
+}
+
+/* BEGIN paths
+   summary: 0/5
+   expect: :162 :162(true) :163
+   expect: :162 :162(false) :164
+   expect: :163 :162(true) :163
+   expect: :163 :162(false) :164
+   expect: :162(true) :163 :162
+*/
+void
+pathcov003d (int a)
+/* END */
+{
+  int x = 0;
+  for (int i = 0; i < a; ++i)
+    ++x;
+}
+
+/* BEGIN paths
+   summary: 0/5
+   expect: :180 :180(true) :181
+   expect: :180 :180(false) :182
+   expect: :181 :180(true) :181
+   expect: :181 :180(false) :182
+   expect: :180(true) :181 :180
+*/
+void
+pathcov003e (int a)
+/* END */
+{
+  int x = 0;
+  int i = 0;
+  while (i++ < a)
+    x++;
+}
+
+/* BEGIN paths
+   summary: 0/2
+   expect: :194 :197(false) :198
+   expect: :197(true) :197
+*/
+void
+pathcov003f (int a)
+/* END */
+{
+  int x = 0;
+  int i = 0;
+  do {
+      x++;
+  } while (i++ < a);
+}
+
+/* BEGIN paths
+   summary: 0/5
+   expect: :213 :216(true) :220
+   expect: :213 :216(false) :222
+   expect: :216(true) :220 :216
+   expect: :220 :216(true) :220
+   expect: :220 :216(false) :222
+*/
+void
+pathcov003g (int a)
+/* END */
+{
+  int i = 0;
+  int x = 0;
+
+top:
+  if (i < a)
+    {
+      x++;
+      i++;
+      goto top;
+    }
+}
+
+/* This example has a good mix of control flow structures which makes it nice
+   at identifying problems with the bookeeping/instrumentation.  */
+
+/* BEGIN paths
+   summary: 0/9
+   expect: :243(false) :247 :247(true) :247(true) :249(true) :250 :253
+   expect: :243(false) :247 :247(true) :247(false) :253
+   expect: :243(false) :247 :247(false) :253
+   expect: :243(true) :253
+   expect: :249(false) :247(true) :247(true) :249
+   expect: :249(false) :247(true) :247(false) :253
+   expect: :247(true) :247(true) :249(false) :247
+   expect: :247(true) :249(false) :247(true) :247
+   expect: :247(true) :249(false) :247(false) :253
+*/
+void
+pathcov004a (int a, int b, int c, int d)
+/* END */
+{
+  if (a)
+    {}
+  else
+    {
+      while (b-- > 0 && c-- > 0)
+	{
+	  if (d)
+	    break;
+	}
+    }
+}
+
+/* BEGIN paths
+   args: (1, 0, 0, 0)
+   summary: 1/9 */
+void
+pathcov004b (int a, int b, int c, int d)
+/* END */
+{
+  if (a)
+    {}
+  else
+    {
+      while (b-- > 0 && c-- > 0)
+	{
+	  if (d)
+	    break;
+	}
+    }
+}
+
+/* BEGIN paths
+   args: (0, 0, 0, 0)
+   summary: 1/9 */
+void
+pathcov004c (int a, int b, int c, int d)
+/* END */
+{
+  if (a)
+    {}
+  else
+    {
+      while (b-- > 0 && c-- > 0)
+	{
+	  if (d)
+	    break;
+	}
+    }
+}
+
+/* BEGIN paths
+   args: (0, 1, 0, 0)
+   summary: 1/9 */
+void
+pathcov004d (int a, int b, int c, int d)
+/* END */
+{
+  if (a)
+    {}
+  else
+    {
+      while (b-- > 0 && c-- > 0)
+	{
+	  if (d)
+	    break;
+	}
+    }
+}
+
+/* BEGIN paths
+   args: (0, 1, 1, 0)
+   summary: 2/9 */
+void
+pathcov004e (int a, int b, int c, int d)
+/* END */
+{
+  if (a)
+    {}
+  else
+    {
+      while (b-- > 0 && c-- > 0)
+	{
+	  if (d)
+	    break;
+	}
+    }
+}
+
+/* BEGIN paths
+   args: (0, 2, 1, 0)
+   summary: 3/9 */
+void
+pathcov004f (int a, int b, int c, int d)
+/* END */
+{
+  if (a)
+    {}
+  else
+    {
+      while (b-- > 0 && c-- > 0)
+	{
+	  if (d)
+	    break;
+	}
+    }
+}
+
+/* Funny loop exits.  */
+
+/* BEGIN paths
+   summary: 0/14
+*/
+void
+pathcov005a (int a, int b, int c)
+/* END */
+{
+  while (a)
+    {
+      if (b)
+	return;
+      while (c--)
+	a++;
+    }
+}
+
+void
+pathcov005b (int a, int b, int c)
+/* END */
+{
+  if (a)
+    goto loop;
+
+  while (b > 0)
+    {
+      b--;
+loop:
+      while (c--)
+	a++;
+    }
+}
+
+void
+pathcov005c (int a, int b, int c)
+/* END */
+{
+  while (a-- > 0) c++;
+  while (b-- > 0) c++;
+}
+
+/* BEGIN paths
+   summary: 0/67
+
+   This is a sanity check and baseline and should not be executed.
+
+   With >64 cases we should have >64 paths which guarantees we cannot fit the
+   full bitset within a in a single gcov type.  We want to only update the
+   relevant counters because extra instructions are expensive in compile time
+   and binary bloat, and verify that only taken paths are recorded.  */
+void
+pathcov006a (int a)
+/* END */
+{
+  int x = 0;
+  switch (a)
+  {
+    case 0: x++; break;
+    case 1: x++; break;
+    case 2: x++; break;
+    case 3: x++; break;
+    case 4: x++; break;
+    case 5: x++; break;
+    case 6: x++; break;
+    case 7: x++; break;
+    case 8: x++; break;
+    case 9: x++; break;
+    case 10: x++; break;
+    case 11: x++; break;
+    case 12: x++; break;
+    case 13: x++; break;
+    case 14: x++; break;
+    case 15: x++; break;
+    case 16: x++; break;
+    case 17: x++; break;
+    case 18: x++; break;
+    case 19: x++; break;
+    case 20: x++; break;
+    case 21: x++; break;
+    case 22: x++; break;
+    case 23: x++; break;
+    case 24: x++; break;
+    case 25: x++; break;
+    case 26: x++; break;
+    case 27: x++; break;
+    case 28: x++; break;
+    case 29: x++; break;
+    case 30: x++; break;
+    case 31: x++; break;
+    case 32: x++; break;
+    case 33: x++; break;
+    case 34: x++; break;
+    case 35: x++; break;
+    case 36: x++; break;
+    case 37: x++; break;
+    case 38: x++; break;
+    case 39: x++; break;
+    case 40: x++; break;
+    case 41: x++; break;
+    case 42: x++; break;
+    case 43: x++; break;
+    case 44: x++; break;
+    case 45: x++; break;
+    case 46: x++; break;
+    case 47: x++; break;
+    case 48: x++; break;
+    case 49: x++; break;
+    case 50: x++; break;
+    case 51: x++; break;
+    case 52: x++; break;
+    case 53: x++; break;
+    case 54: x++; break;
+    case 55: x++; break;
+    case 56: x++; break;
+    case 57: x++; break;
+    case 58: x++; break;
+    case 59: x++; break;
+    case 60: x++; break;
+    case 61: x++; break;
+    case 62: x++; break;
+    case 63: x++; break;
+    case 64: x++; break;
+    case 65: x++; break;
+  }
+}
+
+/* BEGIN paths
+   args: (0)
+   summary: 1/67
+*/
+void
+pathcov006b (int a)
+/* END */
+{
+  int x = 0;
+  switch (a)
+  {
+    case 0: x++; break;
+    case 1: x++; break;
+    case 2: x++; break;
+    case 3: x++; break;
+    case 4: x++; break;
+    case 5: x++; break;
+    case 6: x++; break;
+    case 7: x++; break;
+    case 8: x++; break;
+    case 9: x++; break;
+    case 10: x++; break;
+    case 11: x++; break;
+    case 12: x++; break;
+    case 13: x++; break;
+    case 14: x++; break;
+    case 15: x++; break;
+    case 16: x++; break;
+    case 17: x++; break;
+    case 18: x++; break;
+    case 19: x++; break;
+    case 20: x++; break;
+    case 21: x++; break;
+    case 22: x++; break;
+    case 23: x++; break;
+    case 24: x++; break;
+    case 25: x++; break;
+    case 26: x++; break;
+    case 27: x++; break;
+    case 28: x++; break;
+    case 29: x++; break;
+    case 30: x++; break;
+    case 31: x++; break;
+    case 32: x++; break;
+    case 33: x++; break;
+    case 34: x++; break;
+    case 35: x++; break;
+    case 36: x++; break;
+    case 37: x++; break;
+    case 38: x++; break;
+    case 39: x++; break;
+    case 40: x++; break;
+    case 41: x++; break;
+    case 42: x++; break;
+    case 43: x++; break;
+    case 44: x++; break;
+    case 45: x++; break;
+    case 46: x++; break;
+    case 47: x++; break;
+    case 48: x++; break;
+    case 49: x++; break;
+    case 50: x++; break;
+    case 51: x++; break;
+    case 52: x++; break;
+    case 53: x++; break;
+    case 54: x++; break;
+    case 55: x++; break;
+    case 56: x++; break;
+    case 57: x++; break;
+    case 58: x++; break;
+    case 59: x++; break;
+    case 60: x++; break;
+    case 61: x++; break;
+    case 62: x++; break;
+    case 63: x++; break;
+    case 64: x++; break;
+    case 65: x++; break;
+  }
+}
+
+/* BEGIN paths
+   args: (64)
+   summary: 1/67 */
+void
+pathcov006c (int a)
+/* END */
+{
+  int x = 0;
+  switch (a)
+  {
+    case 0: x++; break;
+    case 1: x++; break;
+    case 2: x++; break;
+    case 3: x++; break;
+    case 4: x++; break;
+    case 5: x++; break;
+    case 6: x++; break;
+    case 7: x++; break;
+    case 8: x++; break;
+    case 9: x++; break;
+    case 10: x++; break;
+    case 11: x++; break;
+    case 12: x++; break;
+    case 13: x++; break;
+    case 14: x++; break;
+    case 15: x++; break;
+    case 16: x++; break;
+    case 17: x++; break;
+    case 18: x++; break;
+    case 19: x++; break;
+    case 20: x++; break;
+    case 21: x++; break;
+    case 22: x++; break;
+    case 23: x++; break;
+    case 24: x++; break;
+    case 25: x++; break;
+    case 26: x++; break;
+    case 27: x++; break;
+    case 28: x++; break;
+    case 29: x++; break;
+    case 30: x++; break;
+    case 31: x++; break;
+    case 32: x++; break;
+    case 33: x++; break;
+    case 34: x++; break;
+    case 35: x++; break;
+    case 36: x++; break;
+    case 37: x++; break;
+    case 38: x++; break;
+    case 39: x++; break;
+    case 40: x++; break;
+    case 41: x++; break;
+    case 42: x++; break;
+    case 43: x++; break;
+    case 44: x++; break;
+    case 45: x++; break;
+    case 46: x++; break;
+    case 47: x++; break;
+    case 48: x++; break;
+    case 49: x++; break;
+    case 50: x++; break;
+    case 51: x++; break;
+    case 52: x++; break;
+    case 53: x++; break;
+    case 54: x++; break;
+    case 55: x++; break;
+    case 56: x++; break;
+    case 57: x++; break;
+    case 58: x++; break;
+    case 59: x++; break;
+    case 60: x++; break;
+    case 61: x++; break;
+    case 62: x++; break;
+    case 63: x++; break;
+    case 64: x++; break;
+    case 65: x++; break;
+  }
+}
+
+/* BEGIN paths
+   args: (2, 65)
+   summary: 2/67
+
+   In this case we should record a path in both halves of the accumulator
+   bitsets.  Note that the paths don't overlap with the single-half examples in
+   006b and 006c to reduce the chance of accidental passes.  */
+void
+pathcov006d (int a)
+/* END */
+{
+  int x = 0;
+  switch (a)
+  {
+    case 0: x++; break;
+    case 1: x++; break;
+    case 2: x++; break;
+    case 3: x++; break;
+    case 4: x++; break;
+    case 5: x++; break;
+    case 6: x++; break;
+    case 7: x++; break;
+    case 8: x++; break;
+    case 9: x++; break;
+    case 10: x++; break;
+    case 11: x++; break;
+    case 12: x++; break;
+    case 13: x++; break;
+    case 14: x++; break;
+    case 15: x++; break;
+    case 16: x++; break;
+    case 17: x++; break;
+    case 18: x++; break;
+    case 19: x++; break;
+    case 20: x++; break;
+    case 21: x++; break;
+    case 22: x++; break;
+    case 23: x++; break;
+    case 24: x++; break;
+    case 25: x++; break;
+    case 26: x++; break;
+    case 27: x++; break;
+    case 28: x++; break;
+    case 29: x++; break;
+    case 30: x++; break;
+    case 31: x++; break;
+    case 32: x++; break;
+    case 33: x++; break;
+    case 34: x++; break;
+    case 35: x++; break;
+    case 36: x++; break;
+    case 37: x++; break;
+    case 38: x++; break;
+    case 39: x++; break;
+    case 40: x++; break;
+    case 41: x++; break;
+    case 42: x++; break;
+    case 43: x++; break;
+    case 44: x++; break;
+    case 45: x++; break;
+    case 46: x++; break;
+    case 47: x++; break;
+    case 48: x++; break;
+    case 49: x++; break;
+    case 50: x++; break;
+    case 51: x++; break;
+    case 52: x++; break;
+    case 53: x++; break;
+    case 54: x++; break;
+    case 55: x++; break;
+    case 56: x++; break;
+    case 57: x++; break;
+    case 58: x++; break;
+    case 59: x++; break;
+    case 60: x++; break;
+    case 61: x++; break;
+    case 62: x++; break;
+    case 63: x++; break;
+    case 64: x++; break;
+    case 65: x++; break;
+  }
+}
+
+/* BEGIN paths
+   args: (66)
+   summary: 1/67
+
+   This case should hit the empty default.  */
+void
+pathcov006e (int a)
+/* END */
+{
+  int x = 0;
+  switch (a)
+  {
+    case 0: x++; break;
+    case 1: x++; break;
+    case 2: x++; break;
+    case 3: x++; break;
+    case 4: x++; break;
+    case 5: x++; break;
+    case 6: x++; break;
+    case 7: x++; break;
+    case 8: x++; break;
+    case 9: x++; break;
+    case 10: x++; break;
+    case 11: x++; break;
+    case 12: x++; break;
+    case 13: x++; break;
+    case 14: x++; break;
+    case 15: x++; break;
+    case 16: x++; break;
+    case 17: x++; break;
+    case 18: x++; break;
+    case 19: x++; break;
+    case 20: x++; break;
+    case 21: x++; break;
+    case 22: x++; break;
+    case 23: x++; break;
+    case 24: x++; break;
+    case 25: x++; break;
+    case 26: x++; break;
+    case 27: x++; break;
+    case 28: x++; break;
+    case 29: x++; break;
+    case 30: x++; break;
+    case 31: x++; break;
+    case 32: x++; break;
+    case 33: x++; break;
+    case 34: x++; break;
+    case 35: x++; break;
+    case 36: x++; break;
+    case 37: x++; break;
+    case 38: x++; break;
+    case 39: x++; break;
+    case 40: x++; break;
+    case 41: x++; break;
+    case 42: x++; break;
+    case 43: x++; break;
+    case 44: x++; break;
+    case 45: x++; break;
+    case 46: x++; break;
+    case 47: x++; break;
+    case 48: x++; break;
+    case 49: x++; break;
+    case 50: x++; break;
+    case 51: x++; break;
+    case 52: x++; break;
+    case 53: x++; break;
+    case 54: x++; break;
+    case 55: x++; break;
+    case 56: x++; break;
+    case 57: x++; break;
+    case 58: x++; break;
+    case 59: x++; break;
+    case 60: x++; break;
+    case 61: x++; break;
+    case 62: x++; break;
+    case 63: x++; break;
+    case 64: x++; break;
+    case 65: x++; break;
+    default:
+  }
+}
+
+void *alloc (int) { return 0; }
+void *getcwd (void *, int) { return 0; }
+void release (void *) {}
+
+/* Based on gnu_getcwd from tree.c  */
+void *gnu_getcwd()
+{
+  int size = 100;
+  void *buffer = alloc (size);
+
+  while (1) {
+    void *value = getcwd (buffer, size);
+    if (value != 0) return buffer;
+    size *= 2;
+    release (buffer);
+    buffer = (void *) alloc (size);
+  }
+}
+
+void loopy (int a)
+{
+  goto mid;
+  while (1)
+    {
+      a--;
+    mid:
+      a--;
+      if (a < -5)
+	break;
+      a--;
+    }
+}
+
+int
+main ()
+{
+  pathcov001a ();
+  pathcov001b ();
+  /* not called: pathcov001c (); */
+
+  /* not called: pathcov002a (); */
+  pathcov002b (0);
+  pathcov002c (1);
+  pathcov002d (0);
+  pathcov002d (1);
+
+  pathcov004b (1, 0, 0, 0);
+  pathcov004c (0, 0, 0, 0);
+  pathcov004d (0, 1, 0, 0);
+  pathcov004e (0, 1, 1, 0);
+  pathcov004f (0, 2, 1, 0);
+
+  /* not called: pathcov006a (); */
+  pathcov006b (0);
+  pathcov006c (64);
+  pathcov006d (2);
+  pathcov006d (65);
+  pathcov006e (66);
+}
+
+/* { dg-final { run-gcov prime-paths { --prime-paths gcov-25.c } } } */
+
diff --git a/gcc/testsuite/lib/gcov.exp b/gcc/testsuite/lib/gcov.exp
index e49f1011884..a8be70a5566 100644
--- a/gcc/testsuite/lib/gcov.exp
+++ b/gcc/testsuite/lib/gcov.exp
@@ -533,6 +533,89 @@ proc verify-filters { testname testcase file expected unexpected } {
     return [expr [llength $expected] + [llength $unexpected]]
 }
 
+proc verify-prime-paths { testname testcase file } {
+    set failed 0
+    set fd [open $file r]
+
+    set expected_n -1
+    set expected_m -1
+    set recording 0
+    set expected ""
+
+    while { [gets $fd line] >= 0 } {
+        regexp "^\[^:\]+: *(\[0-9\]+):" "$line" all lineno
+        set prefix "$testname line $lineno"
+
+        if {[regexp "BEGIN *paths" $line]} {
+            set recording 1
+            set expected ""
+            set expected_n -1
+            set expected_m -1
+            set seen ""
+            continue
+        }
+        
+        if { $recording != 1 } {
+            continue
+        }
+
+        if [regexp {summary: *(\d+)/(\d+)} $line _ n m] {
+            set expected_n $n
+            set expected_m $m
+        }
+
+        if [regexp "expect: *(.*)" $line all ln] {
+            set cases ""
+            set ln [regsub -all {\s+} $ln " "]
+            foreach case [split  $ln " "] {
+                lappend cases $case
+            }
+            lappend expected $cases
+        }
+
+        if [regexp "END" $line] {
+            if {$recording != 1} {
+                incr failed
+                fail "unexpected END at line $lineno, missing BEGIN"
+
+                # Abort the test if there is a mismatch, to avoid creating
+                # unecessary errors.  At this point the test itself is broken.
+                break
+            }
+            set recording 0
+
+            if {[llength $expected] > 0} {
+                incr failed
+                fail "expected: '$expected'"
+            }
+        }
+
+        if [regexp {paths covered (\d+) of (\d+)} $line _ n m] {
+            if { $n ne $expected_n || $m ne $expected_m } {
+                incr failed
+                fail "$prefix: expected $expected_n/$expected_m covered paths, was $n/$m"
+            }
+        }
+
+        if [regexp {path *\d+ not covered: (.*)} $line _ path] {
+            set pathl ""
+            foreach ln [split $path " "] {
+                if [regexp {.*(:.*) *} $ln _ key] {
+                    lappend pathl $key
+                }
+            }
+            set i [lsearch $expected $pathl]
+            set expected [lreplace $expected $i $i]
+        }
+    }
+    #incr failed
+
+    # if recording: fail () (unterminated)
+
+    close $fd
+    return $failed
+}
+
 proc gcov-pytest-format-line { args } {
     global subdir
 
@@ -606,6 +689,7 @@ proc run-gcov { args } {
     set gcov_verify_calls 0
     set gcov_verify_branches 0
     set gcov_verify_conditions 0
+    set gcov_verify_prime_paths 0
     set gcov_verify_lines 1
     set gcov_verify_intermediate 0
     set gcov_verify_filters 0
@@ -623,6 +707,8 @@ proc run-gcov { args } {
 	  set gcov_verify_filters 1
 	  set verify_filters_expected [lindex $a 1]
 	  set verify_filters_unexpected [lindex $a 2]
+	} elseif { $a == "prime-paths" } {
+	  set gcov_verify_prime_paths 1
 	} elseif { $a == "intermediate" } {
 	  set gcov_verify_intermediate 1
 	  set gcov_verify_calls 0
@@ -703,6 +789,11 @@ proc run-gcov { args } {
     } else {
 	set cdfailed 0
     }
+    if { $gcov_verify_prime_paths } {
+	set ppfailed [verify-prime-paths $testname $testcase $testcase.gcov]
+    } else {
+	set ppfailed 0
+    }
     if { $gcov_verify_calls } {
 	set cfailed [verify-calls $testname $testcase $testcase.gcov]
     } else {
@@ -722,12 +813,12 @@ proc run-gcov { args } {
 
     # Report whether the gcov test passed or failed.  If there were
     # multiple failures then the message is a summary.
-    set tfailed [expr $lfailed + $bfailed + $cdfailed + $cfailed + $ifailed + $ffailed]
+    set tfailed [expr $lfailed + $bfailed + $cdfailed + $ppfailed + $cfailed + $ifailed + $ffailed]
     if { $xfailed } {
 	setup_xfail "*-*-*"
     }
     if { $tfailed > 0 } {
-	fail "$testname gcov: $lfailed failures in line counts, $bfailed in branch percentages, $cdfailed in condition/decision, $cfailed in return percentages, $ifailed in intermediate format, $ffailed failed in filters"
+	fail "$testname gcov: $lfailed failures in line counts, $bfailed in branch percentages, $cdfailed in condition/decision, $ppfailed in prime-paths, $cfailed in return percentages, $ifailed in intermediate format, $ffailed failed in filters"
 	if { $xfailed } {
 	    clean-gcov $testcase
 	}
diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index 18f48e8d04e..f2f72e8e086 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -1911,7 +1911,7 @@ tree_profiling (void)
 	  thunk = true;
 	  /* When generate profile, expand thunk to gimple so it can be
 	     instrumented same way as other functions.  */
-	  if (profile_arc_flag || condition_coverage_flag)
+	  if (profile_arc_flag || condition_coverage_flag || path_coverage_flag)
 	    expand_thunk (node, false, true);
 	  /* Read cgraph profile but keep function as thunk at profile-use
 	     time.  */
@@ -1956,7 +1956,8 @@ tree_profiling (void)
   release_profile_file_filtering ();
 
   /* Drop pure/const flags from instrumented functions.  */
-  if (profile_arc_flag || condition_coverage_flag || flag_test_coverage)
+  if (profile_arc_flag || condition_coverage_flag || path_coverage_flag
+      || flag_test_coverage)
     FOR_EACH_DEFINED_FUNCTION (node)
       {
 	if (!gimple_has_body_p (node->decl)
@@ -1988,7 +1989,8 @@ tree_profiling (void)
 
       push_cfun (DECL_STRUCT_FUNCTION (node->decl));
 
-      if (profile_arc_flag || condition_coverage_flag || flag_test_coverage)
+      if (profile_arc_flag || condition_coverage_flag || path_coverage_flag
+	  || flag_test_coverage)
 	FOR_EACH_BB_FN (bb, cfun)
 	  {
 	    gimple_stmt_iterator gsi;
@@ -2073,7 +2075,8 @@ pass_ipa_tree_profile::gate (function *)
      disabled.  */
   return (!in_lto_p && !flag_auto_profile
 	  && (flag_branch_probabilities || flag_test_coverage
-	      || profile_arc_flag || condition_coverage_flag)
+	      || profile_arc_flag || condition_coverage_flag
+	      || path_coverage_flag)
 	  && !seen_error ());
 }
 
-- 
2.39.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] Record edge true/false value for gcov
  2024-06-25  8:04 [PATCH 1/2] Record edge true/false value for gcov Jørgen Kvalsvik
  2024-06-25  8:04 ` [PATCH 2/2] [RFC] Prime path coverage in gcc/gcov Jørgen Kvalsvik
@ 2024-06-25 21:37 ` Jeff Law
  2024-06-26 10:20   ` Jørgen Kvalsvik
  1 sibling, 1 reply; 4+ messages in thread
From: Jeff Law @ 2024-06-25 21:37 UTC (permalink / raw)
  To: Jørgen Kvalsvik, gcc-patches; +Cc: hubicka



On 6/25/24 2:04 AM, Jørgen Kvalsvik wrote:
> Make gcov aware which edges are the true/false to more accurately
> reconstruct the CFG.  There are plenty of bits left in arc_info and it
> opens up for richer reporting.
> 
> gcc/ChangeLog:
> 
> 	* gcov-io.h (GCOV_ARC_TRUE): New.
> 	(GCOV_ARC_FALSE): New.
> 	* gcov.cc (struct arc_info): Add true_value, false_value.
> 	(read_graph_file): Read true_value, false_value.
> 	* profile.cc (branch_prob): Write GCOV_ARC_TRUE, GCOV_ARC_FALSE.
I thought I'd already acked this patch.

So OK, again :-)

jeff


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] Record edge true/false value for gcov
  2024-06-25 21:37 ` [PATCH 1/2] Record edge true/false value for gcov Jeff Law
@ 2024-06-26 10:20   ` Jørgen Kvalsvik
  0 siblings, 0 replies; 4+ messages in thread
From: Jørgen Kvalsvik @ 2024-06-26 10:20 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: hubicka

On 6/25/24 23:37, Jeff Law wrote:
> 
> 
> On 6/25/24 2:04 AM, Jørgen Kvalsvik wrote:
>> Make gcov aware which edges are the true/false to more accurately
>> reconstruct the CFG.  There are plenty of bits left in arc_info and it
>> opens up for richer reporting.
>>
>> gcc/ChangeLog:
>>
>>     * gcov-io.h (GCOV_ARC_TRUE): New.
>>     (GCOV_ARC_FALSE): New.
>>     * gcov.cc (struct arc_info): Add true_value, false_value.
>>     (read_graph_file): Read true_value, false_value.
>>     * profile.cc (branch_prob): Write GCOV_ARC_TRUE, GCOV_ARC_FALSE.
> I thought I'd already acked this patch.
> 
> So OK, again :-)
> 
> jeff
> 

Thanks! Pushed.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-06-26 10:20 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-25  8:04 [PATCH 1/2] Record edge true/false value for gcov Jørgen Kvalsvik
2024-06-25  8:04 ` [PATCH 2/2] [RFC] Prime path coverage in gcc/gcov Jørgen Kvalsvik
2024-06-25 21:37 ` [PATCH 1/2] Record edge true/false value for gcov Jeff Law
2024-06-26 10:20   ` Jørgen Kvalsvik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).