public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Surprising CFG construction with goto from then to else
@ 2022-09-02  9:49 Jørgen Kvalsvik
  2022-09-02 12:22 ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Jørgen Kvalsvik @ 2022-09-02  9:49 UTC (permalink / raw)
  To: gcc; +Cc: mliska, sebastian.huber


Hello,

I played some more with odd programs and the effect on control flow 
graph construction (as a part of condition coverage support [1]) and 
came across this:

int fn (int a, int b, int c) {
     int x = 0;
     if (a && b) {
         if (c) {
             goto a_;
         } else {
             x = a;
         }
     } else {
a_:
         x = (a - b);
     }

     return x;
}

Run through gcov --conditions I get:

         4:    5:    if (a && b) {
condition outcomes covered 2/2
condition outcomes covered 2/2
         2:    6:        if (c) {
condition outcomes covered 2/2

Which is clearly not correct. So I started digging into why and dump the 
CFG as the coverage profiling sees it https://i.imgur.com/d0q72rA.png 
[2]. I apologize for the labeling, but A2 = a, A3 = b, A5 = c and A9 the 
else block. The problem, which is what confuses the algorithm, is that a 
and b don't share A9 as a successor (on false) as I would expect.

If I add some operation before the label the problem disappears and a 
and b share false-destination again https://i.imgur.com/PSrfaLC.png [3].

     } else {
         x++;
a_:
         x = (a - b);
     }

         4:    5:    if (a && b) {
condition outcomes covered 4/4
         2:    6:        if (c) {
condition outcomes covered 2/2


When dumping the cfg in the former case with -fdump-tree-cfg-graph I get 
a CFG without the split destinations in a and b 
https://i.imgur.com/05MCjzp.png [3]. I would assume from this that the 
graph dump happens after _more_ CFG transformations than the branch 
profiling.

So my questions are:

1. Is the control flow graph expected to be constructed as such where a 
and b don't share outcome, or is it to be considered a bug?
2. If yes, would it be problematic to push the branch coverage and 
condition profiling to a later stage where the cfg has been fixed?

Thanks,
Jørgen

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598165.html

[2] dot file:

digraph {
     A0 -> A2
     A2 -> A3
     A2 -> A8
     A3 -> A5
     A3 -> A4
     A8 -> A9
     A9 -> A10
     A10 -> A11
     A11 -> A1
     A5 -> A6
     A5 -> A7
     A4 -> A9
     A6 -> A9
     A7 -> A10
}

[3] dot file:

digraph {
     A0 -> A2
     A2 -> A3
     A2 -> A7
     A3 -> A4
     A3 -> A7
     A7 -> A8
     A8 -> A9
     A9 -> A10
     A10 -> A1
     A4 -> A5
     A4 -> A6
     A5 -> A8
     A6 -> A9
}

[3] dot file:


overlap=false;
subgraph "cluster_fn" {
         style="dashed";
         color="black";
         label="fn ()";
         fn_0_basic_block_0 
[shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"];

         fn_0_basic_block_1 
[shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"];

         fn_0_basic_block_2 
[shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 2\>:\l\
|x\ =\ 0;\l\
|if\ (a\ !=\ 0)\l\
\ \ goto\ \<bb\ 3\>;\ [INV]\l\
else\l\
\ \ goto\ \<bb\ 7\>;\ [INV]\l\
}"];

         fn_0_basic_block_3 
[shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 3\>:\l\
|if\ (b\ !=\ 0)\l\
\ \ goto\ \<bb\ 4\>;\ [INV]\l\
else\l\
\ \ goto\ \<bb\ 7\>;\ [INV]\l\
}"];

         fn_0_basic_block_4 
[shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 4\>:\l\
|if\ (c\ !=\ 0)\l\
\ \ goto\ \<bb\ 5\>;\ [INV]\l\
else\l\
\ \ goto\ \<bb\ 6\>;\ [INV]\l\
}"];

         fn_0_basic_block_5 
[shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 5\>:\l\
|//\ predicted\ unlikely\ by\ goto\ predictor.\l\
goto\ \<bb\ 7\>;\ [INV]\l\
}"];

         fn_0_basic_block_6 
[shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 6\>:\l\
|x\ =\ a;\l\
goto\ \<bb\ 8\>;\ [INV]\l\
}"];

         fn_0_basic_block_7 
[shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 7\>:\l\
|a_:\l\
|x\ =\ a\ -\ b;\l\
}"];

         fn_0_basic_block_8 
[shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 8\>:\l\
|D.2398\ =\ x;\l\
}"];

         fn_0_basic_block_9 
[shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 9\>:\l\
|\<L7\>:\l\
|return\ D.2398;\l\
}"];

         fn_0_basic_block_0:s -> fn_0_basic_block_2:n 
[style="solid,bold",color=black,weight=100,constraint=true];
         fn_0_basic_block_2:s -> fn_0_basic_block_3:n 
[style="solid,bold",color=forestgreen,weight=10,constraint=true];
         fn_0_basic_block_2:s -> fn_0_basic_block_7:n 
[style="solid,bold",color=darkorange,weight=10,constraint=true];
         fn_0_basic_block_3:s -> fn_0_basic_block_4:n 
[style="solid,bold",color=forestgreen,weight=10,constraint=true];
         fn_0_basic_block_3:s -> fn_0_basic_block_7:n 
[style="solid,bold",color=darkorange,weight=10,constraint=true];
         fn_0_basic_block_4:s -> fn_0_basic_block_5:n 
[style="solid,bold",color=forestgreen,weight=10,constraint=true];
         fn_0_basic_block_4:s -> fn_0_basic_block_6:n 
[style="solid,bold",color=darkorange,weight=10,constraint=true];
         fn_0_basic_block_5:s -> fn_0_basic_block_7:n 
[style="solid,bold",color=black,weight=100,constraint=true];
         fn_0_basic_block_6:s -> fn_0_basic_block_8:n 
[style="solid,bold",color=black,weight=100,constraint=true];
         fn_0_basic_block_7:s -> fn_0_basic_block_8:n 
[style="solid,bold",color=black,weight=100,constraint=true];
         fn_0_basic_block_8:s -> fn_0_basic_block_9:n 
[style="solid,bold",color=black,weight=100,constraint=true];
         fn_0_basic_block_9:s -> fn_0_basic_block_1:n 
[style="solid,bold",color=black,weight=10,constraint=true];
         fn_0_basic_block_0:s -> fn_0_basic_block_1:n 
[style="invis",constraint=true];
}
}

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Surprising CFG construction with goto from then to else
  2022-09-02  9:49 Surprising CFG construction with goto from then to else Jørgen Kvalsvik
@ 2022-09-02 12:22 ` Richard Biener
  2022-09-08 10:30   ` Jørgen Kvalsvik
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2022-09-02 12:22 UTC (permalink / raw)
  To: Jørgen Kvalsvik; +Cc: GCC Development

On Fri, Sep 2, 2022 at 11:50 AM Jørgen Kvalsvik <j@lambda.is> wrote:
>
>
> Hello,
>
> I played some more with odd programs and the effect on control flow
> graph construction (as a part of condition coverage support [1]) and
> came across this:
>
> int fn (int a, int b, int c) {
>      int x = 0;
>      if (a && b) {
>          if (c) {
>              goto a_;
>          } else {
>              x = a;
>          }
>      } else {
> a_:
>          x = (a - b);
>      }
>
>      return x;
> }
>
> Run through gcov --conditions I get:
>
>          4:    5:    if (a && b) {
> condition outcomes covered 2/2
> condition outcomes covered 2/2
>          2:    6:        if (c) {
> condition outcomes covered 2/2
>
> Which is clearly not correct. So I started digging into why and dump the
> CFG as the coverage profiling sees it https://i.imgur.com/d0q72rA.png
> [2]. I apologize for the labeling, but A2 = a, A3 = b, A5 = c and A9 the
> else block. The problem, which is what confuses the algorithm, is that a
> and b don't share A9 as a successor (on false) as I would expect.
>
> If I add some operation before the label the problem disappears and a
> and b share false-destination again https://i.imgur.com/PSrfaLC.png [3].
>
>      } else {
>          x++;
> a_:
>          x = (a - b);
>      }
>
>          4:    5:    if (a && b) {
> condition outcomes covered 4/4
>          2:    6:        if (c) {
> condition outcomes covered 2/2
>
>
> When dumping the cfg in the former case with -fdump-tree-cfg-graph I get
> a CFG without the split destinations in a and b
> https://i.imgur.com/05MCjzp.png [3]. I would assume from this that the
> graph dump happens after _more_ CFG transformations than the branch
> profiling.
>
> So my questions are:
>
> 1. Is the control flow graph expected to be constructed as such where a
> and b don't share outcome, or is it to be considered a bug?
> 2. If yes, would it be problematic to push the branch coverage and
> condition profiling to a later stage where the cfg has been fixed?

I would say you should only see more nodes merged.  It's a bit hard to follow
what you say with the namings - I usually run cc1 in gdb, breaking at
execute_build_cfg where you can do, after build_gimple_cfg finished
(and before cleanup_tree_cfg ()) do a 'dot-fn' in gdb which produces a nice
picture of the CFG and code with graphviz.

It looks like I would have expected, in particular we do not force a
new basic-block to be generated for a_: after the D.1991: artificial
label we have for the else.  That might be premature optimization
for your case (but the cleanup_tree_cfg () would immediately do
that as well).

Richard.

> Thanks,
> Jørgen
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598165.html
>
> [2] dot file:
>
> digraph {
>      A0 -> A2
>      A2 -> A3
>      A2 -> A8
>      A3 -> A5
>      A3 -> A4
>      A8 -> A9
>      A9 -> A10
>      A10 -> A11
>      A11 -> A1
>      A5 -> A6
>      A5 -> A7
>      A4 -> A9
>      A6 -> A9
>      A7 -> A10
> }
>
> [3] dot file:
>
> digraph {
>      A0 -> A2
>      A2 -> A3
>      A2 -> A7
>      A3 -> A4
>      A3 -> A7
>      A7 -> A8
>      A8 -> A9
>      A9 -> A10
>      A10 -> A1
>      A4 -> A5
>      A4 -> A6
>      A5 -> A8
>      A6 -> A9
> }
>
> [3] dot file:
>
>
> overlap=false;
> subgraph "cluster_fn" {
>          style="dashed";
>          color="black";
>          label="fn ()";
>          fn_0_basic_block_0
> [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"];
>
>          fn_0_basic_block_1
> [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"];
>
>          fn_0_basic_block_2
> [shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 2\>:\l\
> |x\ =\ 0;\l\
> |if\ (a\ !=\ 0)\l\
> \ \ goto\ \<bb\ 3\>;\ [INV]\l\
> else\l\
> \ \ goto\ \<bb\ 7\>;\ [INV]\l\
> }"];
>
>          fn_0_basic_block_3
> [shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 3\>:\l\
> |if\ (b\ !=\ 0)\l\
> \ \ goto\ \<bb\ 4\>;\ [INV]\l\
> else\l\
> \ \ goto\ \<bb\ 7\>;\ [INV]\l\
> }"];
>
>          fn_0_basic_block_4
> [shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 4\>:\l\
> |if\ (c\ !=\ 0)\l\
> \ \ goto\ \<bb\ 5\>;\ [INV]\l\
> else\l\
> \ \ goto\ \<bb\ 6\>;\ [INV]\l\
> }"];
>
>          fn_0_basic_block_5
> [shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 5\>:\l\
> |//\ predicted\ unlikely\ by\ goto\ predictor.\l\
> goto\ \<bb\ 7\>;\ [INV]\l\
> }"];
>
>          fn_0_basic_block_6
> [shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 6\>:\l\
> |x\ =\ a;\l\
> goto\ \<bb\ 8\>;\ [INV]\l\
> }"];
>
>          fn_0_basic_block_7
> [shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 7\>:\l\
> |a_:\l\
> |x\ =\ a\ -\ b;\l\
> }"];
>
>          fn_0_basic_block_8
> [shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 8\>:\l\
> |D.2398\ =\ x;\l\
> }"];
>
>          fn_0_basic_block_9
> [shape=record,style=filled,fillcolor=lightgrey,label="{\<bb\ 9\>:\l\
> |\<L7\>:\l\
> |return\ D.2398;\l\
> }"];
>
>          fn_0_basic_block_0:s -> fn_0_basic_block_2:n
> [style="solid,bold",color=black,weight=100,constraint=true];
>          fn_0_basic_block_2:s -> fn_0_basic_block_3:n
> [style="solid,bold",color=forestgreen,weight=10,constraint=true];
>          fn_0_basic_block_2:s -> fn_0_basic_block_7:n
> [style="solid,bold",color=darkorange,weight=10,constraint=true];
>          fn_0_basic_block_3:s -> fn_0_basic_block_4:n
> [style="solid,bold",color=forestgreen,weight=10,constraint=true];
>          fn_0_basic_block_3:s -> fn_0_basic_block_7:n
> [style="solid,bold",color=darkorange,weight=10,constraint=true];
>          fn_0_basic_block_4:s -> fn_0_basic_block_5:n
> [style="solid,bold",color=forestgreen,weight=10,constraint=true];
>          fn_0_basic_block_4:s -> fn_0_basic_block_6:n
> [style="solid,bold",color=darkorange,weight=10,constraint=true];
>          fn_0_basic_block_5:s -> fn_0_basic_block_7:n
> [style="solid,bold",color=black,weight=100,constraint=true];
>          fn_0_basic_block_6:s -> fn_0_basic_block_8:n
> [style="solid,bold",color=black,weight=100,constraint=true];
>          fn_0_basic_block_7:s -> fn_0_basic_block_8:n
> [style="solid,bold",color=black,weight=100,constraint=true];
>          fn_0_basic_block_8:s -> fn_0_basic_block_9:n
> [style="solid,bold",color=black,weight=100,constraint=true];
>          fn_0_basic_block_9:s -> fn_0_basic_block_1:n
> [style="solid,bold",color=black,weight=10,constraint=true];
>          fn_0_basic_block_0:s -> fn_0_basic_block_1:n
> [style="invis",constraint=true];
> }
> }

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Surprising CFG construction with goto from then to else
  2022-09-02 12:22 ` Richard Biener
@ 2022-09-08 10:30   ` Jørgen Kvalsvik
  2022-09-16 11:05     ` Jørgen Kvalsvik
  2022-10-03 12:22     ` Jørgen Kvalsvik
  0 siblings, 2 replies; 7+ messages in thread
From: Jørgen Kvalsvik @ 2022-09-08 10:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development, jh, mliska

On 02/09/2022 14:22, Richard Biener wrote:
> On Fri, Sep 2, 2022 at 11:50 AM Jørgen Kvalsvik <j@lambda.is> wrote:
>>
>>
>> Hello,
>>
>> I played some more with odd programs and the effect on control flow
>> graph construction (as a part of condition coverage support [1]) and
>> came across this:
>>
>> int fn (int a, int b, int c) {
>>       int x = 0;
>>       if (a && b) {
>>           if (c) {
>>               goto a_;
>>           } else {
>>               x = a;
>>           }
>>       } else {
>> a_:
>>           x = (a - b);
>>       }
>>
>>       return x;
>> }
>>
>> Run through gcov --conditions I get:
>>
>>           4:    5:    if (a && b) {
>> condition outcomes covered 2/2
>> condition outcomes covered 2/2
>>           2:    6:        if (c) {
>> condition outcomes covered 2/2
>>
>> Which is clearly not correct. So I started digging into why and dump the
>> CFG as the coverage profiling sees it https://i.imgur.com/d0q72rA.png
>> [2]. I apologize for the labeling, but A2 = a, A3 = b, A5 = c and A9 the
>> else block. The problem, which is what confuses the algorithm, is that a
>> and b don't share A9 as a successor (on false) as I would expect.
>>
>> If I add some operation before the label the problem disappears and a
>> and b share false-destination again https://i.imgur.com/PSrfaLC.png [3].
>>
>>       } else {
>>           x++;
>> a_:
>>           x = (a - b);
>>       }
>>
>>           4:    5:    if (a && b) {
>> condition outcomes covered 4/4
>>           2:    6:        if (c) {
>> condition outcomes covered 2/2
>>
>>
>> When dumping the cfg in the former case with -fdump-tree-cfg-graph I get
>> a CFG without the split destinations in a and b
>> https://i.imgur.com/05MCjzp.png [3]. I would assume from this that the
>> graph dump happens after _more_ CFG transformations than the branch
>> profiling.
>>
>> So my questions are:
>>
>> 1. Is the control flow graph expected to be constructed as such where a
>> and b don't share outcome, or is it to be considered a bug?
>> 2. If yes, would it be problematic to push the branch coverage and
>> condition profiling to a later stage where the cfg has been fixed?
> 
> I would say you should only see more nodes merged.  It's a bit hard to follow
> what you say with the namings - I usually run cc1 in gdb, breaking at
> execute_build_cfg where you can do, after build_gimple_cfg finished
> (and before cleanup_tree_cfg ()) do a 'dot-fn' in gdb which produces a nice
> picture of the CFG and code with graphviz.
> 
> It looks like I would have expected, in particular we do not force a
> new basic-block to be generated for a_: after the D.1991: artificial
> label we have for the else.  That might be premature optimization
> for your case (but the cleanup_tree_cfg () would immediately do
> that as well).
> 
> Richard.

I did some more digging into this and have isolated the problem to edge 
splitting inside the branch_prob () function itself.

gcc/profile.cc:1248

   if (last
       && gimple_has_location (last)
       && !RESERVED_LOCATION_P (e->goto_locus)
       && !single_succ_p (bb)
       && (LOCATION_FILE (e->goto_locus)
           != LOCATION_FILE (gimple_location (last))
           || (LOCATION_LINE (e->goto_locus)
               != LOCATION_LINE (gimple_location (last)))))
   {
       basic_block new_bb = split_edge (e);
       edge ne = single_succ_edge (new_bb);
       ne->goto_locus = e->goto_locus;
   }

Based on the cleaned-up cfg that gcc dumps later it looks like this 
split only lives through the branch coverage/profiling phase (it may 
bleed slightly later but it shouldn't be of significance).

Out of curiosity I removed the splitting altogether and no tests failed 
when running make check-gcc check-g++ RUNTESTFLAGS="gcov.exp". Either it 
was not covered by tests in the first place, or whatever behaviour this 
check is meant to fix is resolved elsewhere. I have to admit I don't 
really see a difference with/without this patch, but I don't know what 
to look for.

The check was first introduced in 2005 by Jan (cc):

commit d783b2a2dc91e1d2c1fea78cac2b6c6c73b3680d
Author: Jan Hubicka <jh@suse.cz>
Date:   Thu Aug 4 00:10:54 2005 +0200

     profile.c (branch_prob): Split edges with goto locus on them to get 
proper line counts.


             * profile.c (branch_prob): Split edges with goto locus on them
             to get proper line counts.
             * tree-cfg.c (make_cond_expr_edges): Record user goto 
locuses, if any.

             * gcov-1.C: Fix switch counts.
             * gcov-4b.c: Likewise.

What stands out to me in the check is that it uses location-file and 
location-line to decide if to split the edge. I added a few prints to 
see when the file/line is set:

    2 int goto1 (int a) { 
 
 

    3     if (a)
    4         goto end;
    5
    6     return 1;
    7 end:
    8     x += a;
    9     return 0;
   10 }

     if (a_5(D) != 0) 
 
 

         edge (true)
         last goto2.c:3
         goto (null):0

     if (a_5(D) != 0)
         edge (false)
         last goto2.c:3
     	goto (null):0

     // predicted unlikely by goto predictor.
     	edge (fallthru)
     	last goto2.c:4
     	goto goto2.c:4

The goto statement is the only with with a location for both the basic 
block and the edge.

   12 int goto2 (int a) { 
 
 

   13     if (a) { goto end; }
   14     else   { label: a++; }
   15
   16     return 1;
   17 end:
   18     x += a;
   19     return 0;
   20 }

     if (a_5(D) != 0)
         edge (true)
         last goto2.c:13
         goto (null):0

     if (a_5(D) != 0)
         edge (false)
         last goto2.c:13
         goto goto2.c:14

     // predicted unlikely by goto predictor.
         edge (fallthru)
         last goto2.c:13
         goto goto2.c:13

Now the else block has two locations as well, with the edge label 
e->goto_locus being inside the else block. Note that this label is 
_unrelated_ to the edge jump a (false) -> else.

Now a function without gotos:

   22 int goto3 (int a, int b) {
   23     if (a && b) {
   24         x += a * b;
   25     } else {
   26         x -= 1;
   27     }
   28     return 0;
   29 }

     if (a_7(D) != 0)
         edge (true)
         last goto2.c:23
         goto (null):0

     if (a_7(D) != 0)
         edge (false)
         last goto2.c:23
         goto (null):0

     if (b_8(D) != 0)
         edge (true)
         last goto2.c:23
         goto (null):0

     if (b_8(D) != 0)
         edge (false)
         last goto2.c:23
         goto (null):0

     x = _3;
         edge (fallthru)
         last goto2.c:24
         goto goto2.c:24

     x = _5;
         edge (fallthru)
         last goto2.c:26
         goto (null):0

Now the checks if (a) and (b) don't have goto_locus on the edges. For 
completeness I included the implied jumps in the then/else blocks which 
_do_ have edge locus in the then case.

Finally, the case that expose the problem for me:

   31 int goto4 (int a, int b) {
   32     if (a) {
   33         if (b) goto elseblock;
   34         else a++;
   35     } else {
   36 elseblock:
   37         a--;
   38     }
   39
   40     return a;
   41 }

     if (a_2(D) != 0)
         edge (true)
         last goto2.c:32
         goto (null):0

     if (a_2(D) != 0)
         edge (false)
         last goto2.c:32
         goto goto2.c:36

     if (b_3(D) != 0)
         edge (true)
         last goto2.c:33
         goto (null):0

     if (b_3(D) != 0)
         edge (false)
         last goto2.c:33
         goto (null):0

Again, a (false) has a goto_locus because there is an unrelated label at 
the top of the else block. For completeness, it also applies happens 
when there's a label on top of then:

   43 int goto5 (int a, int b) {
   44     if (a) {
   45 then:
   46         a++;
   47     } else {
   48         a--; 
 
 

   49     }
   50
   51     return a;
   52 }

     if (a_2(D) != 0)
         edge (true)
         last goto2.c:44
         goto goto2.c:45

     if (a_2(D) != 0)
         edge (false)
         last goto2.c:44
         goto (null):0

This causes the edge to split which probably isn't a problem for the 
branch coverage, but it is problematic for my condition coverage 
algorithm. So how do we solve this?

1. Is the edge splitting necessary? I didn't find a test that covers 
this (there might be one, any idea?). If the edge splitting is not 
necessary anymore then removing it should be fine for my coverage needs.
2. Assuming the edge split is necessary making a decision on source file 
+ line seems vulnerable to source formatting:

   54 int goto6 (int a) {
   55     if (a) label: goto end; 
 
 

   56     return 0;
   57 end:
   58     return 1;
   59
   60 }

     if (a_2(D) != 0)
         edge (true)
         last goto2.c:55
         goto goto2.c:55

     if (a_2(D) != 0)
         edge (false)
         last goto2.c:55
         goto (null):0

     // predicted unlikely by goto predictor.
         edge (fallthru)
         last goto2.c:55
         goto goto2.c:55

Now unrelated gotos all have the same file/line signature. Since the 
edge to the goto_locus is unrelated to the label itself.
3. Assuming the test is fine and necessary I *could* work around the 
problem by recording the original edge somewhere (it is very important 
that all conditions in an expression share the same then/else basic 
blocks), or enough metadata to make a virtual edge, but that is a hack I 
hope I don't have to do.

Anyway, the problem is not with the cfg construction itself, but an edge 
splitting that happens specifically for profiling.

CC Martin, maybe you have any idea?

Thanks,
Jørgen.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Surprising CFG construction with goto from then to else
  2022-09-08 10:30   ` Jørgen Kvalsvik
@ 2022-09-16 11:05     ` Jørgen Kvalsvik
  2022-10-03 10:59       ` Martin Liška
  2022-10-03 12:22     ` Jørgen Kvalsvik
  1 sibling, 1 reply; 7+ messages in thread
From: Jørgen Kvalsvik @ 2022-09-16 11:05 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development, jh, mliska

Gentle ping. Any idea if the edge split is still useful and/or how to 
test for it?

Thanks,
Jørgen

On 08/09/2022 12:30, Jørgen Kvalsvik wrote:
> On 02/09/2022 14:22, Richard Biener wrote:
>> On Fri, Sep 2, 2022 at 11:50 AM Jørgen Kvalsvik <j@lambda.is> wrote:
>>>
>>>
>>> Hello,
>>>
>>> I played some more with odd programs and the effect on control flow
>>> graph construction (as a part of condition coverage support [1]) and
>>> came across this:
>>>
>>> int fn (int a, int b, int c) {
>>>       int x = 0;
>>>       if (a && b) {
>>>           if (c) {
>>>               goto a_;
>>>           } else {
>>>               x = a;
>>>           }
>>>       } else {
>>> a_:
>>>           x = (a - b);
>>>       }
>>>
>>>       return x;
>>> }
>>>
>>> Run through gcov --conditions I get:
>>>
>>>           4:    5:    if (a && b) {
>>> condition outcomes covered 2/2
>>> condition outcomes covered 2/2
>>>           2:    6:        if (c) {
>>> condition outcomes covered 2/2
>>>
>>> Which is clearly not correct. So I started digging into why and dump the
>>> CFG as the coverage profiling sees it https://i.imgur.com/d0q72rA.png
>>> [2]. I apologize for the labeling, but A2 = a, A3 = b, A5 = c and A9 the
>>> else block. The problem, which is what confuses the algorithm, is that a
>>> and b don't share A9 as a successor (on false) as I would expect.
>>>
>>> If I add some operation before the label the problem disappears and a
>>> and b share false-destination again https://i.imgur.com/PSrfaLC.png [3].
>>>
>>>       } else {
>>>           x++;
>>> a_:
>>>           x = (a - b);
>>>       }
>>>
>>>           4:    5:    if (a && b) {
>>> condition outcomes covered 4/4
>>>           2:    6:        if (c) {
>>> condition outcomes covered 2/2
>>>
>>>
>>> When dumping the cfg in the former case with -fdump-tree-cfg-graph I get
>>> a CFG without the split destinations in a and b
>>> https://i.imgur.com/05MCjzp.png [3]. I would assume from this that the
>>> graph dump happens after _more_ CFG transformations than the branch
>>> profiling.
>>>
>>> So my questions are:
>>>
>>> 1. Is the control flow graph expected to be constructed as such where a
>>> and b don't share outcome, or is it to be considered a bug?
>>> 2. If yes, would it be problematic to push the branch coverage and
>>> condition profiling to a later stage where the cfg has been fixed?
>>
>> I would say you should only see more nodes merged.  It's a bit hard to 
>> follow
>> what you say with the namings - I usually run cc1 in gdb, breaking at
>> execute_build_cfg where you can do, after build_gimple_cfg finished
>> (and before cleanup_tree_cfg ()) do a 'dot-fn' in gdb which produces a 
>> nice
>> picture of the CFG and code with graphviz.
>>
>> It looks like I would have expected, in particular we do not force a
>> new basic-block to be generated for a_: after the D.1991: artificial
>> label we have for the else.  That might be premature optimization
>> for your case (but the cleanup_tree_cfg () would immediately do
>> that as well).
>>
>> Richard.
> 
> I did some more digging into this and have isolated the problem to edge 
> splitting inside the branch_prob () function itself.
> 
> gcc/profile.cc:1248
> 
>    if (last
>        && gimple_has_location (last)
>        && !RESERVED_LOCATION_P (e->goto_locus)
>        && !single_succ_p (bb)
>        && (LOCATION_FILE (e->goto_locus)
>            != LOCATION_FILE (gimple_location (last))
>            || (LOCATION_LINE (e->goto_locus)
>                != LOCATION_LINE (gimple_location (last)))))
>    {
>        basic_block new_bb = split_edge (e);
>        edge ne = single_succ_edge (new_bb);
>        ne->goto_locus = e->goto_locus;
>    }
> 
> Based on the cleaned-up cfg that gcc dumps later it looks like this 
> split only lives through the branch coverage/profiling phase (it may 
> bleed slightly later but it shouldn't be of significance).
> 
> Out of curiosity I removed the splitting altogether and no tests failed 
> when running make check-gcc check-g++ RUNTESTFLAGS="gcov.exp". Either it 
> was not covered by tests in the first place, or whatever behaviour this 
> check is meant to fix is resolved elsewhere. I have to admit I don't 
> really see a difference with/without this patch, but I don't know what 
> to look for.
> 
> The check was first introduced in 2005 by Jan (cc):
> 
> commit d783b2a2dc91e1d2c1fea78cac2b6c6c73b3680d
> Author: Jan Hubicka <jh@suse.cz>
> Date:   Thu Aug 4 00:10:54 2005 +0200
> 
>      profile.c (branch_prob): Split edges with goto locus on them to get 
> proper line counts.
> 
> 
>              * profile.c (branch_prob): Split edges with goto locus on them
>              to get proper line counts.
>              * tree-cfg.c (make_cond_expr_edges): Record user goto 
> locuses, if any.
> 
>              * gcov-1.C: Fix switch counts.
>              * gcov-4b.c: Likewise.
> 
> What stands out to me in the check is that it uses location-file and 
> location-line to decide if to split the edge. I added a few prints to 
> see when the file/line is set:
> 
>     2 int goto1 (int a) {
> 
> 
>     3     if (a)
>     4         goto end;
>     5
>     6     return 1;
>     7 end:
>     8     x += a;
>     9     return 0;
>    10 }
> 
>      if (a_5(D) != 0)
> 
> 
>          edge (true)
>          last goto2.c:3
>          goto (null):0
> 
>      if (a_5(D) != 0)
>          edge (false)
>          last goto2.c:3
>          goto (null):0
> 
>      // predicted unlikely by goto predictor.
>          edge (fallthru)
>          last goto2.c:4
>          goto goto2.c:4
> 
> The goto statement is the only with with a location for both the basic 
> block and the edge.
> 
>    12 int goto2 (int a) {
> 
> 
>    13     if (a) { goto end; }
>    14     else   { label: a++; }
>    15
>    16     return 1;
>    17 end:
>    18     x += a;
>    19     return 0;
>    20 }
> 
>      if (a_5(D) != 0)
>          edge (true)
>          last goto2.c:13
>          goto (null):0
> 
>      if (a_5(D) != 0)
>          edge (false)
>          last goto2.c:13
>          goto goto2.c:14
> 
>      // predicted unlikely by goto predictor.
>          edge (fallthru)
>          last goto2.c:13
>          goto goto2.c:13
> 
> Now the else block has two locations as well, with the edge label 
> e->goto_locus being inside the else block. Note that this label is 
> _unrelated_ to the edge jump a (false) -> else.
> 
> Now a function without gotos:
> 
>    22 int goto3 (int a, int b) {
>    23     if (a && b) {
>    24         x += a * b;
>    25     } else {
>    26         x -= 1;
>    27     }
>    28     return 0;
>    29 }
> 
>      if (a_7(D) != 0)
>          edge (true)
>          last goto2.c:23
>          goto (null):0
> 
>      if (a_7(D) != 0)
>          edge (false)
>          last goto2.c:23
>          goto (null):0
> 
>      if (b_8(D) != 0)
>          edge (true)
>          last goto2.c:23
>          goto (null):0
> 
>      if (b_8(D) != 0)
>          edge (false)
>          last goto2.c:23
>          goto (null):0
> 
>      x = _3;
>          edge (fallthru)
>          last goto2.c:24
>          goto goto2.c:24
> 
>      x = _5;
>          edge (fallthru)
>          last goto2.c:26
>          goto (null):0
> 
> Now the checks if (a) and (b) don't have goto_locus on the edges. For 
> completeness I included the implied jumps in the then/else blocks which 
> _do_ have edge locus in the then case.
> 
> Finally, the case that expose the problem for me:
> 
>    31 int goto4 (int a, int b) {
>    32     if (a) {
>    33         if (b) goto elseblock;
>    34         else a++;
>    35     } else {
>    36 elseblock:
>    37         a--;
>    38     }
>    39
>    40     return a;
>    41 }
> 
>      if (a_2(D) != 0)
>          edge (true)
>          last goto2.c:32
>          goto (null):0
> 
>      if (a_2(D) != 0)
>          edge (false)
>          last goto2.c:32
>          goto goto2.c:36
> 
>      if (b_3(D) != 0)
>          edge (true)
>          last goto2.c:33
>          goto (null):0
> 
>      if (b_3(D) != 0)
>          edge (false)
>          last goto2.c:33
>          goto (null):0
> 
> Again, a (false) has a goto_locus because there is an unrelated label at 
> the top of the else block. For completeness, it also applies happens 
> when there's a label on top of then:
> 
>    43 int goto5 (int a, int b) {
>    44     if (a) {
>    45 then:
>    46         a++;
>    47     } else {
>    48         a--;
> 
> 
>    49     }
>    50
>    51     return a;
>    52 }
> 
>      if (a_2(D) != 0)
>          edge (true)
>          last goto2.c:44
>          goto goto2.c:45
> 
>      if (a_2(D) != 0)
>          edge (false)
>          last goto2.c:44
>          goto (null):0
> 
> This causes the edge to split which probably isn't a problem for the 
> branch coverage, but it is problematic for my condition coverage 
> algorithm. So how do we solve this?
> 
> 1. Is the edge splitting necessary? I didn't find a test that covers 
> this (there might be one, any idea?). If the edge splitting is not 
> necessary anymore then removing it should be fine for my coverage needs.
> 2. Assuming the edge split is necessary making a decision on source file 
> + line seems vulnerable to source formatting:
> 
>    54 int goto6 (int a) {
>    55     if (a) label: goto end;
> 
> 
>    56     return 0;
>    57 end:
>    58     return 1;
>    59
>    60 }
> 
>      if (a_2(D) != 0)
>          edge (true)
>          last goto2.c:55
>          goto goto2.c:55
> 
>      if (a_2(D) != 0)
>          edge (false)
>          last goto2.c:55
>          goto (null):0
> 
>      // predicted unlikely by goto predictor.
>          edge (fallthru)
>          last goto2.c:55
>          goto goto2.c:55
> 
> Now unrelated gotos all have the same file/line signature. Since the 
> edge to the goto_locus is unrelated to the label itself.
> 3. Assuming the test is fine and necessary I *could* work around the 
> problem by recording the original edge somewhere (it is very important 
> that all conditions in an expression share the same then/else basic 
> blocks), or enough metadata to make a virtual edge, but that is a hack I 
> hope I don't have to do.
> 
> Anyway, the problem is not with the cfg construction itself, but an edge 
> splitting that happens specifically for profiling.
> 
> CC Martin, maybe you have any idea?
> 
> Thanks,
> Jørgen.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Surprising CFG construction with goto from then to else
  2022-09-16 11:05     ` Jørgen Kvalsvik
@ 2022-10-03 10:59       ` Martin Liška
  0 siblings, 0 replies; 7+ messages in thread
From: Martin Liška @ 2022-10-03 10:59 UTC (permalink / raw)
  To: Jørgen Kvalsvik, Richard Biener; +Cc: GCC Development, jh

On 9/16/22 13:05, Jørgen Kvalsvik wrote:
> Gentle ping. Any idea if the edge split is still useful and/or how to test for it?

@Honza: Can you please reply here?

Thanks,
Martin

> 
> Thanks,
> Jørgen
> 
> On 08/09/2022 12:30, Jørgen Kvalsvik wrote:
>> On 02/09/2022 14:22, Richard Biener wrote:
>>> On Fri, Sep 2, 2022 at 11:50 AM Jørgen Kvalsvik <j@lambda.is> wrote:
>>>>
>>>>
>>>> Hello,
>>>>
>>>> I played some more with odd programs and the effect on control flow
>>>> graph construction (as a part of condition coverage support [1]) and
>>>> came across this:
>>>>
>>>> int fn (int a, int b, int c) {
>>>>       int x = 0;
>>>>       if (a && b) {
>>>>           if (c) {
>>>>               goto a_;
>>>>           } else {
>>>>               x = a;
>>>>           }
>>>>       } else {
>>>> a_:
>>>>           x = (a - b);
>>>>       }
>>>>
>>>>       return x;
>>>> }
>>>>
>>>> Run through gcov --conditions I get:
>>>>
>>>>           4:    5:    if (a && b) {
>>>> condition outcomes covered 2/2
>>>> condition outcomes covered 2/2
>>>>           2:    6:        if (c) {
>>>> condition outcomes covered 2/2
>>>>
>>>> Which is clearly not correct. So I started digging into why and dump the
>>>> CFG as the coverage profiling sees it https://i.imgur.com/d0q72rA.png
>>>> [2]. I apologize for the labeling, but A2 = a, A3 = b, A5 = c and A9 the
>>>> else block. The problem, which is what confuses the algorithm, is that a
>>>> and b don't share A9 as a successor (on false) as I would expect.
>>>>
>>>> If I add some operation before the label the problem disappears and a
>>>> and b share false-destination again https://i.imgur.com/PSrfaLC.png [3].
>>>>
>>>>       } else {
>>>>           x++;
>>>> a_:
>>>>           x = (a - b);
>>>>       }
>>>>
>>>>           4:    5:    if (a && b) {
>>>> condition outcomes covered 4/4
>>>>           2:    6:        if (c) {
>>>> condition outcomes covered 2/2
>>>>
>>>>
>>>> When dumping the cfg in the former case with -fdump-tree-cfg-graph I get
>>>> a CFG without the split destinations in a and b
>>>> https://i.imgur.com/05MCjzp.png [3]. I would assume from this that the
>>>> graph dump happens after _more_ CFG transformations than the branch
>>>> profiling.
>>>>
>>>> So my questions are:
>>>>
>>>> 1. Is the control flow graph expected to be constructed as such where a
>>>> and b don't share outcome, or is it to be considered a bug?
>>>> 2. If yes, would it be problematic to push the branch coverage and
>>>> condition profiling to a later stage where the cfg has been fixed?
>>>
>>> I would say you should only see more nodes merged.  It's a bit hard to follow
>>> what you say with the namings - I usually run cc1 in gdb, breaking at
>>> execute_build_cfg where you can do, after build_gimple_cfg finished
>>> (and before cleanup_tree_cfg ()) do a 'dot-fn' in gdb which produces a nice
>>> picture of the CFG and code with graphviz.
>>>
>>> It looks like I would have expected, in particular we do not force a
>>> new basic-block to be generated for a_: after the D.1991: artificial
>>> label we have for the else.  That might be premature optimization
>>> for your case (but the cleanup_tree_cfg () would immediately do
>>> that as well).
>>>
>>> Richard.
>>
>> I did some more digging into this and have isolated the problem to edge splitting inside the branch_prob () function itself.
>>
>> gcc/profile.cc:1248
>>
>>    if (last
>>        && gimple_has_location (last)
>>        && !RESERVED_LOCATION_P (e->goto_locus)
>>        && !single_succ_p (bb)
>>        && (LOCATION_FILE (e->goto_locus)
>>            != LOCATION_FILE (gimple_location (last))
>>            || (LOCATION_LINE (e->goto_locus)
>>                != LOCATION_LINE (gimple_location (last)))))
>>    {
>>        basic_block new_bb = split_edge (e);
>>        edge ne = single_succ_edge (new_bb);
>>        ne->goto_locus = e->goto_locus;
>>    }
>>
>> Based on the cleaned-up cfg that gcc dumps later it looks like this split only lives through the branch coverage/profiling phase (it may bleed slightly later but it shouldn't be of significance).
>>
>> Out of curiosity I removed the splitting altogether and no tests failed when running make check-gcc check-g++ RUNTESTFLAGS="gcov.exp". Either it was not covered by tests in the first place, or whatever behaviour this check is meant to fix is resolved elsewhere. I have to admit I don't really see a difference with/without this patch, but I don't know what to look for.
>>
>> The check was first introduced in 2005 by Jan (cc):
>>
>> commit d783b2a2dc91e1d2c1fea78cac2b6c6c73b3680d
>> Author: Jan Hubicka <jh@suse.cz>
>> Date:   Thu Aug 4 00:10:54 2005 +0200
>>
>>      profile.c (branch_prob): Split edges with goto locus on them to get proper line counts.
>>
>>
>>              * profile.c (branch_prob): Split edges with goto locus on them
>>              to get proper line counts.
>>              * tree-cfg.c (make_cond_expr_edges): Record user goto locuses, if any.
>>
>>              * gcov-1.C: Fix switch counts.
>>              * gcov-4b.c: Likewise.
>>
>> What stands out to me in the check is that it uses location-file and location-line to decide if to split the edge. I added a few prints to see when the file/line is set:
>>
>>     2 int goto1 (int a) {
>>
>>
>>     3     if (a)
>>     4         goto end;
>>     5
>>     6     return 1;
>>     7 end:
>>     8     x += a;
>>     9     return 0;
>>    10 }
>>
>>      if (a_5(D) != 0)
>>
>>
>>          edge (true)
>>          last goto2.c:3
>>          goto (null):0
>>
>>      if (a_5(D) != 0)
>>          edge (false)
>>          last goto2.c:3
>>          goto (null):0
>>
>>      // predicted unlikely by goto predictor.
>>          edge (fallthru)
>>          last goto2.c:4
>>          goto goto2.c:4
>>
>> The goto statement is the only with with a location for both the basic block and the edge.
>>
>>    12 int goto2 (int a) {
>>
>>
>>    13     if (a) { goto end; }
>>    14     else   { label: a++; }
>>    15
>>    16     return 1;
>>    17 end:
>>    18     x += a;
>>    19     return 0;
>>    20 }
>>
>>      if (a_5(D) != 0)
>>          edge (true)
>>          last goto2.c:13
>>          goto (null):0
>>
>>      if (a_5(D) != 0)
>>          edge (false)
>>          last goto2.c:13
>>          goto goto2.c:14
>>
>>      // predicted unlikely by goto predictor.
>>          edge (fallthru)
>>          last goto2.c:13
>>          goto goto2.c:13
>>
>> Now the else block has two locations as well, with the edge label e->goto_locus being inside the else block. Note that this label is _unrelated_ to the edge jump a (false) -> else.
>>
>> Now a function without gotos:
>>
>>    22 int goto3 (int a, int b) {
>>    23     if (a && b) {
>>    24         x += a * b;
>>    25     } else {
>>    26         x -= 1;
>>    27     }
>>    28     return 0;
>>    29 }
>>
>>      if (a_7(D) != 0)
>>          edge (true)
>>          last goto2.c:23
>>          goto (null):0
>>
>>      if (a_7(D) != 0)
>>          edge (false)
>>          last goto2.c:23
>>          goto (null):0
>>
>>      if (b_8(D) != 0)
>>          edge (true)
>>          last goto2.c:23
>>          goto (null):0
>>
>>      if (b_8(D) != 0)
>>          edge (false)
>>          last goto2.c:23
>>          goto (null):0
>>
>>      x = _3;
>>          edge (fallthru)
>>          last goto2.c:24
>>          goto goto2.c:24
>>
>>      x = _5;
>>          edge (fallthru)
>>          last goto2.c:26
>>          goto (null):0
>>
>> Now the checks if (a) and (b) don't have goto_locus on the edges. For completeness I included the implied jumps in the then/else blocks which _do_ have edge locus in the then case.
>>
>> Finally, the case that expose the problem for me:
>>
>>    31 int goto4 (int a, int b) {
>>    32     if (a) {
>>    33         if (b) goto elseblock;
>>    34         else a++;
>>    35     } else {
>>    36 elseblock:
>>    37         a--;
>>    38     }
>>    39
>>    40     return a;
>>    41 }
>>
>>      if (a_2(D) != 0)
>>          edge (true)
>>          last goto2.c:32
>>          goto (null):0
>>
>>      if (a_2(D) != 0)
>>          edge (false)
>>          last goto2.c:32
>>          goto goto2.c:36
>>
>>      if (b_3(D) != 0)
>>          edge (true)
>>          last goto2.c:33
>>          goto (null):0
>>
>>      if (b_3(D) != 0)
>>          edge (false)
>>          last goto2.c:33
>>          goto (null):0
>>
>> Again, a (false) has a goto_locus because there is an unrelated label at the top of the else block. For completeness, it also applies happens when there's a label on top of then:
>>
>>    43 int goto5 (int a, int b) {
>>    44     if (a) {
>>    45 then:
>>    46         a++;
>>    47     } else {
>>    48         a--;
>>
>>
>>    49     }
>>    50
>>    51     return a;
>>    52 }
>>
>>      if (a_2(D) != 0)
>>          edge (true)
>>          last goto2.c:44
>>          goto goto2.c:45
>>
>>      if (a_2(D) != 0)
>>          edge (false)
>>          last goto2.c:44
>>          goto (null):0
>>
>> This causes the edge to split which probably isn't a problem for the branch coverage, but it is problematic for my condition coverage algorithm. So how do we solve this?
>>
>> 1. Is the edge splitting necessary? I didn't find a test that covers this (there might be one, any idea?). If the edge splitting is not necessary anymore then removing it should be fine for my coverage needs.
>> 2. Assuming the edge split is necessary making a decision on source file + line seems vulnerable to source formatting:
>>
>>    54 int goto6 (int a) {
>>    55     if (a) label: goto end;
>>
>>
>>    56     return 0;
>>    57 end:
>>    58     return 1;
>>    59
>>    60 }
>>
>>      if (a_2(D) != 0)
>>          edge (true)
>>          last goto2.c:55
>>          goto goto2.c:55
>>
>>      if (a_2(D) != 0)
>>          edge (false)
>>          last goto2.c:55
>>          goto (null):0
>>
>>      // predicted unlikely by goto predictor.
>>          edge (fallthru)
>>          last goto2.c:55
>>          goto goto2.c:55
>>
>> Now unrelated gotos all have the same file/line signature. Since the edge to the goto_locus is unrelated to the label itself.
>> 3. Assuming the test is fine and necessary I *could* work around the problem by recording the original edge somewhere (it is very important that all conditions in an expression share the same then/else basic blocks), or enough metadata to make a virtual edge, but that is a hack I hope I don't have to do.
>>
>> Anyway, the problem is not with the cfg construction itself, but an edge splitting that happens specifically for profiling.
>>
>> CC Martin, maybe you have any idea?
>>
>> Thanks,
>> Jørgen.
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Surprising CFG construction with goto from then to else
  2022-09-08 10:30   ` Jørgen Kvalsvik
  2022-09-16 11:05     ` Jørgen Kvalsvik
@ 2022-10-03 12:22     ` Jørgen Kvalsvik
  2022-10-06  7:55       ` Richard Biener
  1 sibling, 1 reply; 7+ messages in thread
From: Jørgen Kvalsvik @ 2022-10-03 12:22 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development, jh, mliska


On 9/8/22 12:30, Jørgen Kvalsvik wrote:
> On 02/09/2022 14:22, Richard Biener wrote:
>> On Fri, Sep 2, 2022 at 11:50 AM Jørgen Kvalsvik <j@lambda.is> wrote:
>>>
>>>
>>> Hello,
>>>
>>> I played some more with odd programs and the effect on control flow
>>> graph construction (as a part of condition coverage support [1]) and
>>> came across this:
>>>
>>> int fn (int a, int b, int c) {
>>>       int x = 0;
>>>       if (a && b) {
>>>           if (c) {
>>>               goto a_;
>>>           } else {
>>>               x = a;
>>>           }
>>>       } else {
>>> a_:
>>>           x = (a - b);
>>>       }
>>>
>>>       return x;
>>> }
>>>
>>> Run through gcov --conditions I get:
>>>
>>>           4:    5:    if (a && b) {
>>> condition outcomes covered 2/2
>>> condition outcomes covered 2/2
>>>           2:    6:        if (c) {
>>> condition outcomes covered 2/2
>>>
>>> Which is clearly not correct. So I started digging into why and dump the
>>> CFG as the coverage profiling sees it https://i.imgur.com/d0q72rA.png
>>> [2]. I apologize for the labeling, but A2 = a, A3 = b, A5 = c and A9 the
>>> else block. The problem, which is what confuses the algorithm, is that a
>>> and b don't share A9 as a successor (on false) as I would expect.
>>>
>>> If I add some operation before the label the problem disappears and a
>>> and b share false-destination again https://i.imgur.com/PSrfaLC.png [3].
>>>
>>>       } else {
>>>           x++;
>>> a_:
>>>           x = (a - b);
>>>       }
>>>
>>>           4:    5:    if (a && b) {
>>> condition outcomes covered 4/4
>>>           2:    6:        if (c) {
>>> condition outcomes covered 2/2
>>>
>>>
>>> When dumping the cfg in the former case with -fdump-tree-cfg-graph I get
>>> a CFG without the split destinations in a and b
>>> https://i.imgur.com/05MCjzp.png [3]. I would assume from this that the
>>> graph dump happens after _more_ CFG transformations than the branch
>>> profiling.
>>>
>>> So my questions are:
>>>
>>> 1. Is the control flow graph expected to be constructed as such where a
>>> and b don't share outcome, or is it to be considered a bug?
>>> 2. If yes, would it be problematic to push the branch coverage and
>>> condition profiling to a later stage where the cfg has been fixed?
>>
>> I would say you should only see more nodes merged.  It's a bit hard to 
>> follow
>> what you say with the namings - I usually run cc1 in gdb, breaking at
>> execute_build_cfg where you can do, after build_gimple_cfg finished
>> (and before cleanup_tree_cfg ()) do a 'dot-fn' in gdb which produces a 
>> nice
>> picture of the CFG and code with graphviz.
>>
>> It looks like I would have expected, in particular we do not force a
>> new basic-block to be generated for a_: after the D.1991: artificial
>> label we have for the else.  That might be premature optimization
>> for your case (but the cleanup_tree_cfg () would immediately do
>> that as well).
>>
>> Richard.
> 
> I did some more digging into this and have isolated the problem to edge 
> splitting inside the branch_prob () function itself.
> 
> gcc/profile.cc:1248
> 
>    if (last
>        && gimple_has_location (last)
>        && !RESERVED_LOCATION_P (e->goto_locus)
>        && !single_succ_p (bb)
>        && (LOCATION_FILE (e->goto_locus)
>            != LOCATION_FILE (gimple_location (last))
>            || (LOCATION_LINE (e->goto_locus)
>                != LOCATION_LINE (gimple_location (last)))))
>    {
>        basic_block new_bb = split_edge (e);
>        edge ne = single_succ_edge (new_bb);
>        ne->goto_locus = e->goto_locus;
>    }
> 
> Based on the cleaned-up cfg that gcc dumps later it looks like this 
> split only lives through the branch coverage/profiling phase (it may 
> bleed slightly later but it shouldn't be of significance).
> 
> Out of curiosity I removed the splitting altogether and no tests failed 
> when running make check-gcc check-g++ RUNTESTFLAGS="gcov.exp". Either it 
> was not covered by tests in the first place, or whatever behaviour this 
> check is meant to fix is resolved elsewhere. I have to admit I don't 
> really see a difference with/without this patch, but I don't know what 
> to look for.
> 
> The check was first introduced in 2005 by Jan (cc):
> 
> commit d783b2a2dc91e1d2c1fea78cac2b6c6c73b3680d
> Author: Jan Hubicka <jh@suse.cz>
> Date:   Thu Aug 4 00:10:54 2005 +0200
> 
>      profile.c (branch_prob): Split edges with goto locus on them to get 
> proper line counts.
> 
> 
>              * profile.c (branch_prob): Split edges with goto locus on them
>              to get proper line counts.
>              * tree-cfg.c (make_cond_expr_edges): Record user goto 
> locuses, if any.
> 
>              * gcov-1.C: Fix switch counts.
>              * gcov-4b.c: Likewise.
> 
> What stands out to me in the check is that it uses location-file and 
> location-line to decide if to split the edge. I added a few prints to 
> see when the file/line is set:
> 
>     2 int goto1 (int a) {
> 
> 
>     3     if (a)
>     4         goto end;
>     5
>     6     return 1;
>     7 end:
>     8     x += a;
>     9     return 0;
>    10 }
> 
>      if (a_5(D) != 0)
> 
> 
>          edge (true)
>          last goto2.c:3
>          goto (null):0
> 
>      if (a_5(D) != 0)
>          edge (false)
>          last goto2.c:3
>          goto (null):0
> 
>      // predicted unlikely by goto predictor.
>          edge (fallthru)
>          last goto2.c:4
>          goto goto2.c:4
> 
> The goto statement is the only with with a location for both the basic 
> block and the edge.
> 
>    12 int goto2 (int a) {
> 
> 
>    13     if (a) { goto end; }
>    14     else   { label: a++; }
>    15
>    16     return 1;
>    17 end:
>    18     x += a;
>    19     return 0;
>    20 }
> 
>      if (a_5(D) != 0)
>          edge (true)
>          last goto2.c:13
>          goto (null):0
> 
>      if (a_5(D) != 0)
>          edge (false)
>          last goto2.c:13
>          goto goto2.c:14
> 
>      // predicted unlikely by goto predictor.
>          edge (fallthru)
>          last goto2.c:13
>          goto goto2.c:13
> 
> Now the else block has two locations as well, with the edge label 
> e->goto_locus being inside the else block. Note that this label is 
> _unrelated_ to the edge jump a (false) -> else.
> 
> Now a function without gotos:
> 
>    22 int goto3 (int a, int b) {
>    23     if (a && b) {
>    24         x += a * b;
>    25     } else {
>    26         x -= 1;
>    27     }
>    28     return 0;
>    29 }
> 
>      if (a_7(D) != 0)
>          edge (true)
>          last goto2.c:23
>          goto (null):0
> 
>      if (a_7(D) != 0)
>          edge (false)
>          last goto2.c:23
>          goto (null):0
> 
>      if (b_8(D) != 0)
>          edge (true)
>          last goto2.c:23
>          goto (null):0
> 
>      if (b_8(D) != 0)
>          edge (false)
>          last goto2.c:23
>          goto (null):0
> 
>      x = _3;
>          edge (fallthru)
>          last goto2.c:24
>          goto goto2.c:24
> 
>      x = _5;
>          edge (fallthru)
>          last goto2.c:26
>          goto (null):0
> 
> Now the checks if (a) and (b) don't have goto_locus on the edges. For 
> completeness I included the implied jumps in the then/else blocks which 
> _do_ have edge locus in the then case.
> 
> Finally, the case that expose the problem for me:
> 
>    31 int goto4 (int a, int b) {
>    32     if (a) {
>    33         if (b) goto elseblock;
>    34         else a++;
>    35     } else {
>    36 elseblock:
>    37         a--;
>    38     }
>    39
>    40     return a;
>    41 }
> 
>      if (a_2(D) != 0)
>          edge (true)
>          last goto2.c:32
>          goto (null):0
> 
>      if (a_2(D) != 0)
>          edge (false)
>          last goto2.c:32
>          goto goto2.c:36
> 
>      if (b_3(D) != 0)
>          edge (true)
>          last goto2.c:33
>          goto (null):0
> 
>      if (b_3(D) != 0)
>          edge (false)
>          last goto2.c:33
>          goto (null):0
> 
> Again, a (false) has a goto_locus because there is an unrelated label at 
> the top of the else block. For completeness, it also applies happens 
> when there's a label on top of then:
> 
>    43 int goto5 (int a, int b) {
>    44     if (a) {
>    45 then:
>    46         a++;
>    47     } else {
>    48         a--;
> 
> 
>    49     }
>    50
>    51     return a;
>    52 }
> 
>      if (a_2(D) != 0)
>          edge (true)
>          last goto2.c:44
>          goto goto2.c:45
> 
>      if (a_2(D) != 0)
>          edge (false)
>          last goto2.c:44
>          goto (null):0
> 
> This causes the edge to split which probably isn't a problem for the 
> branch coverage, but it is problematic for my condition coverage 
> algorithm. So how do we solve this?
> 
> 1. Is the edge splitting necessary? I didn't find a test that covers 
> this (there might be one, any idea?). If the edge splitting is not 
> necessary anymore then removing it should be fine for my coverage needs.
> 2. Assuming the edge split is necessary making a decision on source file 
> + line seems vulnerable to source formatting:
> 
>    54 int goto6 (int a) {
>    55     if (a) label: goto end;
> 
> 
>    56     return 0;
>    57 end:
>    58     return 1;
>    59
>    60 }
> 
>      if (a_2(D) != 0)
>          edge (true)
>          last goto2.c:55
>          goto goto2.c:55
> 
>      if (a_2(D) != 0)
>          edge (false)
>          last goto2.c:55
>          goto (null):0
> 
>      // predicted unlikely by goto predictor.
>          edge (fallthru)
>          last goto2.c:55
>          goto goto2.c:55
> 
> Now unrelated gotos all have the same file/line signature. Since the 
> edge to the goto_locus is unrelated to the label itself.
> 3. Assuming the test is fine and necessary I *could* work around the 
> problem by recording the original edge somewhere (it is very important 
> that all conditions in an expression share the same then/else basic 
> blocks), or enough metadata to make a virtual edge, but that is a hack I 
> hope I don't have to do.
> 
> Anyway, the problem is not with the cfg construction itself, but an edge 
> splitting that happens specifically for profiling.
> 
> CC Martin, maybe you have any idea?
> 
> Thanks,
> Jørgen.

So I finally found some time to tinker with this and made some good 
progress I think.

Like I wrote before, the problem is the edge split. I tried removing it 
but could not produce any test failures (or so I thought), but it turns 
out not much is missing for it work. I started by running the test suite 
twice and compare the .gcov files before/after applying this patch:


diff --git a/gcc/profile.cc b/gcc/profile.cc
index a4751279571..ff9f6ef2df2 100644
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -1244,19 +1244,7 @@ branch_prob (bool thunk)
                  Don't do that when the locuses match, so
                  if (blah) goto something;
                  is not computed twice.  */
-             if (last
-                 && gimple_has_location (last)
-                 && !RESERVED_LOCATION_P (e->goto_locus)
-                 && !single_succ_p (bb)
-                 && (LOCATION_FILE (e->goto_locus)
-                     != LOCATION_FILE (gimple_location (last))
-                     || (LOCATION_LINE (e->goto_locus)
-                         != LOCATION_LINE (gimple_location (last)))))
-               {
-                 basic_block new_bb = split_edge (e);
-                 edge ne = single_succ_edge (new_bb);
-                 ne->goto_locus = e->goto_locus;
-               }
+
               if ((e->flags & (EDGE_ABNORMAL | EDGE_ABNORMAL_CALL))
                    && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
                 need_exit_edge = 1;

--- a/gcc/testsuite/lib/gcov.exp
+++ b/gcc/testsuite/lib/gcov.exp
@@ -42,7 +42,7 @@ proc clean-gcov { testcase } {
      clean-gcov-file $testcase "gcno"
      clean-gcov-file $testcase "gcda"
      clean-gcov-file $testcase "h.gcov"
-    remote_file host delete "$testcase.gcov"
+    #remote_file host delete "$testcase.gcov"
  }


Run the tests and diff (filtered and most relevant section included):

$ make check-gcc RUNTESTFLAGS=gcov.exp
$ diff -r sans-split-edge with-split-edge | grep -C 2 -E "^[<>]\s\s"
diff -r sans-split-edge/gcc/gcov-4.c.gcov with-split-edge/gcc/gcov-4.c.gcov
   228c228
   <         -:  224:        break;
   ---
   >         1:  224:        break;
   231c231
   <         -:  227:        break;
   ---
   >     #####:  227:        break;
   237c237
   <         -:  233:        break;
   ---
   >         2:  233:        break;

Ok, so there are some breaks that are counted with the edge and not 
without. I add the count (N) and lo and behold it fails predictably:

FAIL: gcc.misc-tests/gcov-4.c line 224: is -:should be 1
FAIL: gcc.misc-tests/gcov-4.c line 233: is -:should be 2

These tests pass when the splitting is re-enabled.

Ok, so the splitting is necessary (or at the very least a more 
substantial change), but check is based on locus. I wrote a few test 
programs and printed some diagnostics (file:line:col) when the edge splits.

This program breaks condition coverage because of the edge split:

      1  int fn (int a, int b, int c) {
      2        int x = 0;
      3        if (a && b) {
      4                x = a;
      5        } else {
      6  a_:
      7            x = (a - b);
      8        }
      9
     10        return x;
     11  }

loc (src): t.c:  3:10 -> if (a_3(D) != 0)
e->goto:   t.c:  6: 1 -> a_:
loc (dst): t.c:  6: 1 -> a_:
loc (src): t.c:  3:13 -> if (b_4(D) != 0)
e->goto:   t.c:  6: 1 -> a_:
loc (dst): t.c:  6: 1 -> a_:


and this is extract from gcov-4b.c:

    205  int
    206  test_switch (int i, int j)
    207  {
    208    int result = 0;
    209
    210    switch (i)                            /* branch(80 25) */
    211                                          /* branch(end) */
    212      {
    213        case 1:
    214          result = do_something (2);
    215          break;
    216        case 2:
    217          result = do_something (1024);
    218          break;
    219        case 3:
    220        case 4:
    221          if (j == 2)                     /* branch(67) */
    222                                          /* branch(end) */
    223            return do_something (4);
    224          result = do_something (8);
    225          break;
    226        default:
    227          result = do_something (32);
    228          switch_m++;

loc (src): gcov-4b.c:214:18 -> result_18 = do_something (2);
e->goto:   gcov-4b.c:215: 9 -> _22 = result_3;
loc (dst): gcov-4b.c:231:10 -> _22 = result_3;
loc (src): gcov-4b.c:217:18 -> result_16 = do_something (1024);
e->goto:   gcov-4b.c:218: 9 -> _22 = result_3;
loc (dst): gcov-4b.c:231:10 -> _22 = result_3;
loc (src): gcov-4b.c:224:18 -> result_12 = do_something (8);
e->goto:   gcov-4b.c:225: 9 -> _22 = result_3;
loc (dst): gcov-4b.c:231:10 -> _22 = result_3;

The former is a "bad" split and the latter a good one. Notice that for 
the former the goto (edge) locus is identical to the first statement in 
the destination block, contrary to good split where the goto and dst 
locus are different.

Which leads me to think the problem can be fixed by either extending the 
check to also consider the locus of the destination block, or maybe 
simpler by comparing the e->locus to first_stmt (e->dest)->locus.

In other words, by applying something like this patch we can have 
condition coverage and still have the test suite pass for all other 
cases. I don't think this will be a regression.

diff --git a/gcc/profile.cc b/gcc/profile.cc
index a4751279571..c1b1028599a 100644
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -1239,19 +1240,28 @@ branch_prob (bool thunk)
                     break;
                 }

+               gimple *fst = nullptr;
+             for (gsi = gsi_start_nondebug_bb (e->dest);
+                  !gsi_end_p (gsi);
+                  gsi_next_nondebug (&gsi))
+               {
+                 fst = gsi_stmt (gsi);
+                 if (!RESERVED_LOCATION_P (gimple_location (fst)))
+                   break;
+               }
               /* Edge with goto locus might get wrong coverage info unless
                  it is the only edge out of BB.
                  Don't do that when the locuses match, so
                  if (blah) goto something;
                  is not computed twice.  */
-             if (last
-                 && gimple_has_location (last)
+             if (fst
+                 && gimple_has_location (fst)
                   && !RESERVED_LOCATION_P (e->goto_locus)
                   && !single_succ_p (bb)
                   && (LOCATION_FILE (e->goto_locus)
-                     != LOCATION_FILE (gimple_location (last))
+                     != LOCATION_FILE (gimple_location (fst))
                       || (LOCATION_LINE (e->goto_locus)
-                         != LOCATION_LINE (gimple_location (last)))))
+                         != LOCATION_LINE (gimple_location (fst)))))
                 {
                   basic_block new_bb = split_edge (e);
                   edge ne = single_succ_edge (new_bb);

If this sounds reasonably I will clean up my tree and submit the patches 
and tests. What do you think?

Thanks,
Jørgen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Surprising CFG construction with goto from then to else
  2022-10-03 12:22     ` Jørgen Kvalsvik
@ 2022-10-06  7:55       ` Richard Biener
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Biener @ 2022-10-06  7:55 UTC (permalink / raw)
  To: Jørgen Kvalsvik; +Cc: GCC Development, jh, mliska

On Mon, Oct 3, 2022 at 2:22 PM Jørgen Kvalsvik <j@lambda.is> wrote:
>
>
> On 9/8/22 12:30, Jørgen Kvalsvik wrote:
> > On 02/09/2022 14:22, Richard Biener wrote:
> >> On Fri, Sep 2, 2022 at 11:50 AM Jørgen Kvalsvik <j@lambda.is> wrote:
> >>>
> >>>
> >>> Hello,
> >>>
> >>> I played some more with odd programs and the effect on control flow
> >>> graph construction (as a part of condition coverage support [1]) and
> >>> came across this:
> >>>
> >>> int fn (int a, int b, int c) {
> >>>       int x = 0;
> >>>       if (a && b) {
> >>>           if (c) {
> >>>               goto a_;
> >>>           } else {
> >>>               x = a;
> >>>           }
> >>>       } else {
> >>> a_:
> >>>           x = (a - b);
> >>>       }
> >>>
> >>>       return x;
> >>> }
> >>>
> >>> Run through gcov --conditions I get:
> >>>
> >>>           4:    5:    if (a && b) {
> >>> condition outcomes covered 2/2
> >>> condition outcomes covered 2/2
> >>>           2:    6:        if (c) {
> >>> condition outcomes covered 2/2
> >>>
> >>> Which is clearly not correct. So I started digging into why and dump the
> >>> CFG as the coverage profiling sees it https://i.imgur.com/d0q72rA.png
> >>> [2]. I apologize for the labeling, but A2 = a, A3 = b, A5 = c and A9 the
> >>> else block. The problem, which is what confuses the algorithm, is that a
> >>> and b don't share A9 as a successor (on false) as I would expect.
> >>>
> >>> If I add some operation before the label the problem disappears and a
> >>> and b share false-destination again https://i.imgur.com/PSrfaLC.png [3].
> >>>
> >>>       } else {
> >>>           x++;
> >>> a_:
> >>>           x = (a - b);
> >>>       }
> >>>
> >>>           4:    5:    if (a && b) {
> >>> condition outcomes covered 4/4
> >>>           2:    6:        if (c) {
> >>> condition outcomes covered 2/2
> >>>
> >>>
> >>> When dumping the cfg in the former case with -fdump-tree-cfg-graph I get
> >>> a CFG without the split destinations in a and b
> >>> https://i.imgur.com/05MCjzp.png [3]. I would assume from this that the
> >>> graph dump happens after _more_ CFG transformations than the branch
> >>> profiling.
> >>>
> >>> So my questions are:
> >>>
> >>> 1. Is the control flow graph expected to be constructed as such where a
> >>> and b don't share outcome, or is it to be considered a bug?
> >>> 2. If yes, would it be problematic to push the branch coverage and
> >>> condition profiling to a later stage where the cfg has been fixed?
> >>
> >> I would say you should only see more nodes merged.  It's a bit hard to
> >> follow
> >> what you say with the namings - I usually run cc1 in gdb, breaking at
> >> execute_build_cfg where you can do, after build_gimple_cfg finished
> >> (and before cleanup_tree_cfg ()) do a 'dot-fn' in gdb which produces a
> >> nice
> >> picture of the CFG and code with graphviz.
> >>
> >> It looks like I would have expected, in particular we do not force a
> >> new basic-block to be generated for a_: after the D.1991: artificial
> >> label we have for the else.  That might be premature optimization
> >> for your case (but the cleanup_tree_cfg () would immediately do
> >> that as well).
> >>
> >> Richard.
> >
> > I did some more digging into this and have isolated the problem to edge
> > splitting inside the branch_prob () function itself.
> >
> > gcc/profile.cc:1248
> >
> >    if (last
> >        && gimple_has_location (last)
> >        && !RESERVED_LOCATION_P (e->goto_locus)
> >        && !single_succ_p (bb)
> >        && (LOCATION_FILE (e->goto_locus)
> >            != LOCATION_FILE (gimple_location (last))
> >            || (LOCATION_LINE (e->goto_locus)
> >                != LOCATION_LINE (gimple_location (last)))))
> >    {
> >        basic_block new_bb = split_edge (e);
> >        edge ne = single_succ_edge (new_bb);
> >        ne->goto_locus = e->goto_locus;
> >    }
> >
> > Based on the cleaned-up cfg that gcc dumps later it looks like this
> > split only lives through the branch coverage/profiling phase (it may
> > bleed slightly later but it shouldn't be of significance).
> >
> > Out of curiosity I removed the splitting altogether and no tests failed
> > when running make check-gcc check-g++ RUNTESTFLAGS="gcov.exp". Either it
> > was not covered by tests in the first place, or whatever behaviour this
> > check is meant to fix is resolved elsewhere. I have to admit I don't
> > really see a difference with/without this patch, but I don't know what
> > to look for.
> >
> > The check was first introduced in 2005 by Jan (cc):
> >
> > commit d783b2a2dc91e1d2c1fea78cac2b6c6c73b3680d
> > Author: Jan Hubicka <jh@suse.cz>
> > Date:   Thu Aug 4 00:10:54 2005 +0200
> >
> >      profile.c (branch_prob): Split edges with goto locus on them to get
> > proper line counts.
> >
> >
> >              * profile.c (branch_prob): Split edges with goto locus on them
> >              to get proper line counts.
> >              * tree-cfg.c (make_cond_expr_edges): Record user goto
> > locuses, if any.
> >
> >              * gcov-1.C: Fix switch counts.
> >              * gcov-4b.c: Likewise.
> >
> > What stands out to me in the check is that it uses location-file and
> > location-line to decide if to split the edge. I added a few prints to
> > see when the file/line is set:
> >
> >     2 int goto1 (int a) {
> >
> >
> >     3     if (a)
> >     4         goto end;
> >     5
> >     6     return 1;
> >     7 end:
> >     8     x += a;
> >     9     return 0;
> >    10 }
> >
> >      if (a_5(D) != 0)
> >
> >
> >          edge (true)
> >          last goto2.c:3
> >          goto (null):0
> >
> >      if (a_5(D) != 0)
> >          edge (false)
> >          last goto2.c:3
> >          goto (null):0
> >
> >      // predicted unlikely by goto predictor.
> >          edge (fallthru)
> >          last goto2.c:4
> >          goto goto2.c:4
> >
> > The goto statement is the only with with a location for both the basic
> > block and the edge.
> >
> >    12 int goto2 (int a) {
> >
> >
> >    13     if (a) { goto end; }
> >    14     else   { label: a++; }
> >    15
> >    16     return 1;
> >    17 end:
> >    18     x += a;
> >    19     return 0;
> >    20 }
> >
> >      if (a_5(D) != 0)
> >          edge (true)
> >          last goto2.c:13
> >          goto (null):0
> >
> >      if (a_5(D) != 0)
> >          edge (false)
> >          last goto2.c:13
> >          goto goto2.c:14
> >
> >      // predicted unlikely by goto predictor.
> >          edge (fallthru)
> >          last goto2.c:13
> >          goto goto2.c:13
> >
> > Now the else block has two locations as well, with the edge label
> > e->goto_locus being inside the else block. Note that this label is
> > _unrelated_ to the edge jump a (false) -> else.
> >
> > Now a function without gotos:
> >
> >    22 int goto3 (int a, int b) {
> >    23     if (a && b) {
> >    24         x += a * b;
> >    25     } else {
> >    26         x -= 1;
> >    27     }
> >    28     return 0;
> >    29 }
> >
> >      if (a_7(D) != 0)
> >          edge (true)
> >          last goto2.c:23
> >          goto (null):0
> >
> >      if (a_7(D) != 0)
> >          edge (false)
> >          last goto2.c:23
> >          goto (null):0
> >
> >      if (b_8(D) != 0)
> >          edge (true)
> >          last goto2.c:23
> >          goto (null):0
> >
> >      if (b_8(D) != 0)
> >          edge (false)
> >          last goto2.c:23
> >          goto (null):0
> >
> >      x = _3;
> >          edge (fallthru)
> >          last goto2.c:24
> >          goto goto2.c:24
> >
> >      x = _5;
> >          edge (fallthru)
> >          last goto2.c:26
> >          goto (null):0
> >
> > Now the checks if (a) and (b) don't have goto_locus on the edges. For
> > completeness I included the implied jumps in the then/else blocks which
> > _do_ have edge locus in the then case.
> >
> > Finally, the case that expose the problem for me:
> >
> >    31 int goto4 (int a, int b) {
> >    32     if (a) {
> >    33         if (b) goto elseblock;
> >    34         else a++;
> >    35     } else {
> >    36 elseblock:
> >    37         a--;
> >    38     }
> >    39
> >    40     return a;
> >    41 }
> >
> >      if (a_2(D) != 0)
> >          edge (true)
> >          last goto2.c:32
> >          goto (null):0
> >
> >      if (a_2(D) != 0)
> >          edge (false)
> >          last goto2.c:32
> >          goto goto2.c:36
> >
> >      if (b_3(D) != 0)
> >          edge (true)
> >          last goto2.c:33
> >          goto (null):0
> >
> >      if (b_3(D) != 0)
> >          edge (false)
> >          last goto2.c:33
> >          goto (null):0
> >
> > Again, a (false) has a goto_locus because there is an unrelated label at
> > the top of the else block. For completeness, it also applies happens
> > when there's a label on top of then:
> >
> >    43 int goto5 (int a, int b) {
> >    44     if (a) {
> >    45 then:
> >    46         a++;
> >    47     } else {
> >    48         a--;
> >
> >
> >    49     }
> >    50
> >    51     return a;
> >    52 }
> >
> >      if (a_2(D) != 0)
> >          edge (true)
> >          last goto2.c:44
> >          goto goto2.c:45
> >
> >      if (a_2(D) != 0)
> >          edge (false)
> >          last goto2.c:44
> >          goto (null):0
> >
> > This causes the edge to split which probably isn't a problem for the
> > branch coverage, but it is problematic for my condition coverage
> > algorithm. So how do we solve this?
> >
> > 1. Is the edge splitting necessary? I didn't find a test that covers
> > this (there might be one, any idea?). If the edge splitting is not
> > necessary anymore then removing it should be fine for my coverage needs.
> > 2. Assuming the edge split is necessary making a decision on source file
> > + line seems vulnerable to source formatting:
> >
> >    54 int goto6 (int a) {
> >    55     if (a) label: goto end;
> >
> >
> >    56     return 0;
> >    57 end:
> >    58     return 1;
> >    59
> >    60 }
> >
> >      if (a_2(D) != 0)
> >          edge (true)
> >          last goto2.c:55
> >          goto goto2.c:55
> >
> >      if (a_2(D) != 0)
> >          edge (false)
> >          last goto2.c:55
> >          goto (null):0
> >
> >      // predicted unlikely by goto predictor.
> >          edge (fallthru)
> >          last goto2.c:55
> >          goto goto2.c:55
> >
> > Now unrelated gotos all have the same file/line signature. Since the
> > edge to the goto_locus is unrelated to the label itself.
> > 3. Assuming the test is fine and necessary I *could* work around the
> > problem by recording the original edge somewhere (it is very important
> > that all conditions in an expression share the same then/else basic
> > blocks), or enough metadata to make a virtual edge, but that is a hack I
> > hope I don't have to do.
> >
> > Anyway, the problem is not with the cfg construction itself, but an edge
> > splitting that happens specifically for profiling.
> >
> > CC Martin, maybe you have any idea?
> >
> > Thanks,
> > Jørgen.
>
> So I finally found some time to tinker with this and made some good
> progress I think.
>
> Like I wrote before, the problem is the edge split. I tried removing it
> but could not produce any test failures (or so I thought), but it turns
> out not much is missing for it work. I started by running the test suite
> twice and compare the .gcov files before/after applying this patch:
>
>
> diff --git a/gcc/profile.cc b/gcc/profile.cc
> index a4751279571..ff9f6ef2df2 100644
> --- a/gcc/profile.cc
> +++ b/gcc/profile.cc
> @@ -1244,19 +1244,7 @@ branch_prob (bool thunk)
>                   Don't do that when the locuses match, so
>                   if (blah) goto something;
>                   is not computed twice.  */
> -             if (last
> -                 && gimple_has_location (last)
> -                 && !RESERVED_LOCATION_P (e->goto_locus)
> -                 && !single_succ_p (bb)
> -                 && (LOCATION_FILE (e->goto_locus)
> -                     != LOCATION_FILE (gimple_location (last))
> -                     || (LOCATION_LINE (e->goto_locus)
> -                         != LOCATION_LINE (gimple_location (last)))))
> -               {
> -                 basic_block new_bb = split_edge (e);
> -                 edge ne = single_succ_edge (new_bb);
> -                 ne->goto_locus = e->goto_locus;
> -               }
> +
>                if ((e->flags & (EDGE_ABNORMAL | EDGE_ABNORMAL_CALL))
>                     && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
>                  need_exit_edge = 1;
>
> --- a/gcc/testsuite/lib/gcov.exp
> +++ b/gcc/testsuite/lib/gcov.exp
> @@ -42,7 +42,7 @@ proc clean-gcov { testcase } {
>       clean-gcov-file $testcase "gcno"
>       clean-gcov-file $testcase "gcda"
>       clean-gcov-file $testcase "h.gcov"
> -    remote_file host delete "$testcase.gcov"
> +    #remote_file host delete "$testcase.gcov"
>   }
>
>
> Run the tests and diff (filtered and most relevant section included):
>
> $ make check-gcc RUNTESTFLAGS=gcov.exp
> $ diff -r sans-split-edge with-split-edge | grep -C 2 -E "^[<>]\s\s"
> diff -r sans-split-edge/gcc/gcov-4.c.gcov with-split-edge/gcc/gcov-4.c.gcov
>    228c228
>    <         -:  224:        break;
>    ---
>    >         1:  224:        break;
>    231c231
>    <         -:  227:        break;
>    ---
>    >     #####:  227:        break;
>    237c237
>    <         -:  233:        break;
>    ---
>    >         2:  233:        break;
>
> Ok, so there are some breaks that are counted with the edge and not
> without. I add the count (N) and lo and behold it fails predictably:
>
> FAIL: gcc.misc-tests/gcov-4.c line 224: is -:should be 1
> FAIL: gcc.misc-tests/gcov-4.c line 233: is -:should be 2
>
> These tests pass when the splitting is re-enabled.
>
> Ok, so the splitting is necessary (or at the very least a more
> substantial change), but check is based on locus. I wrote a few test
> programs and printed some diagnostics (file:line:col) when the edge splits.
>
> This program breaks condition coverage because of the edge split:
>
>       1  int fn (int a, int b, int c) {
>       2        int x = 0;
>       3        if (a && b) {
>       4                x = a;
>       5        } else {
>       6  a_:
>       7            x = (a - b);
>       8        }
>       9
>      10        return x;
>      11  }
>
> loc (src): t.c:  3:10 -> if (a_3(D) != 0)
> e->goto:   t.c:  6: 1 -> a_:
> loc (dst): t.c:  6: 1 -> a_:
> loc (src): t.c:  3:13 -> if (b_4(D) != 0)
> e->goto:   t.c:  6: 1 -> a_:
> loc (dst): t.c:  6: 1 -> a_:
>
>
> and this is extract from gcov-4b.c:
>
>     205  int
>     206  test_switch (int i, int j)
>     207  {
>     208    int result = 0;
>     209
>     210    switch (i)                            /* branch(80 25) */
>     211                                          /* branch(end) */
>     212      {
>     213        case 1:
>     214          result = do_something (2);
>     215          break;
>     216        case 2:
>     217          result = do_something (1024);
>     218          break;
>     219        case 3:
>     220        case 4:
>     221          if (j == 2)                     /* branch(67) */
>     222                                          /* branch(end) */
>     223            return do_something (4);
>     224          result = do_something (8);
>     225          break;
>     226        default:
>     227          result = do_something (32);
>     228          switch_m++;
>
> loc (src): gcov-4b.c:214:18 -> result_18 = do_something (2);
> e->goto:   gcov-4b.c:215: 9 -> _22 = result_3;
> loc (dst): gcov-4b.c:231:10 -> _22 = result_3;
> loc (src): gcov-4b.c:217:18 -> result_16 = do_something (1024);
> e->goto:   gcov-4b.c:218: 9 -> _22 = result_3;
> loc (dst): gcov-4b.c:231:10 -> _22 = result_3;
> loc (src): gcov-4b.c:224:18 -> result_12 = do_something (8);
> e->goto:   gcov-4b.c:225: 9 -> _22 = result_3;
> loc (dst): gcov-4b.c:231:10 -> _22 = result_3;
>
> The former is a "bad" split and the latter a good one. Notice that for
> the former the goto (edge) locus is identical to the first statement in
> the destination block, contrary to good split where the goto and dst
> locus are different.
>
> Which leads me to think the problem can be fixed by either extending the
> check to also consider the locus of the destination block, or maybe
> simpler by comparing the e->locus to first_stmt (e->dest)->locus.
>
> In other words, by applying something like this patch we can have
> condition coverage and still have the test suite pass for all other
> cases. I don't think this will be a regression.
>
> diff --git a/gcc/profile.cc b/gcc/profile.cc
> index a4751279571..c1b1028599a 100644
> --- a/gcc/profile.cc
> +++ b/gcc/profile.cc
> @@ -1239,19 +1240,28 @@ branch_prob (bool thunk)
>                      break;
>                  }
>
> +               gimple *fst = nullptr;
> +             for (gsi = gsi_start_nondebug_bb (e->dest);
> +                  !gsi_end_p (gsi);
> +                  gsi_next_nondebug (&gsi))
> +               {
> +                 fst = gsi_stmt (gsi);
> +                 if (!RESERVED_LOCATION_P (gimple_location (fst)))
> +                   break;
> +               }
>                /* Edge with goto locus might get wrong coverage info unless
>                   it is the only edge out of BB.
>                   Don't do that when the locuses match, so
>                   if (blah) goto something;
>                   is not computed twice.  */
> -             if (last
> -                 && gimple_has_location (last)
> +             if (fst
> +                 && gimple_has_location (fst)
>                    && !RESERVED_LOCATION_P (e->goto_locus)
>                    && !single_succ_p (bb)
>                    && (LOCATION_FILE (e->goto_locus)
> -                     != LOCATION_FILE (gimple_location (last))
> +                     != LOCATION_FILE (gimple_location (fst))
>                        || (LOCATION_LINE (e->goto_locus)
> -                         != LOCATION_LINE (gimple_location (last)))))
> +                         != LOCATION_LINE (gimple_location (fst)))))
>                  {
>                    basic_block new_bb = split_edge (e);
>                    edge ne = single_succ_edge (new_bb);
>
> If this sounds reasonably I will clean up my tree and submit the patches
> and tests. What do you think?

I think the above heuristics only work if they match up with the
heuristics where to insert the edge counters to?  Your patch above
also can cause extra edge splits in case the location of 'last' was
the same as the goto location but the location of 'fst' not.  So I can't
see how this is an unconditional win?

Richard.

>
> Thanks,
> Jørgen

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-10-06  7:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-02  9:49 Surprising CFG construction with goto from then to else Jørgen Kvalsvik
2022-09-02 12:22 ` Richard Biener
2022-09-08 10:30   ` Jørgen Kvalsvik
2022-09-16 11:05     ` Jørgen Kvalsvik
2022-10-03 10:59       ` Martin Liška
2022-10-03 12:22     ` Jørgen Kvalsvik
2022-10-06  7:55       ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).