Re: Surprising CFG construction with goto from then to else

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Jørgen Kvalsvik" <j@lambda.is>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Development <gcc@gcc.gnu.org>, jh@suse.cz, mliska@suse.cz
Subject: Re: Surprising CFG construction with goto from then to else
Date: Fri, 16 Sep 2022 13:05:22 +0200	[thread overview]
Message-ID: <6db0e9be-4143-abc4-d280-f550e8f68cac@lambda.is> (raw)
In-Reply-To: <70a6f006-7e51-f962-ab5b-a6d785b936ea@lambda.is>

Gentle ping. Any idea if the edge split is still useful and/or how to 
test for it?

Thanks,
Jørgen

On 08/09/2022 12:30, Jørgen Kvalsvik wrote:
> On 02/09/2022 14:22, Richard Biener wrote:
>> On Fri, Sep 2, 2022 at 11:50 AM Jørgen Kvalsvik <j@lambda.is> wrote:
>>>
>>>
>>> Hello,
>>>
>>> I played some more with odd programs and the effect on control flow
>>> graph construction (as a part of condition coverage support [1]) and
>>> came across this:
>>>
>>> int fn (int a, int b, int c) {
>>>       int x = 0;
>>>       if (a && b) {
>>>           if (c) {
>>>               goto a_;
>>>           } else {
>>>               x = a;
>>>           }
>>>       } else {
>>> a_:
>>>           x = (a - b);
>>>       }
>>>
>>>       return x;
>>> }
>>>
>>> Run through gcov --conditions I get:
>>>
>>>           4:    5:    if (a && b) {
>>> condition outcomes covered 2/2
>>> condition outcomes covered 2/2
>>>           2:    6:        if (c) {
>>> condition outcomes covered 2/2
>>>
>>> Which is clearly not correct. So I started digging into why and dump the
>>> CFG as the coverage profiling sees it https://i.imgur.com/d0q72rA.png
>>> [2]. I apologize for the labeling, but A2 = a, A3 = b, A5 = c and A9 the
>>> else block. The problem, which is what confuses the algorithm, is that a
>>> and b don't share A9 as a successor (on false) as I would expect.
>>>
>>> If I add some operation before the label the problem disappears and a
>>> and b share false-destination again https://i.imgur.com/PSrfaLC.png [3].
>>>
>>>       } else {
>>>           x++;
>>> a_:
>>>           x = (a - b);
>>>       }
>>>
>>>           4:    5:    if (a && b) {
>>> condition outcomes covered 4/4
>>>           2:    6:        if (c) {
>>> condition outcomes covered 2/2
>>>
>>>
>>> When dumping the cfg in the former case with -fdump-tree-cfg-graph I get
>>> a CFG without the split destinations in a and b
>>> https://i.imgur.com/05MCjzp.png [3]. I would assume from this that the
>>> graph dump happens after _more_ CFG transformations than the branch
>>> profiling.
>>>
>>> So my questions are:
>>>
>>> 1. Is the control flow graph expected to be constructed as such where a
>>> and b don't share outcome, or is it to be considered a bug?
>>> 2. If yes, would it be problematic to push the branch coverage and
>>> condition profiling to a later stage where the cfg has been fixed?
>>
>> I would say you should only see more nodes merged.  It's a bit hard to 
>> follow
>> what you say with the namings - I usually run cc1 in gdb, breaking at
>> execute_build_cfg where you can do, after build_gimple_cfg finished
>> (and before cleanup_tree_cfg ()) do a 'dot-fn' in gdb which produces a 
>> nice
>> picture of the CFG and code with graphviz.
>>
>> It looks like I would have expected, in particular we do not force a
>> new basic-block to be generated for a_: after the D.1991: artificial
>> label we have for the else.  That might be premature optimization
>> for your case (but the cleanup_tree_cfg () would immediately do
>> that as well).
>>
>> Richard.
> 
> I did some more digging into this and have isolated the problem to edge 
> splitting inside the branch_prob () function itself.
> 
> gcc/profile.cc:1248
> 
>    if (last
>        && gimple_has_location (last)
>        && !RESERVED_LOCATION_P (e->goto_locus)
>        && !single_succ_p (bb)
>        && (LOCATION_FILE (e->goto_locus)
>            != LOCATION_FILE (gimple_location (last))
>            || (LOCATION_LINE (e->goto_locus)
>                != LOCATION_LINE (gimple_location (last)))))
>    {
>        basic_block new_bb = split_edge (e);
>        edge ne = single_succ_edge (new_bb);
>        ne->goto_locus = e->goto_locus;
>    }
> 
> Based on the cleaned-up cfg that gcc dumps later it looks like this 
> split only lives through the branch coverage/profiling phase (it may 
> bleed slightly later but it shouldn't be of significance).
> 
> Out of curiosity I removed the splitting altogether and no tests failed 
> when running make check-gcc check-g++ RUNTESTFLAGS="gcov.exp". Either it 
> was not covered by tests in the first place, or whatever behaviour this 
> check is meant to fix is resolved elsewhere. I have to admit I don't 
> really see a difference with/without this patch, but I don't know what 
> to look for.
> 
> The check was first introduced in 2005 by Jan (cc):
> 
> commit d783b2a2dc91e1d2c1fea78cac2b6c6c73b3680d
> Author: Jan Hubicka <jh@suse.cz>
> Date:   Thu Aug 4 00:10:54 2005 +0200
> 
>      profile.c (branch_prob): Split edges with goto locus on them to get 
> proper line counts.
> 
> 
>              * profile.c (branch_prob): Split edges with goto locus on them
>              to get proper line counts.
>              * tree-cfg.c (make_cond_expr_edges): Record user goto 
> locuses, if any.
> 
>              * gcov-1.C: Fix switch counts.
>              * gcov-4b.c: Likewise.
> 
> What stands out to me in the check is that it uses location-file and 
> location-line to decide if to split the edge. I added a few prints to 
> see when the file/line is set:
> 
>     2 int goto1 (int a) {
> 
> 
>     3     if (a)
>     4         goto end;
>     5
>     6     return 1;
>     7 end:
>     8     x += a;
>     9     return 0;
>    10 }
> 
>      if (a_5(D) != 0)
> 
> 
>          edge (true)
>          last goto2.c:3
>          goto (null):0
> 
>      if (a_5(D) != 0)
>          edge (false)
>          last goto2.c:3
>          goto (null):0
> 
>      // predicted unlikely by goto predictor.
>          edge (fallthru)
>          last goto2.c:4
>          goto goto2.c:4
> 
> The goto statement is the only with with a location for both the basic 
> block and the edge.
> 
>    12 int goto2 (int a) {
> 
> 
>    13     if (a) { goto end; }
>    14     else   { label: a++; }
>    15
>    16     return 1;
>    17 end:
>    18     x += a;
>    19     return 0;
>    20 }
> 
>      if (a_5(D) != 0)
>          edge (true)
>          last goto2.c:13
>          goto (null):0
> 
>      if (a_5(D) != 0)
>          edge (false)
>          last goto2.c:13
>          goto goto2.c:14
> 
>      // predicted unlikely by goto predictor.
>          edge (fallthru)
>          last goto2.c:13
>          goto goto2.c:13
> 
> Now the else block has two locations as well, with the edge label 
> e->goto_locus being inside the else block. Note that this label is 
> _unrelated_ to the edge jump a (false) -> else.
> 
> Now a function without gotos:
> 
>    22 int goto3 (int a, int b) {
>    23     if (a && b) {
>    24         x += a * b;
>    25     } else {
>    26         x -= 1;
>    27     }
>    28     return 0;
>    29 }
> 
>      if (a_7(D) != 0)
>          edge (true)
>          last goto2.c:23
>          goto (null):0
> 
>      if (a_7(D) != 0)
>          edge (false)
>          last goto2.c:23
>          goto (null):0
> 
>      if (b_8(D) != 0)
>          edge (true)
>          last goto2.c:23
>          goto (null):0
> 
>      if (b_8(D) != 0)
>          edge (false)
>          last goto2.c:23
>          goto (null):0
> 
>      x = _3;
>          edge (fallthru)
>          last goto2.c:24
>          goto goto2.c:24
> 
>      x = _5;
>          edge (fallthru)
>          last goto2.c:26
>          goto (null):0
> 
> Now the checks if (a) and (b) don't have goto_locus on the edges. For 
> completeness I included the implied jumps in the then/else blocks which 
> _do_ have edge locus in the then case.
> 
> Finally, the case that expose the problem for me:
> 
>    31 int goto4 (int a, int b) {
>    32     if (a) {
>    33         if (b) goto elseblock;
>    34         else a++;
>    35     } else {
>    36 elseblock:
>    37         a--;
>    38     }
>    39
>    40     return a;
>    41 }
> 
>      if (a_2(D) != 0)
>          edge (true)
>          last goto2.c:32
>          goto (null):0
> 
>      if (a_2(D) != 0)
>          edge (false)
>          last goto2.c:32
>          goto goto2.c:36
> 
>      if (b_3(D) != 0)
>          edge (true)
>          last goto2.c:33
>          goto (null):0
> 
>      if (b_3(D) != 0)
>          edge (false)
>          last goto2.c:33
>          goto (null):0
> 
> Again, a (false) has a goto_locus because there is an unrelated label at 
> the top of the else block. For completeness, it also applies happens 
> when there's a label on top of then:
> 
>    43 int goto5 (int a, int b) {
>    44     if (a) {
>    45 then:
>    46         a++;
>    47     } else {
>    48         a--;
> 
> 
>    49     }
>    50
>    51     return a;
>    52 }
> 
>      if (a_2(D) != 0)
>          edge (true)
>          last goto2.c:44
>          goto goto2.c:45
> 
>      if (a_2(D) != 0)
>          edge (false)
>          last goto2.c:44
>          goto (null):0
> 
> This causes the edge to split which probably isn't a problem for the 
> branch coverage, but it is problematic for my condition coverage 
> algorithm. So how do we solve this?
> 
> 1. Is the edge splitting necessary? I didn't find a test that covers 
> this (there might be one, any idea?). If the edge splitting is not 
> necessary anymore then removing it should be fine for my coverage needs.
> 2. Assuming the edge split is necessary making a decision on source file 
> + line seems vulnerable to source formatting:
> 
>    54 int goto6 (int a) {
>    55     if (a) label: goto end;
> 
> 
>    56     return 0;
>    57 end:
>    58     return 1;
>    59
>    60 }
> 
>      if (a_2(D) != 0)
>          edge (true)
>          last goto2.c:55
>          goto goto2.c:55
> 
>      if (a_2(D) != 0)
>          edge (false)
>          last goto2.c:55
>          goto (null):0
> 
>      // predicted unlikely by goto predictor.
>          edge (fallthru)
>          last goto2.c:55
>          goto goto2.c:55
> 
> Now unrelated gotos all have the same file/line signature. Since the 
> edge to the goto_locus is unrelated to the label itself.
> 3. Assuming the test is fine and necessary I *could* work around the 
> problem by recording the original edge somewhere (it is very important 
> that all conditions in an expression share the same then/else basic 
> blocks), or enough metadata to make a virtual edge, but that is a hack I 
> hope I don't have to do.
> 
> Anyway, the problem is not with the cfg construction itself, but an edge 
> splitting that happens specifically for profiling.
> 
> CC Martin, maybe you have any idea?
> 
> Thanks,
> Jørgen.

next prev parent reply	other threads:[~2022-09-16 11:05 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-02  9:49 Jørgen Kvalsvik
2022-09-02 12:22 ` Richard Biener
2022-09-08 10:30   ` Jørgen Kvalsvik
2022-09-16 11:05     ` Jørgen Kvalsvik [this message]
2022-10-03 10:59       ` Martin Liška
2022-10-03 12:22     ` Jørgen Kvalsvik
2022-10-06  7:55       ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6db0e9be-4143-abc4-d280-f550e8f68cac@lambda.is \
    --to=j@lambda.is \
    --cc=gcc@gcc.gnu.org \
    --cc=jh@suse.cz \
    --cc=mliska@suse.cz \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).