From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx.kolabnow.com (mx.kolabnow.com [212.103.80.154]) by sourceware.org (Postfix) with ESMTPS id EA731381D4FC for ; Fri, 16 Sep 2022 11:05:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EA731381D4FC Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=lambda.is Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=lambda.is Received: from localhost (unknown [127.0.0.1]) by mx.kolabnow.com (Postfix) with ESMTP id 776DC47075; Fri, 16 Sep 2022 13:05:26 +0200 (CEST) Authentication-Results: ext-mx-out003.mykolab.com (amavisd-new); dkim=pass (4096-bit key) reason="pass (just generated, assumed good)" header.d=kolabnow.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kolabnow.com; h= content-transfer-encoding:content-type:content-type:in-reply-to :references:from:from:content-language:subject:subject :mime-version:date:date:message-id:received:received:received; s=dkim20160331; t=1663326325; x=1665140726; bh=oRLxe0PgrW80xPay 8aCtLdCcUaewGxPyVbGCcYHDwts=; b=aEnK8h1sbeF3Jeqk5gTVWYqjeMYxgYJG im2O5TVX3J6L+ra1DwDUNBgBCk1LAkfvDPXhjBUHVUc6FM4yyqvnqu45SMVhTlQC 3rAzfvDzgJuqprHRf1tHkmndCvJdmyQZ+fX/slkon2HJIUFbiKGtbl8kMKLL4cI9 vGEuxs+CGSs3VcVGWyWRHP8l0DkRPzYILtzicOz+aQZ5liBiFzyqnLPgvYoyXBQk unLZPWK4QHuB1HT5YLSY6Q1HUZzpz5h1I4oJ+Uvqiui/yDxpILcLsYkF0uFdVM1v flQhraSpH+4NwngF3qSOz8EQmjNPmve5KpxQcmsxLlGYjxwEvDH/TVtolDGVs4Gc z6Xlpojg9CvC7q9nDApqNfCsoqAoCY3+3mlWfXg4dpOWE/I8k8ZNxqlf/Dx3Gkab LL0TTmXok/XA7WkIs73jPx3aynmdsCwOrYQR/wP03nDM77u6BUYSbR1uPRuLtL20 Jq8MeKHrMpU6xqYFqzyehHm8wjV+w+WIuuVN+vBn4uCWu5gPc1y8Fsf5cXCp6t8V axcx5xl4Ko2iARvxDvCOp2zZR9i6V9tCAX13gx2RzersvpBCUmARUkSoFMX1+Xae YiAEa5d657vzunltl8Sva6zyMnBeDNV8O/6fwp5ze2uK4RA9c7vTgGEgiput7EWT 9X3FeRDPnns= X-Virus-Scanned: amavisd-new at mykolab.com X-Spam-Score: -1.899 X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,SPF_HELO_NONE,SPF_PASS,TXREP,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 Received: from mx.kolabnow.com ([127.0.0.1]) by localhost (ext-mx-out003.mykolab.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id HmdvsvbuU6j9; Fri, 16 Sep 2022 13:05:25 +0200 (CEST) Received: from int-mx003.mykolab.com (unknown [10.9.13.3]) by mx.kolabnow.com (Postfix) with ESMTPS id 7A6AF4706A; Fri, 16 Sep 2022 13:05:24 +0200 (CEST) Received: from ext-subm003.mykolab.com (unknown [10.9.6.3]) by int-mx003.mykolab.com (Postfix) with ESMTPS id 262D3365A; Fri, 16 Sep 2022 13:05:24 +0200 (CEST) Message-ID: <6db0e9be-4143-abc4-d280-f550e8f68cac@lambda.is> Date: Fri, 16 Sep 2022 13:05:22 +0200 MIME-Version: 1.0 Subject: Re: Surprising CFG construction with goto from then to else Content-Language: en-US From: =?UTF-8?Q?J=c3=b8rgen_Kvalsvik?= To: Richard Biener Cc: GCC Development , jh@suse.cz, mliska@suse.cz References: <70a6f006-7e51-f962-ab5b-a6d785b936ea@lambda.is> In-Reply-To: <70a6f006-7e51-f962-ab5b-a6d785b936ea@lambda.is> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Gentle ping. Any idea if the edge split is still useful and/or how to test for it? Thanks, Jørgen On 08/09/2022 12:30, Jørgen Kvalsvik wrote: > On 02/09/2022 14:22, Richard Biener wrote: >> On Fri, Sep 2, 2022 at 11:50 AM Jørgen Kvalsvik wrote: >>> >>> >>> Hello, >>> >>> I played some more with odd programs and the effect on control flow >>> graph construction (as a part of condition coverage support [1]) and >>> came across this: >>> >>> int fn (int a, int b, int c) { >>>       int x = 0; >>>       if (a && b) { >>>           if (c) { >>>               goto a_; >>>           } else { >>>               x = a; >>>           } >>>       } else { >>> a_: >>>           x = (a - b); >>>       } >>> >>>       return x; >>> } >>> >>> Run through gcov --conditions I get: >>> >>>           4:    5:    if (a && b) { >>> condition outcomes covered 2/2 >>> condition outcomes covered 2/2 >>>           2:    6:        if (c) { >>> condition outcomes covered 2/2 >>> >>> Which is clearly not correct. So I started digging into why and dump the >>> CFG as the coverage profiling sees it https://i.imgur.com/d0q72rA.png >>> [2]. I apologize for the labeling, but A2 = a, A3 = b, A5 = c and A9 the >>> else block. The problem, which is what confuses the algorithm, is that a >>> and b don't share A9 as a successor (on false) as I would expect. >>> >>> If I add some operation before the label the problem disappears and a >>> and b share false-destination again https://i.imgur.com/PSrfaLC.png [3]. >>> >>>       } else { >>>           x++; >>> a_: >>>           x = (a - b); >>>       } >>> >>>           4:    5:    if (a && b) { >>> condition outcomes covered 4/4 >>>           2:    6:        if (c) { >>> condition outcomes covered 2/2 >>> >>> >>> When dumping the cfg in the former case with -fdump-tree-cfg-graph I get >>> a CFG without the split destinations in a and b >>> https://i.imgur.com/05MCjzp.png [3]. I would assume from this that the >>> graph dump happens after _more_ CFG transformations than the branch >>> profiling. >>> >>> So my questions are: >>> >>> 1. Is the control flow graph expected to be constructed as such where a >>> and b don't share outcome, or is it to be considered a bug? >>> 2. If yes, would it be problematic to push the branch coverage and >>> condition profiling to a later stage where the cfg has been fixed? >> >> I would say you should only see more nodes merged.  It's a bit hard to >> follow >> what you say with the namings - I usually run cc1 in gdb, breaking at >> execute_build_cfg where you can do, after build_gimple_cfg finished >> (and before cleanup_tree_cfg ()) do a 'dot-fn' in gdb which produces a >> nice >> picture of the CFG and code with graphviz. >> >> It looks like I would have expected, in particular we do not force a >> new basic-block to be generated for a_: after the D.1991: artificial >> label we have for the else.  That might be premature optimization >> for your case (but the cleanup_tree_cfg () would immediately do >> that as well). >> >> Richard. > > I did some more digging into this and have isolated the problem to edge > splitting inside the branch_prob () function itself. > > gcc/profile.cc:1248 > >   if (last >       && gimple_has_location (last) >       && !RESERVED_LOCATION_P (e->goto_locus) >       && !single_succ_p (bb) >       && (LOCATION_FILE (e->goto_locus) >           != LOCATION_FILE (gimple_location (last)) >           || (LOCATION_LINE (e->goto_locus) >               != LOCATION_LINE (gimple_location (last))))) >   { >       basic_block new_bb = split_edge (e); >       edge ne = single_succ_edge (new_bb); >       ne->goto_locus = e->goto_locus; >   } > > Based on the cleaned-up cfg that gcc dumps later it looks like this > split only lives through the branch coverage/profiling phase (it may > bleed slightly later but it shouldn't be of significance). > > Out of curiosity I removed the splitting altogether and no tests failed > when running make check-gcc check-g++ RUNTESTFLAGS="gcov.exp". Either it > was not covered by tests in the first place, or whatever behaviour this > check is meant to fix is resolved elsewhere. I have to admit I don't > really see a difference with/without this patch, but I don't know what > to look for. > > The check was first introduced in 2005 by Jan (cc): > > commit d783b2a2dc91e1d2c1fea78cac2b6c6c73b3680d > Author: Jan Hubicka > Date:   Thu Aug 4 00:10:54 2005 +0200 > >     profile.c (branch_prob): Split edges with goto locus on them to get > proper line counts. > > >             * profile.c (branch_prob): Split edges with goto locus on them >             to get proper line counts. >             * tree-cfg.c (make_cond_expr_edges): Record user goto > locuses, if any. > >             * gcov-1.C: Fix switch counts. >             * gcov-4b.c: Likewise. > > What stands out to me in the check is that it uses location-file and > location-line to decide if to split the edge. I added a few prints to > see when the file/line is set: > >    2 int goto1 (int a) { > > >    3     if (a) >    4         goto end; >    5 >    6     return 1; >    7 end: >    8     x += a; >    9     return 0; >   10 } > >     if (a_5(D) != 0) > > >         edge (true) >         last goto2.c:3 >         goto (null):0 > >     if (a_5(D) != 0) >         edge (false) >         last goto2.c:3 >         goto (null):0 > >     // predicted unlikely by goto predictor. >         edge (fallthru) >         last goto2.c:4 >         goto goto2.c:4 > > The goto statement is the only with with a location for both the basic > block and the edge. > >   12 int goto2 (int a) { > > >   13     if (a) { goto end; } >   14     else   { label: a++; } >   15 >   16     return 1; >   17 end: >   18     x += a; >   19     return 0; >   20 } > >     if (a_5(D) != 0) >         edge (true) >         last goto2.c:13 >         goto (null):0 > >     if (a_5(D) != 0) >         edge (false) >         last goto2.c:13 >         goto goto2.c:14 > >     // predicted unlikely by goto predictor. >         edge (fallthru) >         last goto2.c:13 >         goto goto2.c:13 > > Now the else block has two locations as well, with the edge label > e->goto_locus being inside the else block. Note that this label is > _unrelated_ to the edge jump a (false) -> else. > > Now a function without gotos: > >   22 int goto3 (int a, int b) { >   23     if (a && b) { >   24         x += a * b; >   25     } else { >   26         x -= 1; >   27     } >   28     return 0; >   29 } > >     if (a_7(D) != 0) >         edge (true) >         last goto2.c:23 >         goto (null):0 > >     if (a_7(D) != 0) >         edge (false) >         last goto2.c:23 >         goto (null):0 > >     if (b_8(D) != 0) >         edge (true) >         last goto2.c:23 >         goto (null):0 > >     if (b_8(D) != 0) >         edge (false) >         last goto2.c:23 >         goto (null):0 > >     x = _3; >         edge (fallthru) >         last goto2.c:24 >         goto goto2.c:24 > >     x = _5; >         edge (fallthru) >         last goto2.c:26 >         goto (null):0 > > Now the checks if (a) and (b) don't have goto_locus on the edges. For > completeness I included the implied jumps in the then/else blocks which > _do_ have edge locus in the then case. > > Finally, the case that expose the problem for me: > >   31 int goto4 (int a, int b) { >   32     if (a) { >   33         if (b) goto elseblock; >   34         else a++; >   35     } else { >   36 elseblock: >   37         a--; >   38     } >   39 >   40     return a; >   41 } > >     if (a_2(D) != 0) >         edge (true) >         last goto2.c:32 >         goto (null):0 > >     if (a_2(D) != 0) >         edge (false) >         last goto2.c:32 >         goto goto2.c:36 > >     if (b_3(D) != 0) >         edge (true) >         last goto2.c:33 >         goto (null):0 > >     if (b_3(D) != 0) >         edge (false) >         last goto2.c:33 >         goto (null):0 > > Again, a (false) has a goto_locus because there is an unrelated label at > the top of the else block. For completeness, it also applies happens > when there's a label on top of then: > >   43 int goto5 (int a, int b) { >   44     if (a) { >   45 then: >   46         a++; >   47     } else { >   48         a--; > > >   49     } >   50 >   51     return a; >   52 } > >     if (a_2(D) != 0) >         edge (true) >         last goto2.c:44 >         goto goto2.c:45 > >     if (a_2(D) != 0) >         edge (false) >         last goto2.c:44 >         goto (null):0 > > This causes the edge to split which probably isn't a problem for the > branch coverage, but it is problematic for my condition coverage > algorithm. So how do we solve this? > > 1. Is the edge splitting necessary? I didn't find a test that covers > this (there might be one, any idea?). If the edge splitting is not > necessary anymore then removing it should be fine for my coverage needs. > 2. Assuming the edge split is necessary making a decision on source file > + line seems vulnerable to source formatting: > >   54 int goto6 (int a) { >   55     if (a) label: goto end; > > >   56     return 0; >   57 end: >   58     return 1; >   59 >   60 } > >     if (a_2(D) != 0) >         edge (true) >         last goto2.c:55 >         goto goto2.c:55 > >     if (a_2(D) != 0) >         edge (false) >         last goto2.c:55 >         goto (null):0 > >     // predicted unlikely by goto predictor. >         edge (fallthru) >         last goto2.c:55 >         goto goto2.c:55 > > Now unrelated gotos all have the same file/line signature. Since the > edge to the goto_locus is unrelated to the label itself. > 3. Assuming the test is fine and necessary I *could* work around the > problem by recording the original edge somewhere (it is very important > that all conditions in an expression share the same then/else basic > blocks), or enough metadata to make a virtual edge, but that is a hack I > hope I don't have to do. > > Anyway, the problem is not with the cfg construction itself, but an edge > splitting that happens specifically for profiling. > > CC Martin, maybe you have any idea? > > Thanks, > Jørgen.