From: Simon Richter
To: binutils@sourceware.org
Date: Tue, 03 Mar 2020 18:39:00 -0000
Subject: Token-level mapping of coverage information and generated code

Hi,

I'd like to get finer-than-line-level information for code coverage and for optimized-out code.
Consider:

    extern void foo(void);                          // 1
    int test()                                      // 2
    {                                               // 3
        int a = 0, b = 0, c = 1, d = 0;             // 4
        if( a == b && a == c && b == c) { d = a; }  // 5
        foo();                                      // 6
        return d;                                   // 7
    }                                               // 8

Compiling with gcc -c -O0 and mapping back to source, I get

    int test()
    {
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   48 83 ec 10             sub    $0x10,%rsp
        int a = 0, b = 0, c = 1, d = 0;
       8:   c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%rbp)
       f:   c7 45 f4 00 00 00 00    movl   $0x0,-0xc(%rbp)
      16:   c7 45 f0 01 00 00 00    movl   $0x1,-0x10(%rbp)
      1d:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
        if( a == b && a == c && b == c) { d = a; }
      24:   8b 45 f8                mov    -0x8(%rbp),%eax
      27:   3b 45 f4                cmp    -0xc(%rbp),%eax
      2a:   75 16                   jne    42
      2c:   8b 45 f8                mov    -0x8(%rbp),%eax
      2f:   3b 45 f0                cmp    -0x10(%rbp),%eax
      32:   75 0e                   jne    42
      34:   8b 45 f4                mov    -0xc(%rbp),%eax
      37:   3b 45 f0                cmp    -0x10(%rbp),%eax
      3a:   75 06                   jne    42
      3c:   8b 45 f8                mov    -0x8(%rbp),%eax
      3f:   89 45 fc                mov    %eax,-0x4(%rbp)
        foo();
      42:   e8 00 00 00 00          callq  47
                        43: R_X86_64_PLT32  foo-0x4
        return d;
      47:   8b 45 fc                mov    -0x4(%rbp),%eax
    }
      4a:   c9                      leaveq
      4b:   c3                      retq

The finest resolution I can get here is a single line: addr2line reports exactly the same instruction-to-source-line mapping.

Instrumenting for code coverage and running, I get

        1:    2:int test()
        -:    3:{
        1:    4:    int a = 0, b = 0, c = 1, d = 0;
       1*:    5:    if( a == b && a == c && b == c) { d = a; }
        1:    5-block  0
        1:    5-block  1
    %%%%%:    5-block  2
    %%%%%:    5-block  3
        1:    6:    foo();
        1:    6-block  0
        1:    7:    return d;
        -:    8:}

As expected, the condition is split into four basic blocks, corresponding to the three tests and the conditional body. Can I somehow map these basic blocks back to the tokens in the source file?
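A first step toward relating basic blocks to source is getting the per-block counts that gcov prints with -a (--all-blocks), as in the listing above, into a usable structure. The following is a minimal Python 3 sketch; block_counts and partially_covered are hypothetical helper names, and it assumes plain integer counts (not the human-readable form). It does not solve the token mapping itself: the text output carries no column information, so tying block 2 to "b == c" is still done by reading the code.

```python
import re

# gcov -a adds one line per basic block after each source line, e.g.
#         1:    5-block  0
#     %%%%%:    5-block  2
BLOCK_RE = re.compile(r"^\s*([^:\s]+)\s*:\s*(\d+)-block\s+(\d+)\s*$")

# Markers gcov prints instead of a number for code that never ran.
UNEXECUTED = {"#####", "=====", "%%%%%", "$$$$$"}


def block_counts(gcov_text):
    """Return {source line: {block index: execution count}}."""
    result = {}
    for raw in gcov_text.splitlines():
        m = BLOCK_RE.match(raw)
        if m is None:
            continue  # ordinary source line, not a per-block line
        count, line_no, block_no = m.groups()
        count = count.rstrip("*")
        value = 0 if count in UNEXECUTED else int(count)
        result.setdefault(int(line_no), {})[int(block_no)] = value
    return result


def partially_covered(blocks_by_line):
    """Source lines where some, but not all, basic blocks executed."""
    return sorted(line for line, blocks in blocks_by_line.items()
                  if any(blocks.values()) and not all(blocks.values()))


if __name__ == "__main__":
    sample = """\
        1:    5-block  0
        1:    5-block  1
    %%%%%:    5-block  2
    %%%%%:    5-block  3
        1:    6-block  0
"""
    blocks = block_counts(sample)
    print(blocks)
    print(partially_covered(blocks))
```

With the blocks in a dictionary, a line-level "this condition was only partly exercised" report is a one-line filter; getting from there to individual tokens still needs column data that this output does not provide.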
Similarly, if I compile with optimization enabled, mapping back to source code gives me

    int test()
    {
       0:   48 83 ec 08             sub    $0x8,%rsp
        int a = 0, b = 0, c = 1, d = 0;
        if( a == b && a == c && b == c) { d = a; }
        foo();
       4:   e8 00 00 00 00          callq  9
                        5: R_X86_64_PLT32  foo-0x4
        return d;
    }
       9:   31 c0                   xor    %eax,%eax
       b:   48 83 c4 08             add    $0x8,%rsp
       f:   c3                      retq

I can get somewhat better mapping information by asking addr2line which source lines actually contributed to the output:

    $ python3 -c 'for x in range(16): print(hex(x))' | \
        addr2line -e test.o | \
        cut -d: -f2 | \
        uniq
    3
    6
    8

This omits the initialization of d, but I guess that can't be helped, since it is propagated into the return statement as a constant; that is probably not much of a problem in the real world.

Again, I'd like a finer-grained mapping than lines here, so that I can highlight in the source code which code actually ended up in the final output.

As a nasty hack, I can run the source code through "tr ' ' '\n'" before compiling. That gives me rather good resolution for the coverage test, but the mapping to subexpressions is somewhat arbitrary, because the counters are associated with control flow inside the expression:

        1:   28:if(
        -:   29:a
        -:   30:==
        -:   31:b
        1:   32:&&
        -:   33:a
        -:   34:==
        -:   35:c
    #####:   36:&&
        -:   37:b
        -:   38:==
        -:   39:c)
        -:   40:{
        -:   41:d
    #####:   42:=
        -:   43:a;
        -:   44:}

Is there some way I could accurately extract information from a run that lets me highlight which subexpressions have been evaluated? From the run above, I could possibly derive

    if( a == b && a == c && b == c) { d = a; }
    ~~~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~~~~ ~~~~~~
        1          1           -          -

which isn't bad, but could probably be improved.

The end goal is to build reports such as "this condition has not been touched by a testcase" and "this code is unused and the compiler can prove it".

Simon
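The addr2line step above can also be driven from Python 3, which replaces the cut/uniq post-processing. This is a sketch under assumptions: contributing_lines and lines_for_range are hypothetical helper names, test.o and the 0 to 16 address range come from the example, and it relies on addr2line printing "file:line", possibly followed by " (discriminator N)", with "??:0" or "??:?" for unknown addresses. Like uniq, it only merges adjacent repeats. It needs Python 3.7 or later for capture_output.

```python
import re
import subprocess

# addr2line prints "file:line", optionally followed by " (discriminator N)";
# unknown locations come out as "??:0" or "??:?".
LOC_RE = re.compile(r":(\d+)(?: \(discriminator \d+\))?$")


def contributing_lines(addr2line_output):
    """Source lines in address order, like `cut -d: -f2 | uniq`.

    Only adjacent repeats are merged (as with uniq).  Unlike the shell
    pipeline, discriminator suffixes are ignored and line 0 is skipped.
    """
    result = []
    for entry in addr2line_output.splitlines():
        m = LOC_RE.search(entry.strip())
        if m is None:
            continue  # "??:?" or anything unparseable
        line = int(m.group(1))
        if line == 0:
            continue  # "??:0": no line information for this address
        if not result or result[-1] != line:
            result.append(line)
    return result


def lines_for_range(obj_file, start, end):
    """Ask addr2line about every byte address in [start, end)."""
    addresses = [hex(a) for a in range(start, end)]
    proc = subprocess.run(["addr2line", "-e", obj_file, *addresses],
                          capture_output=True, text=True, check=True)
    return contributing_lines(proc.stdout)


if __name__ == "__main__":
    sample = "test.c:3\ntest.c:3\ntest.c:6\ntest.c:6 (discriminator 1)\ntest.c:8\n"
    print(contributing_lines(sample))
```

For the optimized object above, lines_for_range("test.o", 0, 16) should correspond to the 3, 6, 8 result of the shell pipeline. Anchoring the regular expression at the end of each entry also keeps file names that contain ':' from being misparsed, which cut -d: -f2 would not handle.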
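The hand-drawn underline for the if statement can be produced mechanically from the tr-split gcov output, using the rule it appears to follow: a token that carries a counter starts a new segment, and tokens marked "-" extend the segment before them. A minimal Python 3 sketch, assuming the statement's tokens were separated by single spaces (so tr produced exactly one output line per token); annotate and parse_gcov_counts are hypothetical names. It inherits the hack's arbitrariness: a counter marks where gcov attributed control flow, not which subexpression it belongs to.

```python
def parse_gcov_counts(gcov_lines):
    """Extract the count column from gcov text lines ("count:lineno:source")."""
    counts = []
    for raw in gcov_lines:
        count, _lineno, _source = raw.split(":", 2)
        counts.append(count.strip())
    return counts


def annotate(source_line, counts):
    """Underline the runs of tokens that share one gcov counter.

    source_line: the statement as it was before tr split it into tokens.
    counts: one gcov count string per token ("1", "-", "#####", ...).
    Returns (underline, labels), both aligned with source_line.
    """
    tokens = source_line.split(" ")
    if len(tokens) != len(counts):
        raise ValueError("expected exactly one count per token")

    segments = []  # [first column, end column, label]
    col = 0
    for tok, count in zip(tokens, counts):
        end = col + len(tok)
        if count != "-":
            # A real counter opens a segment; never-executed markers are
            # shown as "-", matching the sketch in the mail.
            unexecuted = count in ("#####", "=====", "%%%%%", "$$$$$")
            label = "-" if unexecuted else count.rstrip("*")
            segments.append([col, end, label])
        elif segments:
            segments[-1][1] = end  # "-" token joins the previous segment
        col = end + 1  # skip the single space that tr turned into a newline

    under = [" "] * len(source_line)
    labels = [" "] * len(source_line)
    for start, end, label in segments:
        under[start:end] = "~" * (end - start)
        # Centre the label under its segment, clamped to the line width.
        mid = (start + end - len(label)) // 2
        mid = max(0, min(mid, len(labels) - len(label)))
        labels[mid:mid + len(label)] = label
    return "".join(under).rstrip(), "".join(labels).rstrip()


if __name__ == "__main__":
    statement = "if( a == b && a == c && b == c) { d = a; }"
    gcov_lines = [
        "        1:   28:if(", "        -:   29:a", "        -:   30:==",
        "        -:   31:b", "        1:   32:&&", "        -:   33:a",
        "        -:   34:==", "        -:   35:c", "    #####:   36:&&",
        "        -:   37:b", "        -:   38:==", "        -:   39:c)",
        "        -:   40:{", "        -:   41:d", "    #####:   42:=",
        "        -:   43:a;", "        -:   44:}",
    ]
    under, labels = annotate(statement, parse_gcov_counts(gcov_lines))
    print(statement)
    print(under)
    print(labels)
```

Run on the listing from the mail, this reproduces the four segments ("if( a == b", "&& a == c", "&& b == c) { d", "= a; }") with counts 1, 1, -, -. Segment boundaries come purely from where gcov placed counters, which is exactly the arbitrariness described above.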