From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 26088 invoked by alias); 10 Dec 2010 11:15:24 -0000 Received: (qmail 24999 invoked by uid 22791); 10 Dec 2010 11:13:57 -0000 X-SWARE-Spam-Status: No, hits=-1.6 required=5.0 tests=AWL,BAYES_00 X-Spam-Check-By: sourceware.org Received: from seketeli.net (HELO ms.seketeli.net) (91.121.166.71) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 10 Dec 2010 11:12:09 +0000 Received: from adjoa.torimasen.com (torimasen.com [82.237.12.13]) by ms.seketeli.net (Postfix) with ESMTP id 8D2C11608038; Fri, 10 Dec 2010 12:11:39 +0100 (CET) Received: by adjoa.torimasen.com (Postfix, from userid 500) id 978198E6047; Fri, 10 Dec 2010 12:11:38 +0100 (CET) From: Dodji Seketeli To: gcc-patches@gcc.gnu.org Cc: tromey@redhat.com, joseph@codesourcery.com, gdr@integrable-solutions.net, lopezibanez@gmail.com Subject: [PATCH 0/6] Tracking locations of tokens resulting from macro expansion Date: Fri, 10 Dec 2010 11:27:00 -0000 Message-Id: <1291979498-1604-1-git-send-email-dodji@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2010-12/txt/msg00861.txt.bz2 * Problem statement Let's consider the diagnostic GCC emits when an error occurs on an expression that was defined in a macro that is later expanded: [dodji@adjoa gcc]$ cat -n test.c 1 #define OPERATE(OPRD1, OPRT, OPRD2) \ 2 OPRD1 OPRT OPRD2; 3 4 #define SHIFTL(A,B) \ 5 OPERATE (A,<<,B) 6 7 #define MULT(A) \ 8 SHIFTL (A,1) 9 10 void 11 g () 12 { 13 MULT (1.0);/* 1.0 << 1; <-- so this is an error. */ 14 } [dodji@adjoa gcc]$ ./cc1 -quiet test.c test.c: In function ‘g’: test.c:13:3: error: invalid operands to binary << (have ‘double’ and ‘int’) Just looking at line 13 as the diagnostic suggests is not enough to understand why there is an error there. On line 13, there is no "<<" token, but the compiler complains about an opereator "<<" that is taking the wrong types of arguments on /that/ line. Why? Obviously what happened is that the MULT macro defined at line 7 and 8 got expanded. The macro SHIFTL used by MULT and defined at lines 1 and 2 got expanded as well. That macro expansion fest yielded the expression: 1.0 << 1; which is an error as you can't shift a float. The problem is tokens '1.0', '<<' and '1' all have their location set to {line 11, column 3} ({11, 3} for short). That actually is the location of the expansion point of the MULT macro. That's is an interesting observation -- in the current incarnation of GCC the location of each tokens resulting from a macro expansion is set to the location of the expansion point of the macro. * Path for improvement For each token resulting from macro expansion it would be nice if GCC could track two other kinds of locations for tokens resulting from macro expansions: - Spelling location. The location of the point where the token appears in the definition of the macro. The spelling location of "<<" would be {5, 14}. - Macro argument replacement point location. This one only exists for a token that is and arguments of a function-like macro. It's the location of the point in the definition of the macro where the argument replaces the parameter. In other words it's the location of the point where the parameter of a given argument is used in the definition of a function-like macro. For the "<<" token, that would be the location {2,9} of the 'OPRT' token in the definition of the 'OPERATE' macro. By using the three locations exposed above (spelling, argument replacement point and expansion point locations), GCC could emit diagnostics that are hopefully more useful. E.g: [dodji@adjoa gcc]$ ./cc1 -quiet -ftrack-macro-expansion test.c test.c: In function ‘g’: test.c:5:14: error: invalid operands to binary << (have ‘double’ and ‘int’) In macro 'OPERATE' at test.c:2:9 Expanded at test.c:5:3 In macro 'SHIFTL' at test.c:5:14 Expanded at test.c:8:3 In macro 'MULT' at test.c:8:3 Expanded at test.c:13:3 Several obversations can be made in that hypothetical example. - The location mentioned in the error message test.c:5:14: error: invalid operands to binary << (have ‘double’ and ‘int’) is the spelling location of the token '<<'. I believe this makes more sense than the original behaviour. - The macro expansion stack following the error message unwinds from the macro which expansion most directly yielded token "<<". Each element of the stack mentions the argument replacement point and the expansion point locations that were exposed earlier. * What it would take Luckily Tom Tromey attached a patch to this bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=7263 That patch paved the way showing how to build on the locations maps provided by libcpp to come up with what I would call virtual locations. A virtual location is a location that can resolve to many possible physical locuses. For a token that does not result from macro a expansion, a virtual location resolves only to the spelling location of the token. For a token resulting from macro expansion, a virtual location can resolve to either: [I would bet you heard about these different kinds of locations already but here we go] - the expansion point of the macro - the point where the token appears in the definition of the macro - the argument replacement point, in the context of a function-like macro. So I took that patch and massaged it a little bit. The user facing result is actually what I was showing in the [finally not so] hypothetical improved GCC example earlier. With the resulting patch-set the linemap module now provides and API to generate and resolve virtual locations for tokens. When a diagnostic function emit a message about a virtual location, it now can detect [using the new linemap API] if that location belongs to a token resulting from macro expansion. If yes, it can print the macro expansion stack that led to that token. So almost nothing changes for the diagnostics clients. * Limits I guess a pure marketing dude would elide this part :-) The most annoying limit is obviously memory consumption. Tracking the location of every token across every macro expansion does consume memory. I tried compiling some a translation here that heavily uses the C++ standard library and I could see some 13% increase of memory consumption over the entire compilation; note that the compilation was consuming around 250MB without the feature. This is why the feature is triggered by the -ftrack-macro-expansion flag. Without that flag, the memory consumption stays roughly flat. The noticeable downside of that flag is that it makes the code more complex. I think there possibly is room to decrease this, but fundamentaly the consumption is going to increase with the number of macro expanded multiplied by the number of tokens per expansion. The patch-set I have today doesn't track the locations of tokens resulting from pasting and stringification. For those, only the spelling token is tracked for now. I think this can be improved. I just haven't thought about it deeply yet. * Perspectives The patch-set is basically the point A.0) presented in the document http://gcc.gnu.org/wiki/Better_Diagnostics. This is material for GCC 4.7 at best but I felt it would be interesting to post this now. So the patches will follow shortly. For now please find below the summary of the changes. Linemap infrastructure for virtual locations Generate virtual locations for tokens Emit macro expansion related diagnostics Support -fdebug-cpp option Add line map statistics to -fmem-report output Kill pedantic warnings on system headers macros gcc/Makefile.in | 2 +- gcc/ada/gcc-interface/trans.c | 10 +- gcc/c-decl.c | 17 +- gcc/c-family/c-lex.c | 10 +- gcc/c-family/c-opts.c | 17 + gcc/c-family/c-pch.c | 2 +- gcc/c-family/c-ppoutput.c | 92 ++- gcc/c-family/c.opt | 12 + gcc/c-parser.c | 12 +- gcc/c-tree.h | 2 +- gcc/cp/error.c | 2 +- gcc/diagnostic.c | 160 +++- gcc/diagnostic.h | 2 +- gcc/doc/cppopts.texi | 29 + gcc/doc/invoke.texi | 6 +- gcc/fortran/cpp.c | 22 +- gcc/input.c | 107 ++- gcc/input.h | 22 +- gcc/java/jcf-parse.c | 2 +- gcc/testsuite/g++.dg/cpp0x/initlist15.C | 1 + gcc/testsuite/g++.old-deja/g++.robertl/eb43.C | 4 + gcc/testsuite/g++.old-deja/g++.robertl/eb79.C | 4 + gcc/testsuite/gcc.dg/cpp/macro-exp-tracking-1.c | 30 + gcc/testsuite/gcc.dg/cpp/macro-exp-tracking-2.c | 31 + gcc/testsuite/gcc.dg/cpp/macro-exp-tracking-3.c | 18 + gcc/testsuite/gcc.dg/cpp/macro-exp-tracking-4.c | 19 + gcc/testsuite/gcc.dg/cpp/syshdr3.c | 16 + gcc/testsuite/gcc.dg/cpp/syshdr3.h | 7 + gcc/testsuite/gcc.dg/nofixed-point-2.c | 6 +- gcc/testsuite/gcc.target/i386/sse-vect-types.c | 6 + gcc/toplev.c | 1 + gcc/tree-diagnostic.c | 2 +- libcpp/directives-only.c | 7 +- libcpp/directives.c | 19 +- libcpp/errors.c | 21 +- libcpp/expr.c | 176 ++-- libcpp/files.c | 24 +- libcpp/include/cpp-id-data.h | 6 + libcpp/include/cpplib.h | 15 +- libcpp/include/line-map.h | 634 +++++++++++-- libcpp/init.c | 3 +- libcpp/internal.h | 49 +- libcpp/lex.c | 112 ++- libcpp/line-map.c | 997 +++++++++++++++++-- libcpp/macro.c | 1188 ++++++++++++++++++++--- libcpp/traditional.c | 7 +- 46 files changed, 3376 insertions(+), 555 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/cpp/macro-exp-tracking-1.c create mode 100644 gcc/testsuite/gcc.dg/cpp/macro-exp-tracking-2.c create mode 100644 gcc/testsuite/gcc.dg/cpp/macro-exp-tracking-3.c create mode 100644 gcc/testsuite/gcc.dg/cpp/macro-exp-tracking-4.c create mode 100644 gcc/testsuite/gcc.dg/cpp/syshdr3.c create mode 100644 gcc/testsuite/gcc.dg/cpp/syshdr3.h -- Dodji