From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 24790 invoked by alias); 1 Nov 2013 15:16:33 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 24777 invoked by uid 89); 1 Nov 2013 15:16:33 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.7 required=5.0 tests=AWL,BAYES_50,KAM_STOCKGEN,RCVD_IN_DNSWL_LOW,RP_MATCHES_RCVD,SPF_PASS autolearn=no version=3.3.2 X-HELO: mail-ie0-f178.google.com Received: from mail-ie0-f178.google.com (HELO mail-ie0-f178.google.com) (209.85.223.178) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Fri, 01 Nov 2013 15:16:31 +0000 Received: by mail-ie0-f178.google.com with SMTP id x13so7631319ief.37 for ; Fri, 01 Nov 2013 08:16:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=YhS+MuV0n4jLe+0XykgDWxjxN8oHkSB/TQr/sG5hOEY=; b=Rh1+0MDYlfvTGGvGQiXqWTdNwLUeFmCIHI6ZO26e/ApbdQC+fuAqTUwSVbYUG+S4jD Y2qMEcsZEyO2N+eGgiG/ndxr10KEiXiJ89H8JxBogLfNKL0bmyUjjUK7SMvASx0El0Kf EWRP0Z9t7G+gbUIq3tR20YpfMLeF9Y/RTWYa3ANMlMo8b4p/yOFcjClwT2pWX5QJArQS XVRs8Ck3iFHVX47idGwJpky1izyEUdNCpnhl9vnR4SOafIAqjjpudkzXROaJTRGAlTc2 gCR/NwNuWAlYgiw4VccS9w5noixY6xV+NbdbsnwmNizDxpVQxY4iF+cV6ocEMUluBYnd w2Vg== X-Gm-Message-State: ALoCoQnr11ll3SjQGg/8yOev0FrRn589jj9AqBUG+jUxxuWRJL8RpNQu4UAwuGVsPf1YGcjIB1r2S1+Bz4fPah8K/7h8jNwt9UKMMKSM1uTF2MxOGo8G0OV76JantBQq3ZGyXQJzEcqrkNXxeqf2pLRs+jnjoIgc59ra3LQFPedyZsnj2WxkrwSxyzv/6VtahWWKzUtjlwmycWCNAukuMqNbuLtSsBqZCA== MIME-Version: 1.0 X-Received: by 10.50.129.39 with SMTP id nt7mr2644675igb.13.1383318989760; Fri, 01 Nov 2013 08:16:29 -0700 (PDT) Received: by 10.64.25.18 with HTTP; Fri, 1 Nov 2013 08:16:29 -0700 (PDT) In-Reply-To: <1383249987-14379-1-git-send-email-dmalcolm@redhat.com> References: <1383249987-14379-1-git-send-email-dmalcolm@redhat.com> Date: Fri, 01 Nov 2013 15:16:00 -0000 Message-ID: Subject: Re: [PATCH] Support multiline GTY markers in Doxygen filter; add unittests From: Diego Novillo To: David Malcolm Cc: gcc-patches Content-Type: text/plain; charset=ISO-8859-1 X-IsSubscribed: yes X-SW-Source: 2013-11/txt/msg00025.txt.bz2 On Thu, Oct 31, 2013 at 4:06 PM, David Malcolm wrote: > It's possible to run GCC's sources through Doxygen by setting > INPUT_FILTER = contrib/filter_gcc_for_doxygen > within contrib/gcc.doxy and invoking doxygen on the latter file. > > The script filters out various preprocessor constructs from GCC sources > before Doxygen tries to parse them. > > However, as written, it works one line at-a-time, and thus can't cope > with GTY markers that span multiple lines. I added such a marker > in r204170 (symtab_node_base), and I'd like to add more such markers > (see http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02703.html). > > So the attached patch rewrites the filtering script to cope with multiline > GTY markers. > > I'm not familiar with Perl, so I rewrote the script in Python. > > I expanded the regexes for readability, and added a unit-testing suite. > I also tweaked how comments get layed with @verbatim > to avoid inconsistent indentation between the first line and subsequent > lines within a comment. > > Tested with Python 2.7; you can see a sample of the output (from my > gimple-as-C++-inheritance working copy) at: > http://dmalcolm.fedorapeople.org/gcc/2013-10-31/doxygen/html/structgimple__statement__base.html > > In particular, with this patch Doxygen is able to cope with the symtab > GTY marker, and thus parse the symtab class hierarchy, giving the output > viewable here: > http://dmalcolm.fedorapeople.org/gcc/2013-10-31/doxygen/html/classsymtab__node__base.html > > OK for trunk? > > contrib/ > * filter_gcc_for_doxygen: Use filter_params.py rather than > filter_params.pl. > * filter_params.pl: Delete in favor of... > * filter_params.py: New, porting the perl script to python in > order to cope with GTY markers that span multiple lines, > tweaking the layout of comments, and adding a test suite. > --- > contrib/filter_gcc_for_doxygen | 2 +- > contrib/filter_params.pl | 14 --- > contrib/filter_params.py | 191 +++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 192 insertions(+), 15 deletions(-) > delete mode 100755 contrib/filter_params.pl > create mode 100644 contrib/filter_params.py > > diff --git a/contrib/filter_gcc_for_doxygen b/contrib/filter_gcc_for_doxygen > index 3787eeb..ca1db31 100755 > --- a/contrib/filter_gcc_for_doxygen > +++ b/contrib/filter_gcc_for_doxygen > @@ -8,5 +8,5 @@ > # process is put on stdout. > > dir=`dirname $0` > -perl $dir/filter_params.pl < $1 | perl $dir/filter_knr2ansi.pl > +python $dir/filter_params.py $1 | perl $dir/filter_knr2ansi.pl > exit 0 > diff --git a/contrib/filter_params.pl b/contrib/filter_params.pl > deleted file mode 100755 > index 22dae6c..0000000 > --- a/contrib/filter_params.pl > +++ /dev/null > @@ -1,14 +0,0 @@ > -#!/usr/bin/perl > - > -# Filters out some of the #defines used throughout the GCC sources: > -# - GTY(()) marks declarations for gengtype.c > -# - PARAMS(()) is used for K&R compatibility. See ansidecl.h. > - > -while (<>) { > - s/^\/\* /\/\*\* \@verbatim /; > - s/\*\// \@endverbatim \*\//; > - s/GTY[ \t]*\(\(.*\)\)//g; > - s/[ \t]ATTRIBUTE_UNUSED//g; > - s/PARAMS[ \t]*\(\((.*?)\)\)/\($1\)/sg; > - print; > -} > diff --git a/contrib/filter_params.py b/contrib/filter_params.py > new file mode 100644 > index 0000000..d5092f6 > --- /dev/null > +++ b/contrib/filter_params.py > @@ -0,0 +1,191 @@ > +#!/usr/bin/python > +""" > +Filters out some of the #defines used throughout the GCC sources: > +- GTY(()) marks declarations for gengtype.c > +- PARAMS(()) is used for K&R compatibility. See ansidecl.h. > + > +When passed one or more filenames, acts on those files and prints the > +results to stdout. > + > +When run without a filename, runs a unit-testing suite. > +""" > +import re > +import sys > +import unittest > + > +# Optional whitespace > +opt_ws = '\s*' > + > +def filter_src(text): > + """ > + str -> str. We operate on the whole of the source file at once > + (rather than individual lines) so that we can have multiline > + regexes. > + """ > + > + # First, convert tabs to spaces so that we can line things up > + # sanely. > + text = text.expandtabs(8) Does it really matter? I thought doxygen reformatted output anyway. > + > + # Convert C comments from GNU coding convention of: > + # /* FIRST_LINE > + # NEXT_LINE > + # FINAL_LINE. */ > + # to: > + # /** @verbatim > + # FIRST_LINE > + # NEXT_LINE > + # FINAL_LINE. @endverbatim */ > + # so that doxygen will parse them, and so the first line > + # lines up correctly with subsequent lines. > + # Single-line comments turn from: > + # /* TEXT. */ > + # into: > + # /** @verbatim > + # TEXT. @endverbatim */ Oh, for @verbatim. But, why @verbatim? One trick we could do here is recognize ALL CAPS parameters and turn them into \p PARAM. Later on, we just emit \param. Though I guess it would not be easy to pick up the description. Anyway, we can think about this for later. At the same time, I would like to encourage the use of doxygen markers in the code. If we start adding all these @verbatim markers, those markers will be canceled out. In fact, I would like to set up some expectations for doxygen mark ups in the coding guidelines themselves. > + > + text = re.sub(r'^([ \t]*)/\*+ ', > + (r'\1/** @verbatim\n' > + r'\1 '), > + text, > + flags=re.MULTILINE) > + text = re.sub(r'\*+/', > + r'@endverbatim */', > + text) > + > + # Remove GTY markings (potentially multiline ones): > + text = re.sub('GTY' + opt_ws + r'\(\(.*?\)\)\s+', > + '', > + text, > + flags=(re.MULTILINE|re.DOTALL)) > + > + # Strip out 'ATTRIBUTE_UNUSED' > + text = re.sub('\sATTRIBUTE_UNUSED', > + '', > + text) > + > + # PARAMS(()) is used for K&R compatibility. See ansidecl.h. > + text = re.sub('PARAMS' + opt_ws + r'\(\((.*?)\)\)', > + r'(\1)', > + text) > + > + return text > + > +class FilteringTests(unittest.TestCase): > + ''' > + Unit tests for filter_src. > + ''' > + def assert_filters_to(self, src_input, expected_result): > + # assertMultiLineEqual was added to unittest in 2.7/3.1 > + if hasattr(self, 'assertMultiLineEqual'): > + assertion = self.assertMultiLineEqual > + else: > + assertion = self.assertEqual > + assertion(expected_result, filter_src(src_input)) > + > + def test_comment_example(self): > + self.assert_filters_to( > + (' /* FIRST_LINE\n' > + ' NEXT_LINE\n' > + ' FINAL_LINE. */\n'), > + (' /** @verbatim\n' > + ' FIRST_LINE\n' > + ' NEXT_LINE\n' > + ' FINAL_LINE. @endverbatim */\n')) > + > + def test_oneliner_comment(self): > + self.assert_filters_to( > + '/* Returns the string representing CLASS. */\n', > + ('/** @verbatim\n' > + ' Returns the string representing CLASS. @endverbatim */\n')) > + > + def test_multiline_comment_without_initial_indent(self): > + self.assert_filters_to( > + ('/* The thread-local storage model associated with a given VAR_DECL\n' > + " or SYMBOL_REF. This isn't used much, but both trees and RTL refer\n" > + " to it, so it's here. */\n"), > + ('/** @verbatim\n' > + ' The thread-local storage model associated with a given VAR_DECL\n' > + " or SYMBOL_REF. This isn't used much, but both trees and RTL refer\n" > + " to it, so it's here. @endverbatim */\n")) > + > + def test_multiline_comment_with_initial_indent(self): > + self.assert_filters_to( > + (' /* Set once the definition was analyzed. The list of references and\n' > + ' other properties are built during analysis. */\n'), > + (' /** @verbatim\n' > + ' Set once the definition was analyzed. The list of references and\n' > + ' other properties are built during analysis. @endverbatim */\n')) > + > + def test_multiline_comment_with_tabs(self): > + # Ensure that multiline comments containing tabs look sane, in > + # this case, one with initial indentation. > + self.assert_filters_to( > + (' /* All the anchor SYMBOL_REFs used to address these objects, sorted\n' > + ' in order of increasing offset, and then increasing TLS model.\n' > + ' The following conditions will hold for each element X in this vector:\n' > + '\n' > + '\t SYMBOL_REF_HAS_BLOCK_INFO_P (X)\n' > + '\t SYMBOL_REF_ANCHOR_P (X)\n' > + '\t SYMBOL_REF_BLOCK (X) == [address of this structure]\n' > + '\t SYMBOL_REF_BLOCK_OFFSET (X) >= 0. */\n' > + ' vec *anchors;\n'), > + (' /** @verbatim\n' > + ' All the anchor SYMBOL_REFs used to address these objects, sorted\n' > + ' in order of increasing offset, and then increasing TLS model.\n' > + ' The following conditions will hold for each element X in this vector:\n' > + '\n' > + ' SYMBOL_REF_HAS_BLOCK_INFO_P (X)\n' > + ' SYMBOL_REF_ANCHOR_P (X)\n' > + ' SYMBOL_REF_BLOCK (X) == [address of this structure]\n' > + ' SYMBOL_REF_BLOCK_OFFSET (X) >= 0. @endverbatim */\n' > + ' vec *anchors;\n')) > + > + def test_simple_GTY(self): > + # Ensure that a simple GTY is filtered out. > + self.assert_filters_to( > + ('typedef struct GTY(()) alias_pair {\n' > + ' tree decl;\n' > + ' tree target;\n' > + '} alias_pair;\n'), > + ('typedef struct alias_pair {\n' > + ' tree decl;\n' > + ' tree target;\n' > + '} alias_pair;\n')) > + > + def test_multiline_GTY(self): > + # Ensure that a multiline GTY is filtered out. > + self.assert_filters_to( > + ('class GTY((desc ("%h.type"), tag ("SYMTAB_SYMBOL"),\n' > + '\t chain_next ("%h.next"), chain_prev ("%h.previous")))\n' > + ' symtab_node_base\n' > + '{\n'), > + ('class symtab_node_base\n' > + '{\n')) > + > + def test_ATTRIBUTE_UNUSED(self): > + # Ensure that ATTRIBUTE_UNUSED is filtered out. > + self.assert_filters_to( > + ('static void\n' > + 'record_set (rtx dest, const_rtx set, void *data ATTRIBUTE_UNUSED)\n' > + '{\n'), > + ('static void\n' > + 'record_set (rtx dest, const_rtx set, void *data)\n' > + '{\n')) > + > + def test_PARAMS(self): > + self.assert_filters_to( > + 'char *strcpy PARAMS ((char *dest, char *source));\n', > + 'char *strcpy (char *dest, char *source);\n') > + > +def act_on_files(argv): > + for filename in argv[1:]: > + with open(filename) as f: > + text = f.read() > + print(filter_src(text)) > + > +if __name__ == '__main__': > + if len(sys.argv) > 1: > + act_on_files(sys.argv) > + else: > + unittest.main() TESTS! Thanks. This is so very refreshing :) Other than the @verbatim markers, the rest looks fine. Diego.