From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-352926-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 24790 invoked by alias); 1 Nov 2013 15:16:33 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 24777 invoked by uid 89); 1 Nov 2013 15:16:33 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.7 required=5.0 tests=AWL,BAYES_50,KAM_STOCKGEN,RCVD_IN_DNSWL_LOW,RP_MATCHES_RCVD,SPF_PASS autolearn=no version=3.3.2
X-HELO: mail-ie0-f178.google.com
Received: from mail-ie0-f178.google.com (HELO mail-ie0-f178.google.com) (209.85.223.178) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Fri, 01 Nov 2013 15:16:31 +0000
Received: by mail-ie0-f178.google.com with SMTP id x13so7631319ief.37        for <gcc-patches@gcc.gnu.org>; Fri, 01 Nov 2013 08:16:29 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20130820;        h=x-gm-message-state:mime-version:in-reply-to:references:date         :message-id:subject:from:to:cc:content-type;        bh=YhS+MuV0n4jLe+0XykgDWxjxN8oHkSB/TQr/sG5hOEY=;        b=Rh1+0MDYlfvTGGvGQiXqWTdNwLUeFmCIHI6ZO26e/ApbdQC+fuAqTUwSVbYUG+S4jD         Y2qMEcsZEyO2N+eGgiG/ndxr10KEiXiJ89H8JxBogLfNKL0bmyUjjUK7SMvASx0El0Kf         EWRP0Z9t7G+gbUIq3tR20YpfMLeF9Y/RTWYa3ANMlMo8b4p/yOFcjClwT2pWX5QJArQS         XVRs8Ck3iFHVX47idGwJpky1izyEUdNCpnhl9vnR4SOafIAqjjpudkzXROaJTRGAlTc2         gCR/NwNuWAlYgiw4VccS9w5noixY6xV+NbdbsnwmNizDxpVQxY4iF+cV6ocEMUluBYnd         w2Vg==
X-Gm-Message-State: ALoCoQnr11ll3SjQGg/8yOev0FrRn589jj9AqBUG+jUxxuWRJL8RpNQu4UAwuGVsPf1YGcjIB1r2S1+Bz4fPah8K/7h8jNwt9UKMMKSM1uTF2MxOGo8G0OV76JantBQq3ZGyXQJzEcqrkNXxeqf2pLRs+jnjoIgc59ra3LQFPedyZsnj2WxkrwSxyzv/6VtahWWKzUtjlwmycWCNAukuMqNbuLtSsBqZCA==
MIME-Version: 1.0
X-Received: by 10.50.129.39 with SMTP id nt7mr2644675igb.13.1383318989760; Fri, 01 Nov 2013 08:16:29 -0700 (PDT)
Received: by 10.64.25.18 with HTTP; Fri, 1 Nov 2013 08:16:29 -0700 (PDT)
In-Reply-To: <1383249987-14379-1-git-send-email-dmalcolm@redhat.com>
References: <1383249987-14379-1-git-send-email-dmalcolm@redhat.com>
Date: Fri, 01 Nov 2013 15:16:00 -0000
Message-ID: <CAD_=9DQv_8bgSxpPNGyHTYsRgCvM77fQikY6eN_-MSHG6bmqbg@mail.gmail.com>
Subject: Re: [PATCH] Support multiline GTY markers in Doxygen filter; add unittests
From: Diego Novillo <dnovillo@google.com>
To: David Malcolm <dmalcolm@redhat.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=ISO-8859-1
X-IsSubscribed: yes
X-SW-Source: 2013-11/txt/msg00025.txt.bz2

On Thu, Oct 31, 2013 at 4:06 PM, David Malcolm <dmalcolm@redhat.com> wrote:
> It's possible to run GCC's sources through Doxygen by setting
>         INPUT_FILTER           = contrib/filter_gcc_for_doxygen
> within contrib/gcc.doxy and invoking doxygen on the latter file.
>
> The script filters out various preprocessor constructs from GCC sources
> before Doxygen tries to parse them.
>
> However, as written, it works one line at-a-time, and thus can't cope
> with GTY markers that span multiple lines.  I added such a marker
> in r204170 (symtab_node_base), and I'd like to add more such markers
> (see http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02703.html).
>
> So the attached patch rewrites the filtering script to cope with multiline
> GTY markers.
>
> I'm not familiar with Perl, so I rewrote the script in Python.
>
> I expanded the regexes for readability, and added a unit-testing suite.
> I also tweaked how comments get layed with @verbatim
> to avoid inconsistent indentation between the first line and subsequent
> lines within a comment.
>
> Tested with Python 2.7; you can see a sample of the output (from my
> gimple-as-C++-inheritance working copy) at:
> http://dmalcolm.fedorapeople.org/gcc/2013-10-31/doxygen/html/structgimple__statement__base.html
>
> In particular, with this patch Doxygen is able to cope with the symtab
> GTY marker, and thus parse the symtab class hierarchy, giving the output
> viewable here:
> http://dmalcolm.fedorapeople.org/gcc/2013-10-31/doxygen/html/classsymtab__node__base.html
>
> OK for trunk?
>
> contrib/
>         * filter_gcc_for_doxygen: Use filter_params.py rather than
>         filter_params.pl.
>         * filter_params.pl: Delete in favor of...
>         * filter_params.py: New, porting the perl script to python in
>         order to cope with GTY markers that span multiple lines,
>         tweaking the layout of comments, and adding a test suite.
> ---
>  contrib/filter_gcc_for_doxygen |   2 +-
>  contrib/filter_params.pl       |  14 ---
>  contrib/filter_params.py       | 191 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 192 insertions(+), 15 deletions(-)
>  delete mode 100755 contrib/filter_params.pl
>  create mode 100644 contrib/filter_params.py
>
> diff --git a/contrib/filter_gcc_for_doxygen b/contrib/filter_gcc_for_doxygen
> index 3787eeb..ca1db31 100755
> --- a/contrib/filter_gcc_for_doxygen
> +++ b/contrib/filter_gcc_for_doxygen
> @@ -8,5 +8,5 @@
>  # process is put on stdout.
>
>  dir=`dirname $0`
> -perl $dir/filter_params.pl < $1 | perl $dir/filter_knr2ansi.pl
> +python $dir/filter_params.py $1 | perl $dir/filter_knr2ansi.pl
>  exit 0
> diff --git a/contrib/filter_params.pl b/contrib/filter_params.pl
> deleted file mode 100755
> index 22dae6c..0000000
> --- a/contrib/filter_params.pl
> +++ /dev/null
> @@ -1,14 +0,0 @@
> -#!/usr/bin/perl
> -
> -# Filters out some of the #defines used throughout the GCC sources:
> -# - GTY(()) marks declarations for gengtype.c
> -# - PARAMS(()) is used for K&R compatibility. See ansidecl.h.
> -
> -while (<>) {
> -    s/^\/\* /\/\*\* \@verbatim /;
> -    s/\*\// \@endverbatim \*\//;
> -    s/GTY[ \t]*\(\(.*\)\)//g;
> -    s/[ \t]ATTRIBUTE_UNUSED//g;
> -    s/PARAMS[ \t]*\(\((.*?)\)\)/\($1\)/sg;
> -    print;
> -}
> diff --git a/contrib/filter_params.py b/contrib/filter_params.py
> new file mode 100644
> index 0000000..d5092f6
> --- /dev/null
> +++ b/contrib/filter_params.py
> @@ -0,0 +1,191 @@
> +#!/usr/bin/python
> +"""
> +Filters out some of the #defines used throughout the GCC sources:
> +- GTY(()) marks declarations for gengtype.c
> +- PARAMS(()) is used for K&R compatibility. See ansidecl.h.
> +
> +When passed one or more filenames, acts on those files and prints the
> +results to stdout.
> +
> +When run without a filename, runs a unit-testing suite.
> +"""
> +import re
> +import sys
> +import unittest
> +
> +# Optional whitespace
> +opt_ws = '\s*'
> +
> +def filter_src(text):
> +    """
> +    str -> str.  We operate on the whole of the source file at once
> +    (rather than individual lines) so that we can have multiline
> +    regexes.
> +    """
> +
> +    # First, convert tabs to spaces so that we can line things up
> +    # sanely.
> +    text = text.expandtabs(8)

Does it really matter?  I thought doxygen reformatted output anyway.

> +
> +    # Convert C comments from GNU coding convention of:
> +    #    /* FIRST_LINE
> +    #       NEXT_LINE
> +    #       FINAL_LINE.  */
> +    # to:
> +    #    /** @verbatim
> +    #       FIRST_LINE
> +    #       NEXT_LINE
> +    #       FINAL_LINE.  @endverbatim */
> +    # so that doxygen will parse them, and so the first line
> +    # lines up correctly with subsequent lines.
> +    # Single-line comments turn from:
> +    #    /* TEXT.  */
> +    # into:
> +    #    /** @verbatim
> +    #        TEXT.  @endverbatim */

Oh, for @verbatim.  But, why @verbatim?  One trick we could do here is
recognize ALL CAPS parameters and turn them into \p PARAM. Later on,
we just emit \param.  Though I guess it would not be easy to pick up
the description. Anyway, we can think about this for later.

At the same time, I would like to encourage the use of doxygen markers
in the code. If we start adding all these @verbatim markers, those
markers will be canceled out. In fact, I would like to set up some
expectations for doxygen mark ups in the coding guidelines themselves.


> +
> +    text = re.sub(r'^([ \t]*)/\*+ ',
> +                  (r'\1/** @verbatim\n'
> +                   r'\1   '),
> +                  text,
> +                  flags=re.MULTILINE)
> +    text = re.sub(r'\*+/',
> +                  r'@endverbatim */',
> +                  text)
> +
> +    # Remove GTY markings (potentially multiline ones):
> +    text = re.sub('GTY' + opt_ws + r'\(\(.*?\)\)\s+',
> +                  '',
> +                  text,
> +                  flags=(re.MULTILINE|re.DOTALL))
> +
> +    # Strip out 'ATTRIBUTE_UNUSED'
> +    text = re.sub('\sATTRIBUTE_UNUSED',
> +                  '',
> +                  text)
> +
> +    # PARAMS(()) is used for K&R compatibility. See ansidecl.h.
> +    text = re.sub('PARAMS' + opt_ws + r'\(\((.*?)\)\)',
> +                  r'(\1)',
> +                  text)
> +
> +    return text
> +
> +class FilteringTests(unittest.TestCase):
> +    '''
> +    Unit tests for filter_src.
> +    '''
> +    def assert_filters_to(self, src_input, expected_result):
> +        # assertMultiLineEqual was added to unittest in 2.7/3.1
> +        if hasattr(self, 'assertMultiLineEqual'):
> +            assertion = self.assertMultiLineEqual
> +        else:
> +            assertion = self.assertEqual
> +        assertion(expected_result, filter_src(src_input))
> +
> +    def test_comment_example(self):
> +        self.assert_filters_to(
> +            ('    /* FIRST_LINE\n'
> +             '       NEXT_LINE\n'
> +             '       FINAL_LINE.  */\n'),
> +            ('    /** @verbatim\n'
> +             '       FIRST_LINE\n'
> +             '       NEXT_LINE\n'
> +             '       FINAL_LINE.  @endverbatim */\n'))
> +
> +    def test_oneliner_comment(self):
> +        self.assert_filters_to(
> +            '/* Returns the string representing CLASS.  */\n',
> +            ('/** @verbatim\n'
> +             '   Returns the string representing CLASS.  @endverbatim */\n'))
> +
> +    def test_multiline_comment_without_initial_indent(self):
> +        self.assert_filters_to(
> +            ('/* The thread-local storage model associated with a given VAR_DECL\n'
> +             "   or SYMBOL_REF.  This isn't used much, but both trees and RTL refer\n"
> +             "   to it, so it's here.  */\n"),
> +            ('/** @verbatim\n'
> +             '   The thread-local storage model associated with a given VAR_DECL\n'
> +             "   or SYMBOL_REF.  This isn't used much, but both trees and RTL refer\n"
> +             "   to it, so it's here.  @endverbatim */\n"))
> +
> +    def test_multiline_comment_with_initial_indent(self):
> +         self.assert_filters_to(
> +            ('  /* Set once the definition was analyzed.  The list of references and\n'
> +             '     other properties are built during analysis.  */\n'),
> +            ('  /** @verbatim\n'
> +             '     Set once the definition was analyzed.  The list of references and\n'
> +             '     other properties are built during analysis.  @endverbatim */\n'))
> +
> +    def test_multiline_comment_with_tabs(self):
> +        # Ensure that multiline comments containing tabs look sane, in
> +        # this case, one with initial indentation.
> +        self.assert_filters_to(
> +            ('  /* All the anchor SYMBOL_REFs used to address these objects, sorted\n'
> +             '     in order of increasing offset, and then increasing TLS model.\n'
> +             '     The following conditions will hold for each element X in this vector:\n'
> +             '\n'
> +             '\t SYMBOL_REF_HAS_BLOCK_INFO_P (X)\n'
> +             '\t SYMBOL_REF_ANCHOR_P (X)\n'
> +             '\t SYMBOL_REF_BLOCK (X) == [address of this structure]\n'
> +             '\t SYMBOL_REF_BLOCK_OFFSET (X) >= 0.  */\n'
> +             '  vec<rtx, va_gc> *anchors;\n'),
> +            ('  /** @verbatim\n'
> +             '     All the anchor SYMBOL_REFs used to address these objects, sorted\n'
> +             '     in order of increasing offset, and then increasing TLS model.\n'
> +             '     The following conditions will hold for each element X in this vector:\n'
> +             '\n'
> +             '         SYMBOL_REF_HAS_BLOCK_INFO_P (X)\n'
> +             '         SYMBOL_REF_ANCHOR_P (X)\n'
> +             '         SYMBOL_REF_BLOCK (X) == [address of this structure]\n'
> +             '         SYMBOL_REF_BLOCK_OFFSET (X) >= 0.  @endverbatim */\n'
> +             '  vec<rtx, va_gc> *anchors;\n'))
> +
> +    def test_simple_GTY(self):
> +        # Ensure that a simple GTY is filtered out.
> +        self.assert_filters_to(
> +            ('typedef struct GTY(()) alias_pair {\n'
> +             '  tree decl;\n'
> +             '  tree target;\n'
> +             '} alias_pair;\n'),
> +            ('typedef struct alias_pair {\n'
> +             '  tree decl;\n'
> +             '  tree target;\n'
> +             '} alias_pair;\n'))
> +
> +    def test_multiline_GTY(self):
> +        # Ensure that a multiline GTY is filtered out.
> +        self.assert_filters_to(
> +            ('class GTY((desc ("%h.type"), tag ("SYMTAB_SYMBOL"),\n'
> +             '\t   chain_next ("%h.next"), chain_prev ("%h.previous")))\n'
> +             '  symtab_node_base\n'
> +             '{\n'),
> +            ('class symtab_node_base\n'
> +             '{\n'))
> +
> +    def test_ATTRIBUTE_UNUSED(self):
> +        # Ensure that ATTRIBUTE_UNUSED is filtered out.
> +        self.assert_filters_to(
> +            ('static void\n'
> +             'record_set (rtx dest, const_rtx set, void *data ATTRIBUTE_UNUSED)\n'
> +             '{\n'),
> +            ('static void\n'
> +             'record_set (rtx dest, const_rtx set, void *data)\n'
> +             '{\n'))
> +
> +    def test_PARAMS(self):
> +        self.assert_filters_to(
> +            'char *strcpy PARAMS ((char *dest, char *source));\n',
> +            'char *strcpy (char *dest, char *source);\n')
> +
> +def act_on_files(argv):
> +    for filename in argv[1:]:
> +        with open(filename) as f:
> +            text = f.read()
> +            print(filter_src(text))
> +
> +if __name__ == '__main__':
> +    if len(sys.argv) > 1:
> +        act_on_files(sys.argv)

> +    else:
> +        unittest.main()

TESTS!  Thanks.  This is so very refreshing :)

Other than the @verbatim markers, the rest looks fine.


Diego.