From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Message-ID: <1457368435.9813.68.camel@redhat.com>
Subject: Re: [gimplefe] [gsoc16] Gimple Front End Project
From: David Malcolm
To: Richard Biener, Prasad Ghangal
Cc: Diego Novillo, gcc Mailing List, sandeep@gcc.gnu.org
Date: Mon, 07 Mar 2016 16:34:00 -0000
Content-Type: text/plain; charset="UTF-8"
X-SW-Source: 2016-03/txt/msg00066.txt.bz2

On Mon, 2016-03-07 at 13:26 +0100, Richard Biener wrote:
> On Mon, Mar 7, 2016 at 7:27 AM, Prasad Ghangal <prasad.ghangal@gmail.com> wrote:
> > On 6 March 2016 at 21:13, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > I'll be willing to mentor this.
> > > Though I'd rather have us starting from scratch and look at
> > > having a C-like input language, even piggy-backing on the C
> > > frontend maybe.
> >
> > That's great. I would like to know scope of the project for gsoc so
> > that I can start preparing for proposal.
>
> In my view (this may require discussion) the GIMPLE FE provides a way
> to do better unit-testing in GCC as in feeding a GIMPLE pass with
> specific IL to work with rather than trying to get that into proper
> shape via a C testcase.  Especially making the input IL into that
> pass stable over the development of GCC is hard.

I've been looking at the gimple FE recently, and the above is precisely
my own motivation.  Much of our current testing involves taking a C
file, running the pass pipeline over it, and then verifying properties
of one specific pass.  This worries me: all of the intervening passes
can change, and thus can change the effective input seen by the pass we
were hoping to test, invalidating the test case.

As part of the "unit tests" idea:
  v1: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00765.html
  v2: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01224.html
  v3: https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02947.html
I attempted to write unit tests for specific passes.  The closest I got
was this, which built the function in tree form, then gimplified it,
then expanded it:
  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02954.html
Whilst writing this I attempted to build test cases by constructing IR
directly via API calls, but it became clear to me that that approach
isn't a good one: it's very verbose, and would tie us to the internal
API.

(I think the above patch kit has merit for testing things other than
passes, as a "-fself-test" option, which I want to pursue for gcc 7).

So for testing specific passes, I'd much rather have an input format
that:
  * can be easily generated by GCC from real test cases
  * can be turned into unit tests, which implies:
    * human-readable and editable
  * compatibility and stability: can load gcc 7 test cases into gcc 8;
    have e.g. gcc 6 generate test cases
  * roundtrippable: can load the format, then save out the IR, and get
    the same file, for at least some subset of the format (consider
    e.g. location data: explicit location data can be roundtripped,
    implicit location data not so much).

...which suggests that we'd want to use gimple dumps as the input
format to a test framework - which leads naturally to the idea of a
gimple frontend.

I'm thinking of something like a testsuite/gimple.dg subdirectory full
of gimple dumps.

We could have a new kind of diagnostic, a "remark", with DejaGnu
directives to detect it, e.g.:

  a_5 = b_1 * c_2;  /* { dg-remark "propagated constant; became a_5 = b_1 * 3" } */

or whatnot.  I see our dumpfiles as being something aimed at us, whereas
remarks could be aimed at sophisticated end-users; they would be
enabled on a per-pass basis, or perhaps for certain topics (e.g.
vectorization), and could look something like:

  foo.c:27:10: remark: loop is not vectorizable since the iterator can be modified... [-Rvectorization]
  foo.c:35:20: ...here

or similar, where the user passed say "-Rvectorization" as a command
line option to request more info on vectorization, and our test suites
could do the same.

As a thought-experiment, consider that as well as cc1 etc, we could
have an executable for every pass.  Then you could run individual
passes e.g.:

  $ run-vrp foo.gimple -o bar.gimple
  $ run-switchconv quux.gimple -o baz.gimple

etc.  (I'm not convinced that it makes sense to split things up so
much, but I find it useful for inspiration, for getting ideas about the
things that we could do if we had that level of modularity, especially
from a testing perspective).
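
To make the "roundtrippable" requirement above a bit more concrete,
here is a minimal sketch in Python of the check I have in mind: parse a
dump, write it back out, and require the result to be byte-identical.
The one-statement-per-line "lhs = op1 OP op2;" format, and the names
Assign, parse, dump and roundtrips, are all made up for illustration;
this is not the real gimple dump grammar or any existing GCC API.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-format: one "lhs = op1 OP op2;" statement per line.
# NOT GCC's real dump grammar; just enough to show the round-trip check.
STMT_RE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*([-+*/])\s*(\w+)\s*;\s*$")


@dataclass(frozen=True)
class Assign:
    """A toy stand-in for a binary GIMPLE_ASSIGN."""
    lhs: str
    op: str
    rhs1: str
    rhs2: str


def parse(text):
    """Parse the toy format into a list of Assign, skipping blank lines."""
    stmts = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        m = STMT_RE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        lhs, rhs1, op, rhs2 = m.groups()
        stmts.append(Assign(lhs, op, rhs1, rhs2))
    return stmts


def dump(stmts):
    """Write statements back out in one canonical spelling."""
    return "".join(f"{s.lhs} = {s.rhs1} {s.op} {s.rhs2};\n" for s in stmts)


def roundtrips(text):
    """True if parsing and re-dumping reproduces the input exactly."""
    return dump(parse(text)) == text
```

Input that is already in the canonical spelling round-trips; input with
different whitespace parses fine but comes back out normalized, so it
fails the check.  That is the "at least some subset of the format"
caveat in miniature.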

FWIW I started looking at the existing gimple FE branch last week.  It
implements a parser for a tuple syntax, rather than the C-like syntax.

The existing parser doesn't actually generate any gimple IR internally;
it just syntax-checks the input file.  Building IR internally seemed
like a good next step, since I'm sure there are lots of state issues to
sort out.  So I started looking at implementing a
testsuite/gimple.dg/roundtrip subdirectory: the idea is that this would
be full of gimple dumps; the parser would read them in, and then (with
a flag supplied by roundtrip.exp) would write them out, and
roundtrip.exp would compare input to output and require them to be
identical.  I got as far as (partially) building a GIMPLE_ASSIGN
internally when parsing a file containing one.

That said, I don't care for the tuple syntax in the existing gimple
dump format; I'd prefer a C-like syntax.  My thought was to hack up the
existing gimple FE branch to change the parser to accept a more C-like
syntax, but...

> A C-like syntax is preferred, a syntax that is also valid C would be
> even more preferred so that you can write "torture" testcases that
> have fixed IL into a specific pass but also run as regular testcases
> through the whole optimization pipeline.
>
> Piggy-backing on the C frontend makes it possible to leave all the
> details of types and declarations and global initializers as plain C
> while interpreting function bodies as "GIMPLE" when leaving the
> frontend.

...it sounds like you have a radically different implementation idea,
in which the gimple frontend effectively becomes part of the C
frontend, with some different behaviors.

> I expect that in the process of completing GIMPLE IL features you'll
> have to add a few GNU C extensions, mostly for features used by Ada
> (self-referential types come to my mind).
>
> I expect the first thing the project needs to do is add the "tooling"
> side, signalling the C frontend it should accept GIMPLE (add a
> -fgimple flag) plus adding a way to input IL into a specific pass
> (-ftest= or a function attribute so it affects only a specific
> function so you can write a testcase driver in plain C and have the
> actual testcase in a single function).  The first actual frontend
> implementation challenge will then be emitting GIMPLE / CFG / SSA
> directly which I'd do in the "genericization" phase.  Adjustments to
> how the C FE handles expressions should be made as well, for example
> I'd remove any promotions done, letting it only literally parse
> expressions.  Maybe statement and expression parsing should be forked
> directly to not make the C FE's code too unwieldy but as said I'd
> keep type and decl parsing and its data structures as is.
>
> Eventually the dump file format used by GCC's GIMPLE dumps should be
> changed to be valid GIMPLE FE inputs (and thus valid C inputs).
> Adjustments mainly need to be done to basic-block labels and PHI
> nodes.
>
> I'd first not think about our on-the-side data too much initially
> (range info, points-to info, etc).
>
> Richard.

Hope this is constructive
Dave