From: Joseph Myers
To: Segher Boessenkool
CC: "Richard Earnshaw (lists)", "Eric S. Raymond", Maxim Kuvyrkov,
    GCC Development, Alexandre Oliva, Jeff Law, Mark Wielaard,
    Jakub Jelinek
Date: Mon, 30 Dec 2019 22:58:00 -0000
Subject: Re: Proposal for the transition timetable for the move to GIT
In-Reply-To: <20191230223651.GG3191@gate.crashing.org>
References: <20191226111633.GJ10088@tucnak>
    <5DCEA32B-3E36-4400-B931-9F4E2A8F3FA5@linaro.org>
    <155B5BFD-6ECF-4EBF-A38C-D6DD178FB497@linaro.org>
    <20191229224740.GB51787@thyrsus.com>
    <20191229231342.GF3191@gate.crashing.org>
    <357e6bf2-55c5-fabc-19e7-457539594258@arm.com>
    <20191230223651.GG3191@gate.crashing.org>

On Mon, 30 Dec 2019, Segher Boessenkool wrote:

> To make it not be super much work, I'd do the second option: better
> heuristics. Those in Maxim's conversion have been great since over half
> a year, you could borrow some, or peek for inspiration?

Actually, comparing authors between the two conversions shows plenty of
places where the more aggressive ChangeLog extraction in Maxim's
conversion has produced less good attributions than reposurgeon (e.g.
attributing merges to some random author from a ChangeLog modified in
the merge, rather than to the committer of the merge, or attributing
fixes in a ChangeLog to the author of a random entry that got fixed), as
well as places where it's simply failed to extract an author from a
ChangeLog that reposurgeon has extracted. So for "great", read "have
some good ideas to learn from, but plenty of places with problems as
well".

I'm working on more detailed comparison of authors with some more
heuristics to help identify the most interesting cases for manual
inspection (those where it's more likely Maxim's heuristics are finding
valid authors reposurgeon didn't) and separate those from cases where
different subjective choices were made (e.g. of how to assign an author
when one person backports another's patch, or multi-author commits where
one conversion chose one author as the main one and the other conversion
chose the other author).

> If you guys want to ever finish, you'll need to drop the quest for
> perfection, because this leads to a) much more work, and b) worse quality
> in the end.

To me, that indicates that using a conversion tool that is conservative
in its heuristics, and then selectively applying improvements to the
extent they can be done safely with manual review in a reasonable time,
is better than applying a conversion tool with more aggressive
heuristics.

The issues with the reposurgeon conversion listed in Maxim's last
comments were of the form "reposurgeon is being conservative in how it
generates metadata from SVN information". I think that's a very good
basis for adding on a limited set of safe improvements to authors and
commit messages that can be done reasonably soon and then doing the
final conversion with reposurgeon.

-- 
Joseph S. Myers
joseph@codesourcery.com
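
[Editor's illustration, not part of the original message.] The author comparison described above can be pictured roughly as follows. This is only a sketch under stated assumptions, not the script actually used for the GCC conversion: the tuple layout, the bucket names, and the functions `classify` and `compare` are all hypothetical, and the sample data is made up. It assumes each conversion falls back to the committer when it extracts no author from a ChangeLog, so a mismatch where one side equals the committer suggests that only the other side extracted something.

```python
# Hypothetical sketch: bucket per-commit author disagreements between two
# repository conversions so that the interesting ones can be reviewed by hand.
# "a" stands for a conservative conversion, "b" for a more aggressive one.


def classify(committer, author_a, author_b):
    """Return a bucket name for one commit's pair of author attributions."""
    if author_a == author_b:
        return "agree"
    if author_a == committer:
        # A kept the committer; B extracted a different author.
        return "only_b_extracted"
    if author_b == committer:
        # B kept the committer; A extracted a different author.
        return "only_a_extracted"
    # Both extracted an author, but not the same one (for example a
    # backport or a multi-author commit, where the choice is subjective).
    return "both_extracted_differ"


def compare(commits):
    """Group revisions by bucket.

    commits: iterable of (revision, committer, author_a, author_b).
    """
    buckets = {}
    for rev, committer, author_a, author_b in commits:
        kind = classify(committer, author_a, author_b)
        buckets.setdefault(kind, []).append(rev)
    return buckets


# Made-up sample data, purely for illustration.
SAMPLE = [
    (101, "alice", "alice", "alice"),
    (102, "alice", "alice", "bob"),
    (103, "alice", "carol", "alice"),
    (104, "alice", "bob", "carol"),
]

if __name__ == "__main__":
    for kind, revs in sorted(compare(SAMPLE).items()):
        print(kind, revs)
```

In this sketch the "only_b_extracted" bucket would be the first candidate for manual inspection, since it holds the commits where the more aggressive heuristics may have found a valid author that the conservative conversion missed. A real comparison would also need to treat merge commits specially, as the message notes.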