From: Roman Zhuykov
Subject: Re: Test GCC conversion with reposurgeon available
To: Segher Boessenkool, Joseph Myers
Cc: gcc@gcc.gnu.org, Alexander Monakov, Maxim Kuvyrkov, esr@thyrsus.com
Date: Wed, 25 Dec 2019 11:03:00 -0000
Message-ID: <279bf8dd-8725-c3fa-0def-130b3d128509@ispras.ru>
In-Reply-To: <20191224181444.GJ4505@gate.crashing.org>

First of all, thanks to everyone who spent time making the conversion better and better.

Here is my 2c. I have studied a little of my colleagues' trunk history in Maxim's gcc-pretty vs gcc-reposurgeon-5b.

1) In gcc-pretty, timezone info is lost in both the author and committer dates (the UTC time itself stays correct, certainly). Examples are r278990 and r289989.
Probably git-svn causes this; the current read-only git mirror also lacks timezone info. Not sure we need that info, but reposurgeon is more correct here.

2) Some thoughts about the script for summarizing commit log messages:
2a) Why do r143753 and r150680 not have a "re PR..." summary instead of "[multiple changes]"?
2b) On the contrary, r155892 has to mention two PRs; even "[multiple changes]" is better here, IMHO.
2c) In r130050 and r155902 we have "Rename too ... " in the summary; not sure how to make it better.
2d) r146882 could have a better summary if we somehow organize ChangeLog priority (gcc/ChangeLog is more important than the testsuite one).

3) About author emails, see below.

24.12.2019 21:14, Segher Boessenkool wrote:
> On Tue, Dec 24, 2019 at 05:16:54PM +0000, Joseph Myers wrote:
>> On Tue, 24 Dec 2019, Segher Boessenkool wrote:
>>>> That's because that commit also edits ChangeLog entries from other
>>>> authors. When a commit adds / edits ChangeLog entries for more than one
>>>> author (the difference between purely editing an existing entry and adding
>>>> a new one, possibly under an existing date/author header, for a
>>>> multi-author commit, is not something that can reliably be determined
>>>> automatically), the conversion falls back to using the committer identity
>>>> instead of picking one of the multiple relevant authors from the ChangeLog
>>>> files.
>>> There is only one relevant author in r270511. It edits a few wrong path
>>> names in the previous changelog entries. People often do similar things
>>> (like fixing the commit date :-) )
>> Distinguishing "edits a previous ChangeLog entry" from "adds a new entry
>> under a previous ChangeLog header for a change included in the commit" is
>> a human judgement.
> We are doing only one conversion here, the one of the GCC repo. The
> heuristic works, we checked it did.
>
>>> Either never use @gcc.gnu.org, or always use it, don't do the
>>> worst of both worlds?
>> The heuristics here are to use an attribution from ChangeLog for the
>> author where unambiguous, but to use the committer (always @gcc.gnu.org /
>> @gnu.org [*], so avoiding attributions at the wrong company even where
>> people were using multiple addresses simultaneously for different changes)
>> as author if in doubt.
> You never need that, and it is worse to use two different schemes than to
> choose either.
>
> I would have chosen the "@gcc.gnu.org" scheme, because it is
> simple and *correct*. Other people wanted the nicer names. Maxim's
> conversion gets that correct. Please copy it.
>

IMHO Segher is a bit categorical in the discussion, but I'll be glad to see a brief description of Maxim's approach to managing emails; gcc-pretty shows better results.

Speaking about the script counting authors from ChangeLog files: even if we drop the "edits a previous ChangeLog entry" issue, it still sometimes does not work as Joseph described:
3a) In r155892, r155893 and r259314 Alex is not counted as the only author, for no apparent reason.
3b) In r139854, r141108 and r196252 the script selected an author successfully, while actually there is more than one.
3c) Maybe here we can also somehow organize ChangeLog priority (again, gcc/ChangeLog is more important than the testsuite one). There are a lot of examples where the testsuite/ChangeLog entry has another author: r145055, r150680, r155889, r155894, r155890, r163904, r180186 and r183325.
3d) If we fix 3b+3c we can also look at r143753, r155890 and r155895.
3e) r155891, r207422, r183627 and r234218 are examples of commits which don't touch any ChangeLog files, for different reasons. This seems unsolvable with the current approach.

--
Roman
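
For illustration only, here is a minimal sketch of the ChangeLog-priority idea from 3c combined with the fallback Joseph describes (ChangeLog author where unambiguous, committer if in doubt). It is not the logic of the actual conversion script: the function name `pick_author`, the shape of the input mapping, the example addresses and the default priority tuple are all assumptions made up for this example.

```python
from typing import Dict, Set, Tuple


def pick_author(
    changelog_authors: Dict[str, Set[str]],
    committer: str,
    priority: Tuple[str, ...] = ("gcc/ChangeLog",),  # hypothetical priority list
) -> str:
    """Choose a commit author from per-file ChangeLog attributions.

    changelog_authors maps a ChangeLog path to the set of authors whose
    entries the commit touches in that file (hypothetical input format).
    """
    # Case 3e: the commit touches no ChangeLog at all, so only the
    # committer identity is available.
    if not changelog_authors:
        return committer

    # Case 3c: consult higher-priority ChangeLog files first, so that a
    # different author in testsuite/ChangeLog does not create ambiguity.
    for path in priority:
        authors = changelog_authors.get(path)
        if authors:
            # Case 3b: several authors in the same file stay ambiguous.
            return next(iter(authors)) if len(authors) == 1 else committer

    # No priority file touched: accept the attribution only if every
    # touched ChangeLog agrees on a single author.
    all_authors = set().union(*changelog_authors.values())
    if len(all_authors) == 1:
        return next(iter(all_authors))
    return committer


example = pick_author(
    {
        "gcc/ChangeLog": {"alice@example.com"},
        "gcc/testsuite/ChangeLog": {"bob@example.com"},
    },
    committer="carol@gcc.gnu.org",
)
print(example)
```

With this ordering, a commit whose gcc/ChangeLog entry has one author is attributed to that author even when the testsuite ChangeLog names someone else. Commits like those in 3e still fall back to the committer.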