From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Sender: gcc-owner@gcc.gnu.org
Subject: Re: Proposal for the transition timetable for the move to GIT
From: Maxim Kuvyrkov
In-Reply-To: <2b6330f2-1a00-ac89-fd3c-4b70e5454f4b@arm.com>
Date: Mon, 30 Dec 2019 13:01:00 -0000
Cc: GCC Development, Joseph Myers, Alexandre Oliva, "Eric S. Raymond", Jeff Law, Segher Boessenkool, Mark Wielaard, Jakub Jelinek
Message-Id: <9B71A0F7-CD93-4636-BFC7-1D1DBC040F07@linaro.org>
References: <20191216133632.GC3152@gate.crashing.org> <20191216135451.GA3142@thyrsus.com> <20191216140514.GD3152@gate.crashing.org> <20191216153649.GE3152@gate.crashing.org> <20191225120747.GA96669@thyrsus.com> <20191226111633.GJ10088@tucnak> <5DCEA32B-3E36-4400-B931-9F4E2A8F3FA5@linaro.org> <155B5BFD-6ECF-4EBF-A38C-D6DD178FB497@linaro.org> <2b6330f2-1a00-ac89-fd3c-4b70e5454f4b@arm.com>
To: "Richard Earnshaw (lists)"
X-SW-Source: 2019-12/txt/msg00534.txt.bz2

> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) wrote:
>
> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>> Below are several more issues I found in reposurgeon-6a conversion comparing it against gcc-reparent conversion.
>>
>> I am sure these and whatever other problems I may find in the reposurgeon conversion can be fixed in time. However, I don't see why we should bother. My conversion has been available since summer 2019, I made it ready in time for GCC Cauldron 2019, and it didn't change in any significant way since then.
>>
>> With the "Missed merges" problem (see below) I don't see how reposurgeon conversion can be considered "ready".
>> Also, I expected a diligent developer to compare new conversion (aka reposurgeon's) against existing conversion (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even "ready". The data I'm seeing in differences between my and reposurgeon conversions shows that gcc-reparent conversion is /better/.
>>
>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent conversion. I welcome Richard E. to modify his summary scripts to work with svn-git scripts, which should be straightforward, and I'm ready to help.
>>
>
> I don't think either of these conversions are any more ready to use than
> the reposurgeon one, possibly less so. In fact, there are still some
> major issues to resolve first before they can be considered.
>
> gcc-pretty has completely wrong parent information for the gcc-3 era
> release tags, showing the tags as being made directly from trunk with
> massive deltas representing the roll-up of all the commits that were
> made on the gcc-3 release branch.

I will clarify the above statement, and please correct me where you think I'm wrong. Gcc-pretty conversion has the exact right parent information for the gcc-3 era release tags as recorded in SVN version history. Gcc-pretty conversion aims to produce an exact copy of SVN history in git. IMO, it manages to do so just fine.

It is a different thing that SVN history has a screwed up record of gcc-3 era tags.

>
> gcc-reparent is better, but many (most?) of the release tags are shown
> as merge commits with a fake parent back to the gcc-3 branch point,
> which is certainly not what happened when the tagging was done at that
> time.

I agree with you here.

>
> Both of these factually misrepresent the history at the time of the
> release tag being made.

Yes and no. Gcc-pretty repository mirrors SVN history. And regarding the need for reparenting -- we lived with current history for gcc-3 release tags for a long time.
I would argue their continued brokenness is not a show-stopper.

Looking at this from a different perspective, when I posted the initial svn-git scripts back in Summer, the community roughly agreed on a plan to

1. Convert entire SVN history to git.
2. Use the stock git history rewrite tools (git filter-branch) to fixup what we want, e.g., reparent tags and branches or set better author/committer entries.

Gcc-pretty does (1) in entirety.

For reparenting, I tried a 15min fix to my scripts to enable reparenting, which worked, but with artifacts like the merge commit from old and new parents. I will drop this and instead use tried-and-true "git filter-branch" to reparent those tags and branches, thus producing gcc-reparent from gcc-pretty.

>
> As for converting my script to work with your tools, I'm afraid I don't
> have time to work on that right now. I'm still bogged down validating
> the incorrect bug ids that the script has identified for some commits.
> I'm making good progress (we're down to 160 unreviewed commits now), but
> it is still going to take what time I have over the next week to
> complete that task.
>
> Furthermore, there is no documentation on how your conversion scripts
> work, so it is not possible for me to test any work I might do in order
> to validate such changes. Not being able to run the script locally to
> test change would be a non-starter.
>
> You are welcome, of course, to clone the script I have and attempt to
> modify it yourself, it's reasonably well documented. The sources can be
> found in esr's gcc-conversion repository here:
> https://gitlab.com/esr/gcc-conversion.git

--
Maxim Kuvyrkov
https://www.linaro.org

>
>
>> Meanwhile, I'm going to add additional root commits to my gcc-reparent conversion to bring in "missing" branches (the ones, which don't share history with trunk@1) and restart daily updates of gcc-reparent conversion.
>>
>> Finally, with the comparison data I have, I consider statements about git-svn's poor quality to be very misleading. Git-svn may have had serious bugs years ago when Eric R. evaluated it and started his work on reposurgeon. But a lot of development has happened and many problems have been fixed since then. At the moment it is reposurgeon that is producing conversions with obscure mistakes in repository metadata.
>>
>>
>> === Missed merges ===
>>
>> Reposurgeon misses merges from trunk on 130+ branches. I've spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges were omitted. Below is analysis for ARM/hard_vfp_branch.
>>
>> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
>> ----
>> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
>> Author: Richard Earnshaw
>> Date: Mon Jul 20 08:15:51 2009 +0000
>>
>>     Merge trunk through to r149768
>>
>>     Legacy-ID: 149804
>>
>>  COPYING.RUNTIME | 73 +
>>  ChangeLog | 270 +-
>>  MAINTAINERS | 19 +-
>>
>> ----
>>
>> at the same time for svn-git scripts we have:
>>
>> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
>> ----
>> commit ce7d5c8df673a7a561c29f095869f20567a7c598
>> Merge: 4970119c20da 3a69b1e566a7
>> Author: Richard Earnshaw
>> Date: Mon Jul 20 08:15:51 2009 +0000
>>
>>     Merge trunk through to r149768
>>
>>     git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
>> ----
>>
>> ... which agrees with
>> $ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
>> /trunk:142588-149768
>>
>> === Bad author entries ===
>>
>> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and "2005-03-18 Kazu Hirata". It is rather obvious that a person's name is unlikely to start with a digit.
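
A check for the "Bad author entries" case above could be sketched roughly like this. It is only an illustration: the helper name suspicious_authors is hypothetical and is not part of reposurgeon or the svn-git scripts, and it assumes the author names have already been collected into a list.

```python
import re

# Hypothetical helper, not part of reposurgeon or the svn-git scripts.
# A name is treated as suspicious when its first non-space character is a
# digit, which usually means a ChangeLog date or time leaked into the name.
_LEADING_DIGIT = re.compile(r"^\s*\d")


def suspicious_authors(names):
    """Return the author names that begin with a digit, in input order."""
    return [name for name in names if _LEADING_DIGIT.match(name)]


if __name__ == "__main__":
    sample = [
        "12:46:56 1998 Jim Wilson",
        "2005-03-18 Kazu Hirata",
        "Richard Earnshaw",
    ]
    for name in suspicious_authors(sample):
        print(name)
```

The input list could come from something like "git log --format=%an | sort -u" run on the converted repository.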
>>
>> === Missed authors ===
>>
>> Reposurgeon-6a conversion misses many authors; below is a list of people with names starting with "A".
>>
>> Akos Kiss
>> Anders Bertelrud
>> Andrew Pochinsky
>> Anton Hartl
>> Arthur Norman
>> Aymeric Vincent
>>
>> === Conservative author entries ===
>>
>> Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits where svn-git conversion manages to extract valid email from commit data. This happens for hundreds of author entries.
>>
>> Regards,
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>
>>
>>> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov wrote:
>>>
>>>
>>>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek wrote:
>>>>
>>>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>>>> Is there some easy way (e.g. file in the conversion scripts) to correct
>>>> spelling and other mistakes in the commit authors?
>>>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>>>> Jakub Jakub Jelinek (1):
>>>> Jakub Jeilnek (1):
>>>> Jelinek (1):
>>>> entries next to the expected one with most of the commits.
>>>> For the misspellings, wonder if e.g. we couldn't compute edit distances from
>>>> other names and if we have one with many commits and then one with very few
>>>> with small edit distance from those, flag it for human review.
>>>
>>> This is close to what svn-git-author.sh script is doing in gcc-pretty and gcc-reparent conversions. It ignores 1-3 character differences in author/committer names and email addresses. I've audited results for all branches and didn't spot any mistakes.
>>>
>>> In other news, I'm working on comparison of gcc-pretty, gcc-reparent and gcc-reposurgeon-5a repos among themselves. Below are current notes for comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
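
The edit-distance idea discussed above (flag a rarely used author name when it is within a few characters of a frequently used one) could be sketched roughly as follows. This is not how svn-git-author.sh is implemented; the function names and the thresholds (max_distance, rare, common) are assumptions chosen for illustration, and the commit counts in the example are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def flag_for_review(commit_counts, max_distance=3, rare=5, common=50):
    """Pair each rarely used name with close, frequently used names.

    commit_counts maps author name to number of commits. The thresholds are
    illustrative assumptions, not values taken from svn-git-author.sh.
    """
    frequent = [n for n, c in commit_counts.items() if c >= common]
    flagged = []
    for name, count in sorted(commit_counts.items()):
        if count > rare:
            continue
        for candidate in frequent:
            if 0 < edit_distance(name, candidate) <= max_distance:
                flagged.append((name, candidate))
    return flagged


if __name__ == "__main__":
    counts = {
        "Jakub Jelinek": 5000,  # made-up count for the expected spelling
        "Jakub Jeilnek": 1,
        "Jelinek": 1,
        "Jakub Jakub Jelinek": 1,
    }
    for rare_name, likely in flag_for_review(counts):
        print(f"{rare_name!r} may be a misspelling of {likely!r}")
```

With a threshold of 3 this catches the transposed "Jakub Jeilnek", but not "Jelinek" or "Jakub Jakub Jelinek", which differ from the expected name by six characters; dropped or doubled first names would need a separate heuristic or human review.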
>>>
>>> == Merges on trunk ==
>>>
>>> Reposurgeon creates merge entries on trunk when changes from a branch are merged into trunk. This brings entire development history from the branch to trunk, which is both good and bad. The good part is that we get more visibility into how the code evolved. The bad part is that we get many "noisy" commits from merged branch (e.g., "Merge in trunk" every few revisions) and that our SVN branches are work-in-progress quality, not ready for review/commit quality. It's common for files to be re-written in large chunks on branches.
>>>
>>> Also, reposurgeon's commit logs don't have information on SVN path from which the change came, so there is no easy way to determine that a given commit is from a merged branch, not an original trunk commit. Git-svn, on the other hand, provides "git-svn-id: @" tags in its commit logs.
>>>
>>> My conversion follows current GCC development policy that trunk history should be linear. Branch merges to trunk are squashed. Merges between non-trunk branches are handled as specified by svn:mergeinfo SVN properties.
>>>
>>> == Differences in trees ==
>>>
>>> Git trees (aka filesystem content) match between pretty/trunk and reposurgeon-5a/trunk from current tip and up to svn's r130805.
>>> Here is SVN log of that revision (restoration of deleted trunk):
>>> ------------------------------------------------------------------------
>>> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
>>> Changed paths:
>>>    A /trunk (from /trunk:130802)
>>> ------------------------------------------------------------------------
>>>
>>> Reposurgeon conversion has:
>>> -------------
>>> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
>>> Author: Daniel Berlin
>>> Date: Thu Dec 13 01:53:37 2007 +0000
>>>
>>>     Readd trunk
>>>
>>>     Legacy-ID: 130805
>>>
>>>  .gitignore | 17 -----------------
>>>  1 file changed, 17 deletions(-)
>>> -------------
>>> and my conversion has:
>>> -------------
>>> commit fb128f3970789ce094c798945b4fa20eceb84cc7
>>> Author: Daniel Berlin
>>> Date: Thu Dec 13 01:53:37 2007 +0000
>>>
>>>     Readd trunk
>>>
>>>
>>>     git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
>>> -------------
>>>
>>> It appears that .gitignore has been added in r1 by reposurgeon and then deleted at r130805. In SVN repository .gitignore was added in r195087. I speculate that addition of .gitignore at r1 is expected, but its deletion at r130805 is highly suspicious.
>>>
>>> == Committer entries ==
>>>
>>> Reposurgeon uses $user@gcc.gnu.org for committer email addresses even when it correctly detects author name from ChangeLog.
>>>
>>> reposurgeon-5a:
>>> r278995 Martin Liska Martin Liska
>>> r278994 Jozef Lawrynowicz Jozef Lawrynowicz
>>> r278993 Frederik Harwath Frederik Harwath <frederik@gcc.gnu.org>
>>> r278992 Georg-Johann Lay Georg-Johann Lay
>>> r278991 Richard Biener Richard Biener
>>>
>>> pretty:
>>> r278995 Martin Liska Martin Liska
>>> r278994 Jozef Lawrynowicz Jozef Lawrynowicz
>>> r278993 Frederik Harwath Frederik Harwath <frederik@codesourcery.com>
>>> r278992 Georg-Johann Lay Georg-Johann Lay
>>> r278991 Richard Biener Richard Biener
>>>
>>> == Bad summary line ==
>>>
>>> While looking around r138087, the commit below caught my eye. Are the contents of the summary line as expected?
>>>
>>> commit cc2726884d56995c514d8171cc4a03657851657e
>>> Author: Chris Fairles
>>> Date: Wed Jul 23 14:49:00 2008 +0000
>>>
>>>     acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>
>>>     2008-07-23 Chris Fairles
>>>
>>>     * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>     Holds the lib that defines clock_gettime (-lrt or -lposix4).
>>>     * src/Makefile.am: Use it.
>>>     * configure: Regenerate.
>>>     * configure.in: Likewise.
>>>     * Makefile.in: Likewise.
>>>     * src/Makefile.in: Likewise.
>>>     * libsup++/Makefile.in: Likewise.
>>>     * po/Makefile.in: Likewise.
>>>     * doc/Makefile.in: Likewise.
>>>
>>>     Legacy-ID: 138087
>>>
>>>
>>> --
>>> Maxim Kuvyrkov
>>> https://www.linaro.org
>>>
>>
>