From: Maxim Kuvyrkov
Subject: Re: Proposal for the transition timetable for the move to GIT
Date: Sun, 29 Dec 2019 18:31:00 -0000
To: GCC Development
Cc: Joseph Myers, Alexandre Oliva, "Eric S. Raymond", Jeff Law, Segher Boessenkool, Mark Wielaard, "Richard Earnshaw (lists)", Jakub Jelinek
Message-Id: <155B5BFD-6ECF-4EBF-A38C-D6DD178FB497@linaro.org>
In-Reply-To: <5DCEA32B-3E36-4400-B931-9F4E2A8F3FA5@linaro.org>

Below are several more issues I found in the reposurgeon-6a conversion when comparing it against the gcc-reparent conversion.

I am sure these, and whatever other problems I may find in the reposurgeon conversion, can be fixed in time. However, I don't see why we should bother. My conversion has been available since summer 2019, I made it ready in time for GCC Cauldron 2019, and it hasn't changed in any significant way since then.

With the "Missed merges" problem (see below) I don't see how the reposurgeon conversion can be considered "ready". Also, I expected a diligent developer to compare the new conversion (aka reposurgeon's) against the existing conversion (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even "ready". The data I'm seeing in the differences between my conversion and reposurgeon's shows that the gcc-reparent conversion is /better/.

I suggest that the GCC community adopt either the gcc-pretty or the gcc-reparent conversion. I welcome Richard E. to modify his summary scripts to work with the svn-git scripts, which should be straightforward, and I'm ready to help.

Meanwhile, I'm going to add additional root commits to my gcc-reparent conversion to bring in "missing" branches (the ones that don't share history with trunk@1) and restart daily updates of the gcc-reparent conversion.

Finally, with the comparison data I have, I consider statements about git-svn's poor quality to be very misleading. Git-svn may have had serious bugs years ago when Eric R. evaluated it and started his work on reposurgeon. But a lot of development has happened and many problems have been fixed since then. At the moment it is reposurgeon that is producing conversions with obscure mistakes in repository metadata.

=== Missed merges ===

Reposurgeon misses merges from trunk on 130+ branches. I've spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges were omitted. Below is the analysis for ARM/hard_vfp_branch.

$ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
----
commit ef92c24b042965dfef982349cd5994a2e0ff5fde
Author: Richard Earnshaw
Date:   Mon Jul 20 08:15:51 2009 +0000

    Merge trunk through to r149768

    Legacy-ID: 149804

 COPYING.RUNTIME | 73 +
 ChangeLog | 270 +-
 MAINTAINERS | 19 +-
----

At the same time, for the svn-git scripts we have:

$ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
----
commit ce7d5c8df673a7a561c29f095869f20567a7c598
Merge: 4970119c20da 3a69b1e566a7
Author: Richard Earnshaw
Date:   Mon Jul 20 08:15:51 2009 +0000

    Merge trunk through to r149768

    git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
----
...

which agrees with

$ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
/trunk:142588-149768

=== Bad author entries ===

The reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and "2005-03-18 Kazu Hirata". It is rather obvious that a person's name is unlikely to start with a digit.

=== Missed authors ===

The reposurgeon-6a conversion misses many authors; below is a list of people with names starting with "A".

Akos Kiss
Anders Bertelrud
Andrew Pochinsky
Anton Hartl
Arthur Norman
Aymeric Vincent

=== Conservative author entries ===

The reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits where the svn-git conversion manages to extract a valid email from commit data. This happens for hundreds of author entries.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov wrote:
>
>
>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek wrote:
>>
>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>> Is there some easy way (e.g. file in the conversion scripts) to correct
>> spelling and other mistakes in the commit authors?
>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>> Jakub Jakub Jelinek (1):
>> Jakub Jeilnek (1):
>> Jelinek (1):
>> entries next to the expected one with most of the commits.
>> For the misspellings, wonder if e.g. we couldn't compute edit distances from
>> other names and if we have one with many commits and then one with very few
>> with small edit distance from those, flag it for human review.
>
> This is close to what the svn-git-author.sh script is doing in the gcc-pretty and gcc-reparent conversions. It ignores 1-3 character differences in author/committer names and email addresses. I've audited results for all branches and didn't spot any mistakes.
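
For illustration only, here is a minimal Python sketch of the kind of author check discussed above: flag names that start with a digit, and flag rarely used names that lie within a small edit distance of a frequently used one. This is not the code of svn-git-author.sh or of reposurgeon; the function names and the `rare`/`common` thresholds are assumptions made up for this example, and `max_distance=3` merely mirrors the "1-3 character differences" mentioned above.

```python
import re
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def flag_suspicious_authors(names, max_distance=3, rare=5, common=50):
    """Return (digit_prefixed, near_duplicates) for human review.

    names: iterable with one author name per commit.
    rare/common are hypothetical commit-count thresholds, not project policy.
    """
    counts = Counter(names)
    # A person's name is unlikely to start with a digit.
    digit_prefixed = sorted(n for n in counts if re.match(r"\d", n))
    frequent = [n for n, c in counts.items() if c >= common]
    near_duplicates = []
    for name, count in sorted(counts.items()):
        if count > rare:
            continue
        for ref in frequent:
            if 0 < edit_distance(name, ref) <= max_distance:
                near_duplicates.append((name, ref))
    return digit_prefixed, near_duplicates


if __name__ == "__main__":
    sample = (["Jakub Jelinek"] * 100
              + ["Jakub Jeilnek", "Jelinek", "2005-03-18 Kazu Hirata"])
    print(flag_suspicious_authors(sample))
```

With this sample, "Jakub Jeilnek" is paired with "Jakub Jelinek" (distance 2), while the bare "Jelinek" entry is six edits away and is not caught; entries like that would need a separate rule, such as matching on surname only.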
>
> In other news, I'm working on a comparison of the gcc-pretty, gcc-reparent and gcc-reposurgeon-5a repos among themselves. Below are my current notes for the comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
>
> == Merges on trunk ==
>
> Reposurgeon creates merge entries on trunk when changes from a branch are merged into trunk. This brings the entire development history from the branch to trunk, which is both good and bad. The good part is that we get more visibility into how the code evolved. The bad part is that we get many "noisy" commits from the merged branch (e.g., "Merge in trunk" every few revisions) and that our SVN branches are work-in-progress quality, not ready-for-review/commit quality. It's common for files to be re-written in large chunks on branches.
>
> Also, reposurgeon's commit logs don't have information on the SVN path from which the change came, so there is no easy way to determine that a given commit is from a merged branch, not an original trunk commit. Git-svn, on the other hand, provides "git-svn-id: @" tags in its commit logs.
>
> My conversion follows the current GCC development policy that trunk history should be linear. Branch merges to trunk are squashed. Merges between non-trunk branches are handled as specified by svn:mergeinfo SVN properties.
>
> == Differences in trees ==
>
> Git trees (aka filesystem content) match between pretty/trunk and reposurgeon-5a/trunk from the current tip back to SVN's r130805.
> Here is the SVN log of that revision (restoration of deleted trunk):
> ------------------------------------------------------------------------
> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
> Changed paths:
>    A /trunk (from /trunk:130802)
> ------------------------------------------------------------------------
>
> The reposurgeon conversion has:
> -------------
> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
> Author: Daniel Berlin
> Date:   Thu Dec 13 01:53:37 2007 +0000
>
>     Readd trunk
>
>     Legacy-ID: 130805
>
>  .gitignore | 17 -----------------
>  1 file changed, 17 deletions(-)
> -------------
> and my conversion has:
> -------------
> commit fb128f3970789ce094c798945b4fa20eceb84cc7
> Author: Daniel Berlin
> Date:   Thu Dec 13 01:53:37 2007 +0000
>
>     Readd trunk
>
>
>     git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
> -------------
>
> It appears that .gitignore was added in r1 by reposurgeon and then deleted at r130805. In the SVN repository .gitignore was added in r195087. I speculate that the addition of .gitignore at r1 is expected, but its deletion at r130805 is highly suspicious.
>
> == Committer entries ==
>
> Reposurgeon uses $user@gcc.gnu.org for committer email addresses even when it correctly detects the author name from ChangeLog.
>
> reposurgeon-5a:
> r278995 Martin Liska Martin Liska
> r278994 Jozef Lawrynowicz Jozef Lawrynowicz
> r278993 Frederik Harwath Frederik Harwath
> r278992 Georg-Johann Lay Georg-Johann Lay
> r278991 Richard Biener Richard Biener
>
> pretty:
> r278995 Martin Liska Martin Liska
> r278994 Jozef Lawrynowicz Jozef Lawrynowicz
> r278993 Frederik Harwath Frederik Harwath
> r278992 Georg-Johann Lay Georg-Johann Lay
> r278991 Richard Biener Richard Biener
>
> == Bad summary line ==
>
> While looking around r138087, the commit below caught my eye. Are the contents of the summary line as expected?
>
> commit cc2726884d56995c514d8171cc4a03657851657e
> Author: Chris Fairles
> Date:   Wed Jul 23 14:49:00 2008 +0000
>
>     acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>
>     2008-07-23 Chris Fairles
>
>         * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>         Holds the lib that defines clock_gettime (-lrt or -lposix4).
>         * src/Makefile.am: Use it.
>         * configure: Regenerate.
>         * configure.in: Likewise.
>         * Makefile.in: Likewise.
>         * src/Makefile.in: Likewise.
>         * libsup++/Makefile.in: Likewise.
>         * po/Makefile.in: Likewise.
>         * doc/Makefile.in: Likewise.
>
>     Legacy-ID: 138087
>
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
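
As a rough sketch of how the metadata quoted in this thread could be read mechanically, the Python below parses a "git-svn-id:" trailer (to recover the SVN path and revision a commit came from) and an svn:mergeinfo value such as "/trunk:142588-149768". It is not part of the svn-git scripts or of reposurgeon; the function names are hypothetical, and the trailer layout is inferred from the git-svn commit messages shown above.

```python
import re

# Trailer as it appears in the git-svn commit messages quoted above:
# a URL, '@', the SVN revision, then the repository UUID.
GIT_SVN_ID = re.compile(
    r"^\s*git-svn-id:\s+(\S+)@(\d+)\s+([0-9a-f-]+)\s*$", re.MULTILINE)


def parse_git_svn_id(message):
    """Return (url, revision, uuid) from a commit message, or None if the
    message carries no git-svn-id trailer."""
    m = GIT_SVN_ID.search(message)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)


def parse_mergeinfo(value):
    """Parse svn:mergeinfo text into {source_path: [(first_rev, last_rev)]}."""
    result = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        path, _, ranges = line.rpartition(":")
        spans = []
        for item in ranges.split(","):
            item = item.rstrip("*")  # '*' marks a non-inheritable range
            first, _, last = item.partition("-")
            spans.append((int(first), int(last or first)))
        result[path] = spans
    return result


if __name__ == "__main__":
    msg = ("Merge trunk through to r149768\n\n"
           "git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/"
           "hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4\n")
    print(parse_git_svn_id(msg))
    print(parse_mergeinfo("/trunk:142588-149768"))
```

A commit message that only carries "Legacy-ID: 149804" yields None from parse_git_svn_id, which reflects the point made in the quoted notes: without the trailer, the SVN path of a commit cannot be read off the log message.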