public inbox for gcc@gcc.gnu.org
From: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
To: GCC Development <gcc@gcc.gnu.org>
Cc: Joseph Myers <jsm@polyomino.org.uk>,
	Alexandre Oliva <oliva@gnu.org>,
	"Eric S. Raymond" <esr@thyrsus.com>, Jeff Law <law@redhat.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	Mark Wielaard <mark@klomp.org>,
	"Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>,
	Jakub Jelinek <jakub@redhat.com>
Subject: Re: Proposal for the transition timetable for the move to GIT
Date: Sun, 29 Dec 2019 18:31:00 -0000
Message-ID: <155B5BFD-6ECF-4EBF-A38C-D6DD178FB497@linaro.org>
In-Reply-To: <5DCEA32B-3E36-4400-B931-9F4E2A8F3FA5@linaro.org>

Below are several more issues I found in the reposurgeon-6a conversion when comparing it against the gcc-reparent conversion.

I am sure these and whatever other problems I may find in the reposurgeon conversion can be fixed in time.  However, I don't see why we should bother.  My conversion has been available since summer 2019; I made it ready in time for GCC Cauldron 2019, and it hasn't changed in any significant way since then.

With the "Missed merges" problem (see below) I don't see how the reposurgeon conversion can be considered "ready".  Also, I would expect a diligent developer to compare a new conversion (reposurgeon's) against the existing conversions (gcc-pretty / gcc-reparent) before declaring the new one "better" or even "ready".  The differences I'm seeing between my conversion and reposurgeon's show that the gcc-reparent conversion is /better/.

I suggest that the GCC community adopt either the gcc-pretty or the gcc-reparent conversion.  I welcome Richard E. to modify his summary scripts to work with the svn-git scripts, which should be straightforward, and I'm ready to help.

Meanwhile, I'm going to add additional root commits to my gcc-reparent conversion to bring in the "missing" branches (the ones that don't share history with trunk@1) and restart daily updates of the gcc-reparent conversion.

Finally, with the comparison data I have, I consider statements about git-svn's poor quality to be very misleading.  Git-svn may have had serious bugs years ago when Eric R. evaluated it and started his work on reposurgeon, but a lot of development has happened and many problems have been fixed since then.  At the moment it is reposurgeon that is producing conversions with obscure mistakes in repository metadata.


=== Missed merges ===

Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges were omitted.  Below is the analysis for ARM/hard_vfp_branch.

$ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
----
commit ef92c24b042965dfef982349cd5994a2e0ff5fde
Author: Richard Earnshaw <rearnsha@gcc.gnu.org>
Date:   Mon Jul 20 08:15:51 2009 +0000

    Merge trunk through to r149768
    
    Legacy-ID: 149804

 COPYING.RUNTIME                                     |    73 +
 ChangeLog                                           |   270 +-
 MAINTAINERS                                         |    19 +-
<MANY OTHER FILES>
----

At the same time, the svn-git scripts record this revision as a proper merge commit (note the "Merge:" line, which is absent above):

$ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
----
commit ce7d5c8df673a7a561c29f095869f20567a7c598
Merge: 4970119c20da 3a69b1e566a7
Author: Richard Earnshaw <rearnsha@arm.com>
Date:   Mon Jul 20 08:15:51 2009 +0000

    Merge trunk through to r149768
    
    git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
----

... which agrees with
$ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
/trunk:142588-149768
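
The same check can be run over a whole branch instead of spot-checking it.  Below is a minimal Python sketch, not taken from either conversion's tooling.  It assumes the output of `git log --format='%H|%P|%s' <branch>` has been captured as text; the function name, the "subject starts with Merge" heuristic, and the placeholder hashes in the sample are illustrative assumptions.  It lists commits whose summary line says "Merge" but which have fewer than two parents.

```python
def find_flattened_merges(log_text):
    """Return hashes of commits whose subject starts with "Merge"
    but that were recorded with fewer than two parents.

    log_text is assumed to be the output of
        git log --format='%H|%P|%s' <branch>
    i.e. one commit per line: hash, space-separated parents, subject.
    """
    suspects = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # Split only twice so that a '|' inside the subject is preserved.
        commit, parents, subject = line.split("|", 2)
        if subject.lower().startswith("merge") and len(parents.split()) < 2:
            suspects.append(commit)
    return suspects


sample = "\n".join([
    # Shape of the reposurgeon-6a commit: a single parent
    # (the parent hash here is a made-up placeholder).
    "ef92c24b0429|111111111111|Merge trunk through to r149768",
    # Shape of the gcc-reparent commit: two parents.
    "ce7d5c8df673|4970119c20da 3a69b1e566a7|Merge trunk through to r149768",
    # An ordinary, hypothetical non-merge commit.
    "222222222222|333333333333|Fix typo in comment",
])
print(find_flattened_merges(sample))
```

A subject-line heuristic will miss merges with unusual log messages, so svn:mergeinfo remains the authoritative source; the sketch only narrows down where to look.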

=== Bad author entries ===

The reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and "2005-03-18 Kazu Hirata".  It is rather obvious that a person's name is unlikely to start with a digit.
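
Entries of this kind are easy to flag mechanically.  The following Python sketch is an illustration only; it assumes the author names have been collected one per entry (for example from `git log --format=%an <branch>`), and the function name is hypothetical.

```python
import re

# A name that begins with a digit is almost certainly a mis-parsed
# ChangeLog date or timestamp rather than a person's name.
_LEADING_DIGIT = re.compile(r"^\s*\d")


def suspicious_authors(names):
    """Return the author names that start with a digit, in input order."""
    return [name for name in names if _LEADING_DIGIT.match(name)]


authors = [
    "12:46:56 1998 Jim Wilson",
    "2005-03-18 Kazu Hirata",
    "Jim Wilson",
    "Kazu Hirata",
]
print(suspicious_authors(authors))
```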

=== Missed authors ===

The reposurgeon-6a conversion misses many authors.  Below is a list of the missing people whose names start with "A".

Akos Kiss
Anders Bertelrud
Andrew Pochinsky
Anton Hartl
Arthur Norman
Aymeric Vincent

=== Conservative author entries ===

The reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits where the svn-git conversion manages to extract a valid email from commit data.  This happens for hundreds of author entries.
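
A comparison of this kind can be sketched in Python as follows.  This is an assumption-laden illustration, not the script used for the numbers above: it expects each conversion's authors as "Name <email>" strings (for example from `git log --format='%an <%ae>'`), and all function names are hypothetical.  The sample reuses the Richard Earnshaw entries from the two commits shown under "Missed merges".

```python
import re
from collections import defaultdict

_AUTHOR = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^<>]*)>\s*$")


def emails_by_author(lines):
    """Map each author name to the set of email addresses seen for it."""
    result = defaultdict(set)
    for line in lines:
        match = _AUTHOR.match(line)
        if match:
            result[match.group("name")].add(match.group("email").lower())
    return result


def default_only_authors(lines, domain="gcc.gnu.org"):
    """Return names for which every email seen ends in @<domain>."""
    suffix = "@" + domain
    return {
        name
        for name, emails in emails_by_author(lines).items()
        if all(email.endswith(suffix) for email in emails)
    }


def recoverable_emails(conservative, reference, domain="gcc.gnu.org"):
    """Authors that only have default emails in `conservative` but have
    at least one non-default email in `reference`."""
    suffix = "@" + domain
    reference_emails = emails_by_author(reference)
    found = {}
    for name in sorted(default_only_authors(conservative, domain)):
        better = {
            email
            for email in reference_emails.get(name, set())
            if not email.endswith(suffix)
        }
        if better:
            found[name] = better
    return found


reposurgeon_authors = [
    "Richard Earnshaw <rearnsha@gcc.gnu.org>",
    "Chris Fairles <chris.fairles@gmail.com>",
]
reparent_authors = [
    "Richard Earnshaw <rearnsha@arm.com>",
]
print(recoverable_emails(reposurgeon_authors, reparent_authors))
```

Matching on the exact author name is a simplification; misspelled or reordered names would need fuzzy matching before such a comparison is reliable.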

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org


> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
> 
> 
>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> 
>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>> Is there some easy way (e.g. file in the conversion scripts) to correct
>> spelling and other mistakes in the commit authors?
>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>> Jakub Jakub Jelinek (1):
>> Jakub Jeilnek (1):
>> Jelinek (1):
>> entries next to the expected one with most of the commits.
>> For the misspellings, wonder if e.g. we couldn't compute edit distances from
>> other names and if we have one with many commits and then one with very few
>> with small edit distance from those, flag it for human review.
> 
> This is close to what the svn-git-author.sh script is doing in the gcc-pretty and gcc-reparent conversions.  It ignores 1-3 character differences in author/committer names and email addresses.  I've audited the results for all branches and didn't spot any mistakes.
> 
> In other news, I'm working on a comparison of the gcc-pretty, gcc-reparent and gcc-reposurgeon-5a repos among themselves.  Below are my current notes on the comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
> 
> == Merges on trunk ==
> 
> Reposurgeon creates merge entries on trunk when changes from a branch are merged into trunk.  This brings the entire development history from the branch to trunk, which is both good and bad.  The good part is that we get more visibility into how the code evolved.  The bad part is that we get many "noisy" commits from the merged branch (e.g., "Merge in trunk" every few revisions) and that our SVN branches are of work-in-progress quality, not ready-for-review/commit quality.  It's common for files to be rewritten in large chunks on branches.
> 
> Also, reposurgeon's commit logs don't record the SVN path from which a change came, so there is no easy way to determine that a given commit is from a merged branch rather than an original trunk commit.  Git-svn, on the other hand, provides "git-svn-id: <path>@<revision>" tags in its commit logs.
> 
> My conversion follows the current GCC development policy that trunk history should be linear.  Branch merges to trunk are squashed.  Merges between non-trunk branches are handled as specified by svn:mergeinfo SVN properties.
> 
> == Differences in trees ==
> 
> Git trees (aka filesystem content) match between pretty/trunk and reposurgeon-5a/trunk from the current tip down to svn's r130805.
> Here is SVN log of that revision (restoration of deleted trunk):
> ------------------------------------------------------------------------
> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
> Changed paths:
>   A /trunk (from /trunk:130802)
> ------------------------------------------------------------------------
> 
> Reposurgeon conversion has:
> -------------
> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
> Author: Daniel Berlin <dberlin@gcc.gnu.org>
> Date:   Thu Dec 13 01:53:37 2007 +0000
> 
>    Readd trunk
> 
>    Legacy-ID: 130805
> 
> .gitignore | 17 -----------------
> 1 file changed, 17 deletions(-)
> -------------
> and my conversion has:
> -------------
> commit fb128f3970789ce094c798945b4fa20eceb84cc7
> Author: Daniel Berlin <dberlin@dberlin.org>
> Date:   Thu Dec 13 01:53:37 2007 +0000
> 
>    Readd trunk
> 
> 
>    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
> -------------
> 
> It appears that .gitignore was added in r1 by reposurgeon and then deleted at r130805.  In the SVN repository .gitignore was added in r195087.  I speculate that the addition of .gitignore at r1 is expected, but its deletion at r130805 is highly suspicious.
> 
> == Committer entries ==
> 
> Reposurgeon uses $user@gcc.gnu.org for committer email addresses even when it correctly detects the author name from the ChangeLog.
> 
> reposurgeon-5a:
> r278995 Martin Liska <mliska@suse.cz> Martin Liska <marxin@gcc.gnu.org>
> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozefl@gcc.gnu.org>
> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@gcc.gnu.org>
> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <gjl@gcc.gnu.org>
> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenth@gcc.gnu.org>
> 
> pretty:
> r278995 Martin Liska <mliska@suse.cz> Martin Liska <mliska@suse.cz>
> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozef.l@mittosystems.com>
> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@codesourcery.com>
> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <avr@gjlay.de>
> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenther@suse.de>
> 
> == Bad summary line ==
> 
> While looking around r138087, the commit below caught my eye.  Are the contents of the summary line as expected?
> 
> commit cc2726884d56995c514d8171cc4a03657851657e
> Author: Chris Fairles <chris.fairles@gmail.com>
> Date:   Wed Jul 23 14:49:00 2008 +0000
> 
>    acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
> 
>    2008-07-23  Chris Fairles <chris.fairles@gmail.com>
> 
>            * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>            Holds the lib that defines clock_gettime (-lrt or -lposix4).
>            * src/Makefile.am: Use it.
>            * configure: Regenerate.
>            * configure.in: Likewise.
>            * Makefile.in: Likewise.
>            * src/Makefile.in: Likewise.
>            * libsup++/Makefile.in: Likewise.
>            * po/Makefile.in: Likewise.
>            * doc/Makefile.in: Likewise.
> 
>    Legacy-ID: 138087
> 
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
