From: "Eric S. Raymond" <esr@thyrsus.com>
To: Alexandre Oliva <oliva@gnu.org>
Cc: Jeff Law <law@redhat.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	Joseph Myers <joseph@codesourcery.com>,
	Mark Wielaard <mark@klomp.org>,
	Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>,
	"Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>,
	gcc@gcc.gnu.org
Subject: Re: Proposal for the transition timetable for the move to GIT
Date: Wed, 25 Dec 2019 12:07:00 -0000
Message-ID: <20191225120747.GA96669@thyrsus.com>
In-Reply-To: <or36d83jv0.fsf@livre.home>

Alexandre Oliva <oliva@gnu.org>:
> I know very little about reposurgeon, but I'm concerned that, should we
> make the conversion with it, and later identify e.g. missed branches, we
> might be unable to make such an incremental recovery.  Can anyone
> alleviate my concerns and let me know we could indeed make such an
> incremental recovery of a branch missed in the initial conversion, in
> such a way that its commit history would be shared with that of the
> already-converted branch it branched from?

Reposurgeon has a reparent command.  If you have determined that a
branch is detached or has an incorrect attachment point, patching the
metadata of the root node to fix that is very easy.

> Now, would it be too much of a burden to insist that the commit graphs
> out of both conversions be isomorphic, and maybe mappings between the
> commit ids (if they can't be made identical to begin with, that is) be
> generated and shared, so that the results of both conversions can be
> efficiently and mechanically compared (disregarding expected
> differences) not only in terms of branch and tag names and commit
> graphs, but also tree contents, commit messages and any other metadata?
> Has anything like this been done yet?

On the GCC repository, no. 

There are very serious practical problems with full verification of
git against SVN, stemming mainly from the fact that Subversion checkout
on a repository of this size is extremely slow. IIRC Joseph at one
point estimated a check time on the order of months due to that
overhead alone.

If you're talking about a commit-by-commit comparison between two
conversions that assumes one or the other is correct, that is
theoretically possible and - because git retrieval is much faster -
could theoretically be done in a reasonable amount of time.  But there
is a lot of devil in the practical details.

The reposurgeon suite once included a tool for such comparisons.
Last year this happened:

commit b8a609925ba70a6b68f9eda1d748eb667ad2fa59
Author: Eric S. Raymond <esr@thyrsus.com>
Date:   Fri Aug 24 12:40:46 2018 -0400

    Retire repodiffer.  Its only use case was checks against git-svn...
    
    ...which we now know to make such bad conversions that on larger than trivial
    repos the differ would be prohibitively noisy.

Maxim's scripts probably make a better conversion than bare git-svn,
because he uses git-svn only for linear basic blocks and thereby
avoids its worst failure modes. In theory I could dust off repodiffer
and apply it.

That's in theory. In practice, on a repository this size I am not
greatly optimistic about getting a result that could be interpreted by
a Mark I brain.  The reasons go beyond git-svn's brain damage to the
same ontological-mismatch problems that make SVN-to-git conversion a 
headache in general.

You might think at least there'd be a 1:1 correspondence between
commits in the two conversions, but that's not going to be true for a
couple of different reasons.

1. Split commits. Reposurgeon decomposes these into pieces one per 
git branch.  I don't know what Maxim's scripts do.  I think Joseph turned
up that there are over a thousand of these in the GCC history.

2. There are three classes of commits in Subversion that don't really fit
the git data model: (1) directory creation/deletion commits, (2) directory
copy commits, (3) property changes with no associated blob.

For each of these exceptional commits a converter to Git has a choice
of dropping the commit, turning it into some sort of annotated tag, or
leaving it in place as a zero-op commit (anomalous but not forbidden
in the git model). It is pretty much guaranteed that different
converters will make different choices about these, which will make
for huge amounts of noise in your attempt at a diff.
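
To illustrate why those choices matter for a diff, here is a minimal
sketch of one possible normalization pass: remove zero-op commits from
a conversion's graph before comparing it with another. It is not part
of reposurgeon or of Maxim's scripts. The function name and the two
input maps (commit id to parent ids, commit id to tree hash) are
assumptions made for the example, and it covers only the "left in
place as a zero-op commit" choice, not commits that were turned into
tags.

```python
from typing import Dict, List


def drop_zero_op_commits(
    parents: Dict[str, List[str]], tree: Dict[str, str]
) -> Dict[str, List[str]]:
    """Return a parent map with zero-op commits removed.

    parents: hypothetical map of commit id -> list of parent ids.
    tree:    hypothetical map of commit id -> tree hash.
    A commit counts as zero-op when it has exactly one parent and
    the same tree as that parent.
    """

    def is_zero_op(commit: str) -> bool:
        ps = parents[commit]
        return len(ps) == 1 and tree[commit] == tree[ps[0]]

    def surviving(commit: str) -> str:
        # Walk back through zero-op commits to the nearest kept ancestor.
        while is_zero_op(commit):
            commit = parents[commit][0]
        return commit

    return {
        c: [surviving(p) for p in ps]
        for c, ps in parents.items()
        if not is_zero_op(c)
    }
```

Running both conversions through the same pass would remove one class
of noise, but only if both sides agree on what counts as zero-op,
which is itself one of the choices in question.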

Checking for DAG isomorphism: again, theoretically possible,
practically pretty daunting.  It could be worse - general graph
isomorphism is not even known to be polynomial-time - but in this case
we can label corresponding commits with matching legacy IDs, which
should make possible an isomorphism check in linear time with a trivial
algorithm.

Well, except for split commits. That one would be solvable, albeit
painful.
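
A minimal sketch of the labeled comparison described above, assuming
each conversion has already been reduced to a map from legacy ID to
the set of parent legacy IDs. That input shape and the function name
are assumptions for illustration; neither converter emits this
directly. It also assumes each legacy ID labels exactly one commit per
conversion, which is precisely what split commits violate.

```python
from typing import Dict, Set, Tuple

# Hypothetical shape: legacy ID (e.g. an SVN revision label)
# -> set of parent legacy IDs.
Graph = Dict[str, Set[str]]


def compare_labeled_dags(a: Graph, b: Graph) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (only_in_a, only_in_b, parent_mismatch).

    With matching labels, the two DAGs are isomorphic exactly when
    all three result sets are empty. The work is proportional to the
    number of commits plus the number of parent links.
    """
    only_in_a = set(a.keys() - b.keys())
    only_in_b = set(b.keys() - a.keys())
    parent_mismatch = {k for k in a.keys() & b.keys() if a[k] != b[k]}
    return only_in_a, only_in_b, parent_mismatch
```

The three result sets are the raw material for a report; deciding
which entries are "expected" differences is the hard part.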

The real problem here would be mergeinfo links.  It's not even obvious
what the "correct" mapping of mergeinfo links is, in general, due to the
mismatch between Subversion's cherry-pick-based merge model and git's
branch merging.  Again, different converters will make different
choices. Reconciling them would be no fun.

There is another world of hurt lurking in "(disregarding expected
differences)".  How do you know what differences to expect? How are
you going to specify them?  What will interpret that spec?

There are months more work here - nasty, wearing toil, with no
guarantee of a result with a decent signal-to-noise ratio.  Even
though I'm quite literally the best-qualified person on earth to do
it, I flinch at the thought.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


