public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Joseph Myers <joseph@codesourcery.com>
To: Bernd Schmidt <bernds_cb1@t-online.de>
Cc: "Eric S. Raymond" <esr@thyrsus.com>,
	Richard Biener	<richard.guenther@gmail.com>,
	Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>,
	"Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>,
	<gcc@gcc.gnu.org>
Subject: Re: Proposal for the transition timetable for the move to GIT
Date: Mon, 09 Dec 2019 20:45:00 -0000	[thread overview]
Message-ID: <alpine.DEB.2.21.1912092026420.816@digraph.polyomino.org.uk> (raw)
In-Reply-To: <14a615e8-05e7-8c2f-f240-2726441d22e1@t-online.de>

On Mon, 9 Dec 2019, Bernd Schmidt wrote:

> On 12/9/19 7:19 PM, Joseph Myers wrote:
> > 
> > For any conversion we're clearly going to need to run various validation
> > (comparing properties of the converted repository, such as contents at
> > branch tips, with expected values of those properties based on the SVN
> > repository) and fix issues shown up by that validation.  reposurgeon has
> > its own tools for such validation; I also intend to write some validation
> > scripts myself.
> 
> Would it be feasible to require that both conversions produce the same output
> repository to some degree? Can we just look at release tags and require that
> they have the same hash in both conversions, or are there good reasons why the
> two would produce different outputs?

The same hashes are not practical.  There are several areas where two 
perfectly correct conversions are still expected to have different 
contents because of subjective decisions and heuristics involved in the 
conversion.

If some alternative heuristic is found to be clearly better than an 
existing one in reposurgeon, so that it would be better for any project 
converting with reposurgeon, or if some preference in the GCC case can 
readily be represented as a configuration option to choose between 
different approaches, it makes sense to implement the improvements in 
reposurgeon so that any project with similar issues can benefit.  For 
example, see Richard's suggestions in reposurgeon issue 174 of two 
possible improvements to ChangeLog handling: disregarding ChangeLog data 
if a commit adds multiple ChangeLog entries by different authors, and 
specifing a wildcard to allow ChangeLog processing on ChangeLog* files to 
cover ChangeLog.<branch>.  GCC is hardly the last project converting from 
SVN to git, so we can benefit from the experiences of past conversions, 
and help contribute to having useful features available for future 
conversions.

Here are some cases for differences between two correct conversions:

* Tree contents should mostly be identical at any given commit, but 
reposurgeon deliberately produces a .gitignore with contents based on 
svn:ignore if the SVN tree contents don't have a .gitignore (we use 
--user-ignores to prefer the .gitignore file in SVN if it exists), and 
removes any .cvsignore file.

* The first parent of a commit should typically be the same between 
conversions, but (a) might be corrected in some way for cvs2svn issues, 
(b) might skip SVN commits that would translate into empty git commits, 
depending on the choices made for handling of such commits.

* Cases that give rise to no tree changes in a commit (which thus might 
not become a git commit at all depending on the choices made and whether 
they also don't change any merge information properties) include (a) 
branch or tag creation as an exact copy of some revision of some branch, 
(b) branch recreation as a copy, e.g. when trunk was deleted accidentally, 
(c) commits that in SVN only add or remove empty directories, as git does 
not store empty directories, (d) commits that in SVN just remove some file 
or directory and replace it with a copy from some revision of some branch 
that happens to have identical contents to the file or directory removed 
(yes, we do have commits like that in GCC SVN).

* Subsequent parents of a commit based on merge info handling may well 
have subjective differences between correct conversions.

* Commit messages might differ, both because of heuristics to improve 
them, like Richard's work on that, and because of different choices for 
how to represent the SVN revision number information in commit messages.

* Author and committer identifications, and commit timestamps (especially 
timezones, something git has, SVN doesn't and reposurgeon has a per-author 
map for) may vary because of different heuristics or author maps used, 
especially when there is no ChangeLog entry for a commit or the ChangeLog 
entry is in some way malformed or the commit adds ChangeLog entries for 
multiple changes with different authors.

-- 
Joseph S. Myers
joseph@codesourcery.com

  reply	other threads:[~2019-12-09 20:45 UTC|newest]

Thread overview: 198+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-17 12:02 Richard Earnshaw (lists)
2019-09-17 12:24 ` Richard Biener
2019-09-17 13:50   ` Richard Earnshaw (lists)
2019-09-17 16:35   ` Joseph Myers
2019-09-17 17:51     ` Richard Earnshaw (lists)
2019-09-17 16:33 ` Joseph Myers
2019-09-19 12:04 ` Janne Blomqvist
2019-09-19 14:43   ` Damian Rouson
2019-09-19 15:30     ` Janne Blomqvist
2019-10-25 14:10     ` Richard Earnshaw (lists)
2019-10-25 16:32       ` Jeff Law
2019-09-19 15:30   ` Richard Earnshaw (lists)
2019-09-19 15:49     ` Damian Rouson
2019-09-19 15:35 ` Maxim Kuvyrkov
2019-12-06 14:44   ` Maxim Kuvyrkov
2019-12-06 17:21     ` Eric S. Raymond
2019-12-06 17:39       ` Richard Biener
2019-12-06 19:46         ` Eric S. Raymond
2019-12-06 20:43           ` Sandra Loosemore
2019-12-07  2:57           ` Segher Boessenkool
2019-12-09 18:19           ` Joseph Myers
2019-12-09 18:40             ` Bernd Schmidt
2019-12-09 20:45               ` Joseph Myers [this message]
2019-12-09 22:12               ` Eric S. Raymond
2019-12-09 19:28             ` Eric S. Raymond
2019-12-11 14:40             ` Maxim Kuvyrkov
2019-12-11 15:03               ` Richard Earnshaw (lists)
2019-12-11 15:19                 ` Jonathan Wakely
2019-12-11 15:21                   ` Richard Earnshaw (lists)
2019-12-11 15:36                     ` Joseph Myers
2019-12-11 16:02                       ` Jonathan Wakely
2019-12-11 17:47                         ` Eric S. Raymond
2019-12-16  2:19                       ` Joseph Myers
2019-12-11 15:30                   ` Dennis Luehring
2019-12-11 15:36                     ` Richard Earnshaw
2019-12-11 17:36                   ` Eric S. Raymond
2019-12-06 20:49       ` Bernd Schmidt
2019-12-16  9:53     ` Mark Wielaard
2019-12-16 11:29       ` Joseph Myers
2019-12-16 12:43         ` Mark Wielaard
2019-12-16 13:36           ` Segher Boessenkool
2019-12-16 13:54             ` Eric S. Raymond
2019-12-16 14:05               ` Segher Boessenkool
2019-12-16 14:13                 ` Joseph Myers
2019-12-16 15:37                   ` Segher Boessenkool
2019-12-16 16:36                     ` Joseph Myers
2019-12-16 17:40                     ` Jeff Law
2019-12-25  8:12                       ` Alexandre Oliva
2019-12-25 12:07                         ` Eric S. Raymond
2019-12-25 12:24                           ` Segher Boessenkool
2019-12-25 14:16                             ` Joseph Myers
2019-12-25 18:50                             ` Eric S. Raymond
2019-12-25 19:18                               ` Segher Boessenkool
2019-12-26  6:09                           ` Alexandre Oliva
2019-12-26 11:04                             ` Joseph Myers
2019-12-26 11:17                               ` Jakub Jelinek
2019-12-26 12:10                                 ` Joseph Myers
2019-12-26 16:11                                 ` Maxim Kuvyrkov
2019-12-26 16:58                                   ` Joseph Myers
2019-12-26 18:36                                     ` Jakub Jelinek
2019-12-26 18:59                                       ` Joseph Myers
2019-12-27 11:21                                         ` Richard Earnshaw (lists)
2019-12-27 11:33                                           ` Andrew Pinski
2019-12-27 13:35                                             ` Segher Boessenkool
2019-12-27 11:35                                           ` Joseph Myers
2019-12-27 12:37                                             ` Richard Earnshaw (lists)
2019-12-28  2:27                                               ` Eric S. Raymond
2019-12-28 11:23                                                 ` Joseph Myers
2019-12-28 12:19                                             ` Segher Boessenkool
2019-12-28 17:11                                               ` Richard Earnshaw (lists)
2019-12-28 20:28                                                 ` Segher Boessenkool
2019-12-29  1:45                                                   ` Julien "FrnchFrgg" Rivaud
2019-12-29 10:41                                                     ` Segher Boessenkool
2019-12-29 11:02                                                       ` Richard Biener
2019-12-29 11:47                                                         ` Julien '_FrnchFrgg_' RIVAUD
2019-12-29 13:31                                                           ` Segher Boessenkool
2019-12-29 13:51                                                             ` Julien '_FrnchFrgg_' RIVAUD
2019-12-29 12:15                                                         ` Segher Boessenkool
2019-12-29 16:32                                                           ` Richard Earnshaw
2019-12-29 16:37                                                             ` Julien '_FrnchFrgg_' RIVAUD
2019-12-29 11:42                                                       ` Julien '_FrnchFrgg_' RIVAUD
2019-12-29 13:26                                                         ` Segher Boessenkool
2019-12-29 13:48                                                           ` Julien '_FrnchFrgg_' RIVAUD
2019-12-29 15:01                                                             ` Segher Boessenkool
2019-12-29 17:31                                                             ` Ian Lance Taylor via gcc
2019-12-30  0:31                                                               ` Julien "FrnchFrgg" Rivaud
2019-12-29 21:31                                                           ` Thomas Koenig
2019-12-29 23:57                                                             ` Jeff Law
2019-12-27 13:29                                           ` Segher Boessenkool
2019-12-26 20:31                                     ` Richard Biener
2019-12-27  1:32                                     ` Joseph Myers
2019-12-27 10:14                                       ` Maxim Kuvyrkov
2019-12-28  1:55                                         ` Eric S. Raymond
2019-12-29 18:31                                   ` Maxim Kuvyrkov
2019-12-29 18:55                                     ` Joseph Myers
2019-12-29 22:47                                       ` Eric S. Raymond
2019-12-29 23:00                                         ` Joseph Myers
2019-12-29 23:13                                           ` Segher Boessenkool
2019-12-30 15:36                                             ` Richard Earnshaw (lists)
2019-12-30 22:37                                               ` Segher Boessenkool
2019-12-30 22:58                                                 ` Joseph Myers
2019-12-31  0:23                                                   ` Segher Boessenkool
2019-12-31 12:48                                                     ` Segher Boessenkool
2019-12-31  3:09                                                   ` Eric S. Raymond
2019-12-29 22:24                                     ` Richard Earnshaw (lists)
2019-12-30  0:18                                       ` Joseph Myers
2019-12-30  0:44                                         ` Julien "FrnchFrgg" Rivaud
2019-12-30 12:39                                         ` Maxim Kuvyrkov
2019-12-30 13:01                                       ` Maxim Kuvyrkov
2019-12-30 15:31                                         ` Richard Earnshaw (lists)
2019-12-30 15:49                                           ` Maxim Kuvyrkov
2019-12-30 16:08                                             ` Richard Earnshaw (lists)
2020-01-02  2:59                                               ` Alexandre Oliva
2020-01-02 10:58                                                 ` Richard Earnshaw (lists)
2020-01-08 20:46                                               ` Maxim Kuvyrkov
2020-01-08 22:11                                                 ` Eric S. Raymond
2020-01-08 23:34                                                   ` Joseph Myers
2020-01-09  2:38                                                     ` Segher Boessenkool
2020-01-09 12:12                                                       ` Richard Earnshaw (lists)
2020-01-09 14:01                                                         ` Eric S. Raymond
2020-01-11 11:30                                                         ` Segher Boessenkool
2020-01-10  7:33                                                       ` Maxim Kuvyrkov
2020-01-10  9:49                                                         ` Richard Earnshaw (lists)
2020-01-10 11:38                                                           ` Richard Biener
2020-01-10 12:09                                                             ` Iain Sandoe
2020-01-10 13:11                                                               ` Joseph Myers
2020-01-10 12:53                                                             ` Nathan Sidwell
2020-01-10 14:13                                                               ` Martin Liška
2020-01-11 11:57                                                             ` Segher Boessenkool
2020-01-11 11:52                                                           ` Segher Boessenkool
2020-01-10 13:31                                                         ` Bernd Schmidt
2020-01-10 15:27                                                           ` Eric S. Raymond
2020-01-10 15:09                                                         ` Maxim Kuvyrkov
2020-01-10 15:16                                                           ` Joseph Myers
2020-01-10 15:33                                                             ` Maxim Kuvyrkov
2020-01-11  7:04                                                               ` Gerald Pfeifer
2020-01-09  5:07                                                     ` Jeff Law
2020-01-09 12:30                                                       ` Joseph Myers
2020-01-10 15:27                                                         ` Joseph Myers
2020-01-11  7:06                                                         ` Gerald Pfeifer
2020-01-14  8:21                                                         ` Jeff Law
2019-12-26 22:33                                 ` Joseph Myers
2019-12-26 19:16                             ` Eric S. Raymond
2019-12-26 20:08                               ` Alexandre Oliva
2019-12-26 20:28                                 ` Joseph Myers
2019-12-27 12:06                                   ` Alexandre Oliva
2019-12-27 12:21                                     ` Joseph Myers
2019-12-28  2:33                                       ` Eric S. Raymond
2019-12-26 21:19                                 ` Eric S. Raymond
2019-12-25 12:10                         ` Segher Boessenkool
2019-12-25 14:13                           ` Joseph Myers
2019-12-29 16:47                           ` Mark Wielaard
2019-12-29 22:42                             ` Joseph Myers
2019-12-16 16:27                   ` Eric S. Raymond
2019-12-16 16:47                     ` Segher Boessenkool
2019-12-16 16:04               ` Jeff Law
2019-12-16 16:37                 ` Eric S. Raymond
2019-12-16 16:47                   ` Jeff Law
2019-12-31 13:43                     ` Joseph Myers
2019-12-31 14:13                       ` Richard Earnshaw (lists)
2019-12-31 17:26                       ` Segher Boessenkool
2019-12-16 13:56             ` Joseph Myers
2019-12-16 14:17               ` Mark Wielaard
2019-12-16 16:29                 ` Joseph Myers
2019-12-16 13:53           ` Joseph Myers
2019-12-16 16:39             ` Jeff Law
2019-12-16 17:57               ` Richard Biener
2019-12-16 16:55         ` Jeff Law
2019-12-16 17:08           ` Joseph Myers
2019-12-16 19:15             ` Eric S. Raymond
2019-12-16 21:59             ` Segher Boessenkool
2019-12-16 22:14               ` Jeff Law
2019-12-16 22:42                 ` Segher Boessenkool
2019-12-16 23:26                   ` Joseph Myers
2019-12-16 23:44                     ` Eric S. Raymond
2019-12-18 18:07                   ` Jeff Law
2019-12-18 18:24                     ` Joseph Myers
2019-12-19  0:57                       ` Eric S. Raymond
2019-12-18 19:50                     ` Segher Boessenkool
2019-12-18 20:43                       ` Jeff Law
2019-12-20 16:28                         ` Segher Boessenkool
2019-12-19  2:34                       ` Unix philosopy vs. poor semantic locality Eric S. Raymond
2019-12-19  3:16                         ` Joseph Myers
2019-12-19  5:46                           ` Eric S. Raymond
2019-12-19  0:46                     ` Proposal for the transition timetable for the move to GIT Eric S. Raymond
2019-12-16 23:34                 ` Eric S. Raymond
2019-12-16 23:18               ` Joseph Myers
2019-12-16 23:19               ` Eric S. Raymond
2019-12-18 17:27                 ` Segher Boessenkool
2019-12-16 13:33       ` Segher Boessenkool
2019-09-19 17:04 ` Paul Koning
2019-10-25 14:02   ` Richard Earnshaw (lists)
2019-09-20 15:49 ` Jeff Law
2019-09-21  9:11   ` Segher Boessenkool
2019-09-21  9:39     ` Andreas Schwab
2019-09-21  9:51       ` Segher Boessenkool
2019-09-21 10:04         ` Andreas Schwab
2019-09-21  9:26 ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1912092026420.816@digraph.polyomino.org.uk \
    --to=joseph@codesourcery.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=bernds_cb1@t-online.de \
    --cc=esr@thyrsus.com \
    --cc=gcc@gcc.gnu.org \
    --cc=maxim.kuvyrkov@linaro.org \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).