public inbox for gcc-patches@gcc.gnu.org
From: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Jason Merrill <jason@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git
Date: Tue, 16 Jul 2019 10:21:00 -0000
Message-ID: <4E46E435-F95C-46AD-87F0-8220D2BF4CD4@linaro.org>
In-Reply-To: <8C62F814-2F57-4D1A-B66F-5C5ACFF37D6C@linaro.org>

Hi Everyone,

I've been swamped with other projects for most of June, which gave me time to digest all the feedback I've got on GCC's conversion from SVN to Git.

The scripts have evolved heavily from the initial version posted here.  They have become fairly generic in that they have no built-in knowledge of GCC's repo structure.  Because of this, I no longer plan to merge them into the GCC tree, but rather to publish them as a separate project on GitHub.  For now, you can track the current [hairy] version at https://review.linaro.org/c/toolchain/gcc/+/31416 .

The initial version of the scripts used heuristics to construct the branch tree, which turned out to be error-prone.  The current version parses the entire history of the SVN repo to detect all trees that start at /trunk@1.  Therefore all branches in the converted repo converge to the same parent at the beginning of their histories.
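To illustrate the idea of branches converging on a common root, here is a minimal sketch in Python.  It is not the code of the actual scripts.  It assumes the SVN history has already been reduced to a hypothetical map from each branch path to the (path, revision) it was copied from; the branch names and revision numbers are made up.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: for each branch path, the (parent path, revision) it was
# copied from.  None marks a tree that was created from scratch.
CopyMap = Dict[str, Optional[Tuple[str, int]]]


def ancestry(branch: str, copies: CopyMap) -> List[str]:
    """Follow copy-from links from `branch` back to the root of its tree."""
    chain = [branch]
    seen = {branch}
    while copies.get(chain[-1]) is not None:
        parent, _revision = copies[chain[-1]]
        if parent in seen:
            raise ValueError(f"copy cycle detected at {parent}")
        seen.add(parent)
        chain.append(parent)
    return chain


def rooted_at_trunk(branch: str, copies: CopyMap) -> bool:
    """True if the branch's history ultimately starts at /trunk."""
    return ancestry(branch, copies)[-1] == "/trunk"


# Made-up example data.
copies: CopyMap = {
    "/trunk": None,
    "/branches/example-release": ("/trunk", 100),
    "/branches/ibm/example": ("/branches/example-release", 150),
    "/branches/orphan": None,
}

for name in sorted(copies):
    print(name, "->", " <- ".join(ancestry(name, copies)))
```

A branch such as the made-up /branches/orphan, whose chain does not end at /trunk, is the kind of case that heuristics could get wrong and that a full-history scan would surface.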

As far as GCC conversion goes, below is what I plan to do and what not to do.  This is based on comments from everyone in this thread:

1. Construct GCC's git repo from SVN using the same settings as the current git mirror.
2. Compare the resulting git repo with the current GCC mirror -- they should match on the commit hash level for trunk, branches/gcc-*-branch, and other "normal" branches.
3. Investigate any differences between the converted GCC repo and the current GCC mirror.  These can be due to bugs in git-svn or other misconfigurations.
4. Import git-only branches from the current GCC mirror.
5. Publish this "raw" repo for the community to sanity-check its contents.
6. Re-write the history of all branches -- converted from svn and git-only -- see note below [*].
7. Publish this "pretty" repo for the community to sanity-check its contents.
8. Update both "raw" and "pretty" repos daily with new commits.
9. Fix problems in the "raw" and "pretty" repos as they are reported by the community.
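Steps 2 and 3 can be sketched as a comparison of branch tips.  Because a git commit hash covers the commit's entire ancestry, matching tips imply matching histories.  The sketch below is an illustration only, not part of the scripts.  It assumes each repo's refs have been dumped as "<name> <hash>" lines, for example with `git for-each-ref --format='%(refname:short) %(objectname)'`; the ref names and hashes shown are placeholders.

```python
from typing import Dict, List


def parse_refs(text: str) -> Dict[str, str]:
    """Parse lines of the form '<ref name> <commit hash>' into a dict."""
    refs: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            name, sha = line.split()
            refs[name] = sha
    return refs


def diff_refs(converted: Dict[str, str], mirror: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify every ref by how the two repos agree on it."""
    report: Dict[str, List[str]] = {
        "match": [], "differ": [], "only_converted": [], "only_mirror": [],
    }
    for name in sorted(set(converted) | set(mirror)):
        if name not in mirror:
            report["only_converted"].append(name)
        elif name not in converted:
            report["only_mirror"].append(name)
        elif converted[name] == mirror[name]:
            report["match"].append(name)
        else:
            report["differ"].append(name)
    return report


# Placeholder data standing in for the output of the two repos.
converted = parse_refs("trunk aaa111\nexample-branch bbb222\nnew-branch ccc333\n")
mirror = parse_refs("trunk aaa111\nexample-branch bbb999\ngit-only ddd444\n")
report = diff_refs(converted, mirror)
```

Refs listed under "differ" are the ones to investigate in step 3; refs under "only_mirror" are candidates for the git-only import in step 4.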

Once these steps are done, the community could switch from SVN to git by disabling commits to SVN, waiting for final history to be absorbed by the "pretty" repo, and deploying the git repo as the official repo.

[*] Note on branch re-writing:
During svn->git conversion we have an opportunity to correct some of the artifacts of the current git mirror:

a. Author and committer entries.  These are difficult to get right during the git-svn import process because the tool gives only the SVN committer ID without much else.  We could do much better by matching the SVN committer ID with the person's name in the map file, and then searching for the person's current-at-the-time email address in the commit diff.  I.e., mkuvyrkov -> Maxim Kuvyrkov -> [changelog from 2010's commit] -> maxim@codesourcery.com .

b. Re-write tags/ branches into annotated tags.  Note that tags/* are included in the history of several branches via merge or copy commits, so we would need to re-write history to have proper references to annotated tag commits in the histories of such branches.

c. Since we are re-writing history anyway, it would be nice to convert "svn-git: svn+ssh://" tags to "svn-git: https://".  We are sure to retain a publicly visible svn repo accessible via https://, but are less likely to retain the svn+ssh:// interface.
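Item (c) amounts to a text substitution on each commit message.  A minimal sketch follows, assuming the tag sits at the start of a line and is spelled exactly as quoted above, and assuming that any "user@" part of the svn+ssh URL should be dropped.  The function name and the example host are hypothetical.

```python
import re

# Matches the tag prefix at the start of a line, plus an optional "user@".
_TAG = re.compile(r"^(svn-git: )svn\+ssh://(?:[^@/\s]+@)?", re.MULTILINE)


def rewrite_tag(message: str) -> str:
    """Point 'svn-git:' tags at https:// instead of svn+ssh://."""
    return _TAG.sub(r"\1https://", message)


# Hypothetical commit message; svn.example.org is a placeholder host.
before = "Fix typo.\n\nsvn-git: svn+ssh://user@svn.example.org/repo/trunk@42\n"
after = rewrite_tag(before)
```

Anchoring the pattern to the start of a line leaves any svn+ssh:// URLs mentioned in the body of a commit message untouched.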

Which of these will make it into the final repo is for the community to decide.
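The lookup described in item (a) could be sketched roughly as follows.  This is an assumption-laden illustration, not the scripts' actual logic: it assumes the map file has been loaded into a dict from SVN ID to full name, and that the email appears on an added ChangeLog header line of the form "date  Name  <email>".  The function name, the date, and the file name in the sample diff are made up.

```python
import re
from typing import Dict, Optional, Tuple


def author_for(svn_id: str, diff: str, names: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (name, email); email is None if the diff does not reveal one."""
    name = names.get(svn_id, svn_id)  # fall back to the bare SVN ID
    # Look only at added lines ("+...") that contain "Name  <email>".
    pattern = re.compile(
        r"^\+.*" + re.escape(name) + r"\s+<([^<>\s]+@[^<>\s]+)>",
        re.MULTILINE,
    )
    match = pattern.search(diff)
    return name, (match.group(1) if match else None)


names = {"mkuvyrkov": "Maxim Kuvyrkov"}

# Made-up diff fragment containing a ChangeLog header line.
diff = (
    "+2010-01-01  Maxim Kuvyrkov  <maxim@codesourcery.com>\n"
    "+\n"
    "+\t* example.c: Hypothetical change.\n"
)

name, email = author_for("mkuvyrkov", diff, names)
```

When no email is found in the diff, a conversion would need some fallback, for instance a default address from the map file; which fallback to use is a policy choice left open here.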

Regards,

--
Maxim Kuvyrkov
www.linaro.org



> On May 28, 2019, at 1:31 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
> 
> Hi Everyone,
> 
> What can I say, I was too optimistic about how easy it would be to convert GCC's svn repo to git one branch at a time.  After 2 more weeks and several re-writes of the scripts I now know more about GCC's svn history than I ever wanted to.
> 
> The prize for most complicated branch history goes to /branches/ibm/* .  It has merges, it has re-creation of branches from /trunk and even an accidental deletion of all of IBM's branches.
> 
> The version of scripts I'm testing right now seems to deal with all of that.
> 
> Also, to avoid controversy -- I'm working on these scripts to satisfy my own curiosity, and to give the GCC community another option to choose from for the final migration.  If by the end of summer 2019 we have 2-3 git repos to choose from, then we are likely to push GCC [kicking and screaming] into the 2010s by the end of this decade.
> 
> --
> Maxim Kuvyrkov
> www.linaro.org
> 
> 
> 
>> On May 14, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
>> 
>> This patch adds scripts to contrib/ to migrate the full history of GCC's subversion repository to git.  My hope is that these scripts will finally allow the GCC project to migrate to Git.
>> 
>> The result of the conversion is at https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with "@rev" suffixes represent branch points.  The conversion is still running, so not all branches may appear right away.
>> 
>> The scripts are not specific to the GCC repo and are usable for other projects.  In particular, they should be able to convert downstream GCC svn repos.
>> 
>> The scripts convert svn history branch by branch.  They rely on git-svn to convert individual branches, which it does well.  On the entire GCC repo, however, git-svn is either very slow or goes into an infinite loop.
>> 
>> There are 3 scripts:
>> 
>> - svn-git-repo.sh: top level script to convert entire repo or a part of it (e.g., branches/),
>> - svn-list-branches.sh: helper script to output branches and their parents in bottom-up order,
>> - svn-git-branch.sh: helper script to convert a single branch.
>> 
>> Whenever possible, svn-git-branch.sh uses existing git branches as caches.
>> 
>> What are your questions and comments?
>> 
>> The attached is a cleaned-up version, which hasn't been fully tested yet; typos and other silly mistakes are likely.  OK to commit after testing?
>> 
>> --
>> Maxim Kuvyrkov
>> www.linaro.org
>> 
>> 
>> <0001-Contrib-SVN-Git-conversion-scripts.patch>
> 
