public inbox for gcc@gcc.gnu.org
* Re: Test GCC conversion with reposurgeon available
@ 2020-01-06 22:09 Loren James Rittle
From: Loren James Rittle @ 2020-01-06 22:09 UTC
  To: gcc

On Fri, 3 Jan 2020, Joseph Myers wrote:

> git+ssh://gcc.gnu.org/home/gccadmin/gcc-reposurgeon-7a.git
> git+ssh://gcc.gnu.org/home/gccadmin/gcc-reposurgeon-7b.git

I have not made a substantial commit to GCC [or, likely, posted to this
list] in a decade, so a warm howdy to anyone still around from
1999-2009.  Every "git log" entry for my commits looked fine.  There
were two odd cases, but both now make sense to me.  My first commit's
Author line (1999) contains a physical machine name, but it correctly
matches the contemporary ChangeLog entry.  The second odd case took
more time to understand:

In gcc-reposurgeon-7a, in commit 16fc918929, I corrected a prior
ChangeLog entry (the one for 89903bf801).  I would not expect any
version control system translation process to look ahead for such
later textual changes to a side-maintained ChangeLog file, but I was
at first somewhat stumped on why the ChangeLog text did not match the
"git log" text.

Regards,
Loren

* Test GCC conversion with reposurgeon available
@ 2019-12-17 21:32 Joseph Myers
From: Joseph Myers @ 2019-12-17 21:32 UTC
  To: gcc; +Cc: esr

I've made test conversions of the GCC repository with reposurgeon
available (a gcc.gnu.org / sourceware.org account is required to
access these git+ssh repositories, but it doesn't need to be in the
gcc group or to have shell access).  More information about the
repositories, the conversion choices made and known issues is given
below.  As noted there, I'm now running another conversion with fixes
for some of those issues, and the remaining listed issues not fixed in
that conversion are being actively worked on.

git+ssh://gcc.gnu.org/home/gccadmin/gcc-reposurgeon-1a.git
git+ssh://gcc.gnu.org/home/gccadmin/gcc-reposurgeon-1b.git

The two repositories have exactly the same objects (thus, exactly the
same commit graph).  The only difference is that the 1a conversion has
branches and tags named the same as in SVN (or as similarly as
possible; tags in branches/st/tags/ in SVN become tags in
refs/tags/st/ in git), whereas the 1b conversion has refs rearranged
as suggested by Richard (meaning most are not fetched by default, so
you may wish to clone with --mirror to inspect them more closely).  We
can of course do a different rearrangement if desired.
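As a rough illustration of the 1a naming, here is a minimal Python sketch that maps an SVN branch or tag directory to a git ref name. Only the rule that tags in branches/st/tags/ become refs/tags/st/ comes from the text above; the function name, the choice of refs/heads/master for /trunk, and the handling of the other layouts are assumptions for illustration, not reposurgeon's actual logic.

```python
def svn_to_git_ref(svn_path: str) -> str:
    """Map an SVN branch/tag directory to a 1a-style git ref name.

    Hypothetical sketch: assumes /trunk becomes master and that every other
    path sits directly under /branches, /tags or /branches/st/tags.
    """
    path = svn_path.strip("/")
    if path == "trunk":
        return "refs/heads/master"  # assumption: trunk maps to master
    if path.startswith("branches/st/tags/"):
        # Stated in the message: tags under branches/st/tags/ become refs/tags/st/.
        return "refs/tags/st/" + path[len("branches/st/tags/"):]
    if path.startswith("tags/"):
        return "refs/tags/" + path[len("tags/"):]
    if path.startswith("branches/"):
        return "refs/heads/" + path[len("branches/"):]
    raise ValueError(f"not a branch or tag directory: {svn_path!r}")
```

For example, svn_to_git_ref("/branches/st/tags/foo") returns "refs/tags/st/foo" (foo being a placeholder tag name). The check for branches/st/tags/ has to come before the generic branches/ case, or such tags would be misfiled as branches.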

The repositories also include refs/deleted/ refs for each commit that
deleted a tag or branch in SVN.  To be precise, each such ref points
to a commit deleting all the tag or branch contents, which preserves
the original commit message for the deletion; its parent is thus the
final state of the tag or branch before deletion.  We may or may not
want these in the final conversion, but they seem useful at this point
for verification purposes: in particular, I intend to implement a
check that the final state of each tag or branch before deletion is
correct, as a further check that the conversion machinery is working
correctly.
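The check mentioned above could look roughly like the following Python sketch over a toy in-memory commit model. The Commit class and check_deleted_ref are hypothetical names; a real check would read trees from git and compare them with an SVN checkout of the revision just before the deletion.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Commit:
    """Toy commit: ordered parent ids plus a flat tree of path -> contents."""
    parents: List[str]
    tree: Dict[str, str] = field(default_factory=dict)


def check_deleted_ref(commits: Dict[str, Commit], target: str,
                      expected_final_tree: Dict[str, str]) -> bool:
    """Check a refs/deleted/ target against the SVN state before deletion.

    The deletion commit should empty the tree and have exactly one parent,
    and that parent's tree should equal the branch/tag's final SVN state.
    """
    deletion = commits[target]
    if deletion.tree or len(deletion.parents) != 1:
        return False
    return commits[deletion.parents[0]].tree == expected_final_tree
```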

The repositories don't include refs for the version of history from
the old git-svn mirror, but I have a script to add them (in
refs/git-old/ and refs/git-svn-old/) for the benefit of people wishing
to interpret old commit hashes after the conversion and to make things
more convenient for people wishing to rebase active git-only branches
onto the new version of the history.  The script is independent of
reposurgeon; it's just a single "git fetch" command (which should be
followed by "git gc --aggressive").
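The message does not show the script, so the following is only a hypothetical Python sketch of what "a single git fetch command followed by git gc --aggressive" might look like. The mirror URL and both refspecs are assumptions; only the destination namespaces refs/git-old/ and refs/git-svn-old/ come from the text.

```python
import subprocess
from typing import List

# Assumed URL of the old git-svn mirror; substitute the real one.
OLD_MIRROR = "git://gcc.gnu.org/git/gcc.git"

# Hypothetical refspecs: old branch heads go to refs/git-old/, old git-svn
# remote-tracking refs go to refs/git-svn-old/.
REFSPECS = [
    "+refs/heads/*:refs/git-old/*",
    "+refs/remotes/*:refs/git-svn-old/*",
]


def old_history_commands(mirror: str = OLD_MIRROR) -> List[List[str]]:
    """Return the git commands (as argv lists) that import the old history."""
    return [
        ["git", "fetch", mirror, *REFSPECS],
        ["git", "gc", "--aggressive"],
    ]


def add_old_history(repo_dir: str, mirror: str = OLD_MIRROR) -> None:
    """Run those commands inside an existing clone of the converted repository."""
    for argv in old_history_commands(mirror):
        subprocess.run(argv, cwd=repo_dir, check=True)
```

Keeping the command list separate from add_old_history makes it easy to print and review the exact commands before running them against a real clone.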

The repositories include all the non-deleted branches and tags in the
SVN repository (and, outside refs/deleted/, that is the exact set of
branches and tags present).  For this purpose, the file
branches/st/README in the SVN repository is considered to have its own
branch.  reposurgeon generates a "root" branch for commits to paths
not part of any branch; this is not included in these repositories (it
has been deleted at the git level) because I don't believe it contains
anything plausibly relevant in git.  The commits to branches/st/README
were moved to their own branch, as noted; all other commits that end
up in "root" are commits wrongly creating a branch or tag at top level
rather than under /branches or /tags, commits deleting such misplaced
branches or tags, or changes to the SVN /hooks directory.
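To make the branch-assignment rules above concrete, here is a hypothetical Python sketch that assigns an SVN path to a branch, falling back to "root" for anything outside the recognized layouts (such as /hooks or a branch copied to the wrong top-level location). Rules beyond those stated in the message, in particular treating other branches/st/<name> directories as branches, are assumptions.

```python
def branch_for_path(svn_path: str) -> str:
    """Return the SVN branch/tag directory owning svn_path, or "root".

    Hypothetical sketch of the rules described above, not reposurgeon's code.
    """
    parts = [p for p in svn_path.split("/") if p]
    if parts[:1] == ["trunk"]:
        return "trunk"
    if parts[:2] == ["branches", "st"] and len(parts) >= 3:
        if parts[2] == "README":
            return "branches/st/README"  # this file gets its own branch
        if parts[2] == "tags" and len(parts) >= 4:
            return "/".join(parts[:4])   # branches/st/tags/<name>
        return "/".join(parts[:3])       # assumption: branches/st/<name>
    if parts[:1] in (["branches"], ["tags"]) and len(parts) >= 2:
        return "/".join(parts[:2])
    return "root"  # e.g. /hooks, or a branch created at top level by mistake
```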

As far as I know, all issues affecting commit tree contents have been
fixed, as have some previously noted issues with some merge commits
having too many parents, and incorrect attributions seen in an earlier
conversion of Richard's.  Tree contents are verified correct at every
non-deleted branch tip and tag (I intend to do such validation for
deleted branches and tags as well, but haven't yet implemented it).
For comparisons, the following methodology applies: empty directories
are removed from the SVN checkout, because git doesn't store empty
directories; .cvsignore files are excluded from the comparison, since
reposurgeon doesn't include them (but if people want them in the git
history, they could easily be included); .gitignore files are excluded
from the comparison, since reposurgeon generates one based on
svn:ignore properties (or SVN defaults, in the absence of such
properties) where the repository doesn't have one checked in (where
there *is* a .gitignore file checked into SVN, it's preferred over the
auto-generated one); cases of SVN keyword expansion are excluded
manually (only two branches have files with SVN keyword expansion
enabled).
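The comparison methodology can be sketched in Python as follows. This is an illustrative assumption, not the script actually used. It walks both checkouts and skips .svn/.git metadata directories. It ignores .cvsignore and .gitignore files, and it lets empty directories drop out naturally by comparing only files. An explicit set of paths with SVN keyword expansion stands in for the manual exclusion described above.

```python
import filecmp
import os
from typing import Collection, Dict, List

# Files the comparison ignores: reposurgeon drops .cvsignore and may
# generate .gitignore from svn:ignore, so neither is compared.
EXCLUDED_NAMES = {".cvsignore", ".gitignore"}
# Version-control metadata directories, never part of the tree contents.
METADATA_DIRS = {".svn", ".git"}


def tree_files(root: str) -> Dict[str, str]:
    """Map each comparable file's path relative to root to its full path.

    Only files are listed, so empty directories (which git cannot store)
    drop out of the comparison automatically.
    """
    files: Dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in METADATA_DIRS]
        for name in filenames:
            if name not in EXCLUDED_NAMES:
                full = os.path.join(dirpath, name)
                files[os.path.relpath(full, root)] = full
    return files


def compare_trees(svn_checkout: str, git_checkout: str,
                  keyword_exempt: Collection[str] = ()) -> List[str]:
    """Return sorted relative paths that differ or exist on one side only.

    keyword_exempt lists files with SVN keyword expansion, which are
    checked by hand instead.
    """
    svn_files = tree_files(svn_checkout)
    git_files = tree_files(git_checkout)
    differences = []
    for rel in sorted(svn_files.keys() | git_files.keys()):
        if rel in keyword_exempt:
            continue
        if rel not in svn_files or rel not in git_files:
            differences.append(rel)
        elif not filecmp.cmp(svn_files[rel], git_files[rel], shallow=False):
            differences.append(rel)
    return differences
```

An empty result means the two checkouts agree under these rules; shallow=False forces a byte-by-byte comparison rather than trusting file metadata.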

Every branch with SVN ancestry based on the first commit of /trunk has
first-parent ancestry in git going back to that commit, as expected.
This includes libstdcxx_so_7-2-branch, which was created by creating
the directory for the branch and then copying only the libstdc++-v3
subdirectory from trunk, rather than by copying the whole of /trunk to
/branches/libstdcxx_so_7-2-branch; reposurgeon detected that case
automatically and created an appropriate parent link to the relevant
trunk commit.
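The first-parent property can be checked with a short walk over parent links. Here is a hypothetical Python sketch over a toy parent map (commit id to ordered list of parent ids; all names are placeholders):

```python
from typing import Dict, Iterator, List, Optional


def first_parent_chain(parents: Dict[str, List[str]], tip: str) -> Iterator[str]:
    """Yield tip and each commit reached by following only first parents."""
    commit: Optional[str] = tip
    while commit is not None:
        yield commit
        ps = parents.get(commit, [])
        commit = ps[0] if ps else None


def has_first_parent_path(parents: Dict[str, List[str]], tip: str,
                          root: str) -> bool:
    """True if root lies on the first-parent chain starting at tip."""
    return root in first_parent_chain(parents, tip)
```

A branch created by copying only a subdirectory, as with libstdcxx_so_7-2-branch, would start with a parentless commit and fail this check until a parent link to the relevant trunk commit is added.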

Some parts of Richard's commit message improvements are present, but
most aren't because of an issue accessing Bugzilla (which also
affected some of the improvements not involving accessing Bugzilla, as
the script terminated early).  This should be fixed for my next
conversion run.  As discussed, Richard's improvements only add new
summary lines, with the original commit message following them.


Known issues (all either already fixed or understood and currently
being worked on):

1. Some cherry-picks are showing up as merges (this is the only issue
I could find in my checks, manual and automated, that affects the
commit graph; I couldn't find any issues affecting tree contents,
first-parent ancestry or the set of refs present).  Being worked on by
Julien "_FrnchFrgg_" RIVAUD.

2. Branch creation or recreation commits have attribution taken from
some ChangeLog file in the branch when it should come from the SVN
committer.  Being worked on by Eric S. Raymond.

3. There are still some merge commits with too many parents, although
the cases Richard found have all been fixed (and all those parents in
the cases I found are genuinely ancestors of the merge commit in
question, so it's essentially a cosmetic issue that some of them are
redundant: it won't affect anything in default "git log" output other
than the "Merge:" line, for example, as "git log" orders by commit
timestamp by default).  Being worked on by Julien "_FrnchFrgg_"
RIVAUD.

4. Only files called ChangeLog are used to extract attributions, not
ChangeLog.<branch> (fixed for my next conversion run, currently
running at the git-fast-import stage).

5. Most of Richard's commit message improvements aren't present (fixed
for my next conversion run, currently running at the git-fast-import
stage).
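Issue 3 can be made concrete: a merge parent is redundant when it is already an ancestor of another parent of the same merge. Here is a hypothetical Python sketch over a toy parent map (the function names are illustrative, not part of reposurgeon):

```python
from collections import deque
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], start: str) -> Set[str]:
    """Every commit reachable from start (inclusive) via any parent link."""
    seen = {start}
    queue = deque([start])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def redundant_parents(parents: Dict[str, List[str]], merge: str) -> List[str]:
    """Parents of merge that are already ancestors of one of its other parents."""
    merge_parents = parents[merge]
    redundant = []
    for i, p in enumerate(merge_parents):
        others = merge_parents[:i] + merge_parents[i + 1:]
        if any(p in ancestors(parents, q) for q in others):
            redundant.append(p)
    return redundant
```

Because ancestry is transitive, removing the reported parents (assuming no parent is listed twice) would not change which commits the merge can reach, which fits the description of the issue as essentially cosmetic.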


Points for consideration:

1. Do we want some kind of rearrangement of refs as in the 1b
repository or not?

2. Should the final converted repository contain refs/deleted/ refs or
not?

3. Where an attribution comes from an author map rather than a
ChangeLog file, do we wish to use the existing author map or do people
prefer using names from that map but with @gcc.gnu.org addresses (and
@gnu.org for usernames that only committed in the gcc2 period)?
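For point 3, the alternative could be applied mechanically to an author map. The following Python sketch assumes reposurgeon-style lines of the form "user = Full Name <email>", optionally followed by a timezone. It keeps each name and rewrites the address to user@gcc.gnu.org, or to user@gnu.org for users in a supplied set of gcc2-period-only committers. The function name and the example entries are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Assumed author-map line shape: "user = Full Name <email> [timezone]".
ENTRY = re.compile(
    r"^(?P<user>[^=\s]+)\s*=\s*(?P<name>[^<]+?)\s*<(?P<email>[^>]*)>(?P<rest>.*)$")


def rewrite_author_map(lines: Iterable[str], gcc2_only_users: Set[str]) -> List[str]:
    """Keep names from the map but switch addresses to gcc.gnu.org / gnu.org."""
    out = []
    for line in lines:
        m = ENTRY.match(line)
        if not m:
            out.append(line)  # comments and blank lines pass through unchanged
            continue
        user = m["user"]
        domain = "gnu.org" if user in gcc2_only_users else "gcc.gnu.org"
        out.append(f"{user} = {m['name']} <{user}@{domain}>{m['rest']}")
    return out
```

For example, "jdoe = Jane Doe <jane@example.com>" becomes "jdoe = Jane Doe <jdoe@gcc.gnu.org>", while any trailing timezone field is carried over unchanged.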

-- 
Joseph S. Myers
joseph@codesourcery.com



Thread overview: 54+ messages
2020-01-06 22:09 Test GCC conversion with reposurgeon available Loren James Rittle
2020-01-07  9:35 ` Richard Earnshaw (lists)
2020-01-07 15:53   ` Loren James Rittle
  -- strict thread matches above, loose matches on Subject: below --
2019-12-17 21:32 Joseph Myers
2019-12-17 23:33 ` Bernd Schmidt
2019-12-18  0:51   ` Eric S. Raymond
2019-12-18  0:52   ` Joseph Myers
2019-12-18  3:28     ` Joseph Myers
2019-12-18 14:36       ` Joseph Myers
2019-12-18 13:10 ` Jason Merrill
2019-12-18 18:16   ` Joseph Myers
2019-12-19  5:50     ` Jason Merrill
2019-12-19 15:55       ` Joseph Myers
2019-12-18 21:55 ` Joseph Myers
2019-12-19  0:36   ` Bernd Schmidt
2019-12-19  0:58     ` Joseph Myers
2019-12-19 16:29   ` Joseph Myers
2019-12-22 13:57     ` Joseph Myers
2019-12-23 17:27       ` Roman Zhuykov
2019-12-24 11:50         ` Joseph Myers
2019-12-24 15:55           ` Segher Boessenkool
2019-12-24 17:17             ` Joseph Myers
2019-12-24 18:14               ` Segher Boessenkool
2019-12-25 11:03                 ` Roman Zhuykov
2019-12-25 11:20                   ` Joseph Myers
2019-12-25 12:23                     ` Eric S. Raymond
2019-12-25 14:32                   ` Andreas Schwab
2019-12-25 14:41                     ` Joseph Myers
2019-12-25 15:10                       ` Andreas Schwab
2019-12-25 15:36                         ` Joseph Myers
2019-12-25 17:15                           ` Segher Boessenkool
2019-12-25 19:33                             ` Eric S. Raymond
2019-12-26 21:03                               ` Vincent Lefevre
2019-12-26 21:31                                 ` Eric S. Raymond
2019-12-26 22:25                                   ` Toon Moene
2019-12-26 22:32                                     ` Eric S. Raymond
2019-12-27 14:40                                       ` Segher Boessenkool
2019-12-26 22:57                                   ` Vincent Lefevre
2019-12-26 23:38                                     ` Eric S. Raymond
2019-12-25 19:40                           ` Eric S. Raymond
2019-12-27 21:29                           ` Andreas Schwab
2019-12-27 21:43                             ` Joseph Myers
2019-12-25 19:19                     ` Eric S. Raymond
2019-12-27 21:30                       ` Andreas Schwab
2019-12-28  2:43                         ` Eric S. Raymond
2019-12-27 14:37                   ` Richard Earnshaw
2019-12-24 10:57       ` Maxim Kuvyrkov
2019-12-28 16:30       ` Joseph Myers
2020-01-03 12:38         ` Joseph Myers
2020-01-06 23:58           ` Andrew Pinski
2020-01-07  0:30             ` Joseph Myers
2020-01-07  0:44             ` Richard Earnshaw
2020-01-09 12:22           ` Joseph Myers
2020-01-09 21:57             ` Joseph Myers
