public inbox for gcc@gcc.gnu.org
* Fixing cvs2svn branchpoints
@ 2019-10-18 16:26 Joseph Myers
From: Joseph Myers @ 2019-10-18 16:26 UTC
  To: gcc; +Cc: esr

As mentioned at the Cauldron, I'm looking at finding better branchpoints 
for the cases in the GCC repository where cvs2svn messed up identifying 
the parent branch and commit on which a branch was based, so that affected 
branches can be reparented as part of moving to git, since messed-up 
branchpoints are actually confusing in practice when looking at old 
branches.

An idiomatic branch in SVN would start with a commit that just copies one 
commit of one branch to another branch, with no further changes.  In many 
cases it's not possible to achieve that through reparenting because there 
is no commit on any parent branch exactly corresponding to the first 
commit on the cvs2svn-generated branch.  However, it's still possible to 
find a much better approximation than cvs2svn did in some cases.  (There 
are also cases where cvs2svn found a good branchpoint, but represented the 
branch-creation commit in a superfluously complicated way, replacing lots 
of files and subdirectories by copies of different revisions.  That 
doesn't really matter for conversion to git, however, since git's data 
structures don't say anything about where a particular subdirectory was 
copied from, just the tree hash and the parent commit.)

I'm using heuristics to see if a particular branch has a suspicious 
branchpoint.  First, if there is a branchpoint tag I take that as the best 
estimate of what the tree should look like at the branchpoint commit on 
the parent branch; otherwise, I take the first commit on the branch as the 
best estimate of that.  Then, I consider a branchpoint not to be 
suspicious if the only diffs between the tree at the parent commit and the 
tree estimated to start the branch are file deletions and, if there was 
no branchpoint commit, file additions.
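
As a rough illustration of the test described above (a sketch, not the
actual script), suppose each tree is represented as a mapping from file
path to a content hash.  That representation and the names
classify_diffs and is_suspicious are hypothetical.  The sketch also
assumes that additions are tolerated only when the first commit on the
branch, rather than a branchpoint tag, supplied the estimated tree.

```python
from typing import Dict, Set, Tuple

# Hypothetical tree representation: file path -> content hash.
Tree = Dict[str, str]


def classify_diffs(parent: Tree, estimate: Tree) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (deleted, added, modified) paths going from parent to estimate."""
    deleted = set(parent) - set(estimate)
    added = set(estimate) - set(parent)
    modified = {p for p in set(parent) & set(estimate) if parent[p] != estimate[p]}
    return deleted, added, modified


def is_suspicious(parent: Tree, estimate: Tree, has_branchpoint_tag: bool) -> bool:
    """Apply the heuristic: content changes are always suspicious.

    Deletions are always tolerated.  Additions are tolerated only when the
    estimate came from the first commit on the branch (assumption: that is
    the case the text means by "no branchpoint commit").
    """
    _deleted, added, modified = classify_diffs(parent, estimate)
    if modified:
        return True
    if added and has_branchpoint_tag:
        return True
    return False
```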

(There are several reasons why the creation of a branch might involve file 
deletions.  Some look like CVS glitches where it simply failed to create 
the branch in particular ,v files; some may be cases where the person 
created the branch only for certain subdirectories, deliberately; some 
look like cases where ,v files for separately developed subdirectories, 
e.g. libjava, got moved into the GCC CVS repository at some point, which 
makes those subdirectories appear to be deleted on creation of branches 
made before they were moved into place.  File additions at 
branch creation look more like an artifact of how cvs2svn handles cases of 
a file first added on trunk after a branch was created, then backported to 
that branch.)

If the branchpoint is suspicious (54 are, out of 135 branches in /branches 
as of r105925, the last cvs2svn-generated commit), I then look for an 
alternative non-suspicious branchpoint, which might be either on the same 
parent branch currently used, or on a different one chosen by some 
heuristics.  Because pretty much all normal GCC commits change file 
contents (modifying a ChangeLog file, if nothing else), any candidate 
parent that is non-suspicious, and thus does not involve any file content 
differences when compared with the branchpoint commit or first commit on 
the branch, should be very close to being the right parent commit.
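
The search for an alternative parent might be sketched as a simple
filter.  This is illustrative only: the candidate list, the predicate
passed in, and the name non_suspicious_parents are hypothetical, and the
only ordering rule assumed is that a parent revision must precede the
first commit on the branch.  How candidate branches are chosen, and how
to pick among several survivors, is not specified here.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical candidate representation: (SVN branch path, revision number).
Candidate = Tuple[str, int]


def non_suspicious_parents(
    candidates: Iterable[Candidate],
    first_commit_rev: int,
    is_suspicious: Callable[[Candidate], bool],
) -> List[Candidate]:
    """Keep candidates that precede the branch's first commit and pass the test.

    is_suspicious is supplied by the caller; it stands in for the tree
    comparison against the branchpoint tag or first commit.
    """
    return [
        (branch, rev)
        for branch, rev in candidates
        if rev < first_commit_rev and not is_suspicious((branch, rev))
    ]
```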

Here is a list of reparentings I suggest for 16 of those 54 branches, 
including in particular the cases of egcs_1_00_branch and gcc-3_2-branch 
that were noted on IRC to have bad branchpoints at present; some are only 
small changes, some are much more major fixes.  I expect I can find 
reparentings for some of the rest with more investigation and improved 
heuristics or hints for those heuristics, while others may well already be 
essentially the right branchpoint despite file content changes being 
present in the first commit.  (Two of the rest do have reparentings 
suggested by my script, but they need more careful investigation because 
of file content mismatches between the branchpoint tags and the first 
commit on the branch.)

The first two columns after REPARENT: list the SVN path of the branch, and 
the revision number of the first commit on it (the one that should be 
reparented).  The next two list the suspicious parent (that is, the branch 
and revision from which cvs2svn generated the copy that created the 
top-level /branches/whatever directory for the branch, along with further 
changes in the commit to fix up files and subdirectories in that copy to 
have the right tree contents).  The final two columns list the proposed 
new parent branch and revision on that branch.  In all cases, the tree 
content is expected to be left as generated by cvs2svn; it's simply the 
commit parent that should be changed in git.

REPARENT: /branches/GC_5_0_ALPHA_1 27860 /trunk 27852 /trunk 27855
REPARENT: /branches/csl-3_3_1-branch 70143 /trunk 60111 /branches/gcc-3_3-branch 70142
REPARENT: /branches/csl-3_4-linux-branch 90110 /trunk 75991 /branches/gcc-3_4-branch 90109
REPARENT: /branches/csl-3_4_0-hp-branch 80843 /trunk 75991 /branches/gcc-3_4-branch 80842
REPARENT: /branches/csl-sol210-3_4-branch 87927 /trunk 75991 /branches/gcc-3_4-branch 87903
REPARENT: /branches/cygming331 70683 /trunk 60111 /branches/gcc-3_3-branch 70142
REPARENT: /branches/cygming332 73014 /trunk 60111 /branches/cygming331 73013
REPARENT: /branches/cygwin-mingw-gcc-3_1-branch 53609 /trunk 50029 /branches/gcc-3_1-branch 53596
REPARENT: /branches/egcs_1_00_branch 16282 /branches/devo_gcc_testsuite 14842 /trunk 16272
REPARENT: /branches/gcc-2_95_2_1-branch 30162 /trunk 26993 /branches/gcc-2_95-branch 30160
REPARENT: /branches/gcc-3_2-branch 55785 /trunk 50029 /branches/gcc-3_1-branch 55783
REPARENT: /branches/gcc-3_3-rhl-branch 66998 /trunk 60111 /branches/gcc-3_3-branch 66832
REPARENT: /branches/gcc-3_4-e500-branch 89417 /trunk 75991 /branches/gcc-3_4-branch 89410
REPARENT: /branches/gcc-3_4-rhl-branch 81014 /trunk 75991 /branches/gcc-3_4-branch 80870
REPARENT: /branches/gcc-4_0-rhl-branch 95664 /trunk 95533 /branches/gcc-4_0-branch 95655
REPARENT: /branches/libgcj-2_95-branch 27730 /branches/CYGNUS 26267 /trunk 27727
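
For anyone wanting to consume the list above mechanically, a minimal
parser could look like the following.  It assumes only the six-column
layout described earlier (whitespace-separated fields after the
REPARENT: marker); the Reparent class and its field names are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reparent:
    branch: str          # SVN path of the branch
    first_rev: int       # first commit on the branch (the one to reparent)
    old_parent: str      # suspicious parent branch chosen by cvs2svn
    old_parent_rev: int  # revision on the suspicious parent
    new_parent: str      # proposed new parent branch
    new_parent_rev: int  # proposed revision on the new parent


def parse_reparent(line: str) -> Reparent:
    """Parse one 'REPARENT: ...' line into a Reparent record."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "REPARENT:":
        raise ValueError(f"not a REPARENT line: {line!r}")
    _, branch, first, old, old_rev, new, new_rev = fields
    return Reparent(branch, int(first), old, int(old_rev), new, int(new_rev))
```

Applied to the gcc-3_2-branch line, this yields a record whose
new_parent is /branches/gcc-3_1-branch and whose new_parent_rev is 55783.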

-- 
Joseph S. Myers
joseph@codesourcery.com
