Date: Fri, 18 Oct 2019 16:26:00 -0000
From: Joseph Myers
Subject: Fixing cvs2svn branchpoints

As mentioned at the Cauldron, I'm looking at finding better branchpoints for the cases in the GCC repository where cvs2svn messed up identifying the parent branch and commit on which a branch was based,
so that affected branches can be reparented as part of moving to git, since messed-up branchpoints are actually confusing in practice when looking at old branches.

An idiomatic branch in SVN would start with a commit that just copies one commit of one branch to another branch, with no further changes. In many cases it's not possible to achieve that through reparenting because there is no commit on any parent branch exactly corresponding to the first commit on the cvs2svn-generated branch. However, it's still possible to find a much better approximation than cvs2svn did in some cases. (There are also cases where cvs2svn found a good branchpoint, but represented the branch-creation commit in a superfluously complicated way, replacing lots of files and subdirectories by copies of different revisions. That doesn't really matter for conversion to git, however, since git's data structures don't say anything about where a particular subdirectory was copied from, just the tree hash and the parent commit.)

I'm using heuristics to see if a particular branch has a suspicious branchpoint. First, if there is a branchpoint tag I take that as the best estimate of what the tree should look like at the branchpoint commit on the parent branch; otherwise, I take the first commit on the branch as the best estimate of that. Then, I consider a branchpoint not to be suspicious if the only diffs between the tree at the parent commit and the tree estimated to start the branch are file deletions, and, if there was no branchpoint commit, file additions. (There are several reasons why the creation of a branch might involve file deletions. Some look like CVS glitches where it simply failed to create the branch in particular ,v files; some may be cases where the person deliberately created the branch only for certain subdirectories; some look like cases where ,v files for separately developed subdirectories, e.g.
libjava, got moved into the GCC CVS repository at some point, resulting in the appearance of those subdirectories being deleted on creation of branches before they were moved into place. File additions at branch creation look more like an artifact of how cvs2svn handles cases of a file first added on trunk after a branch was created, then backported to that branch.)

If the branchpoint is suspicious (54 are, out of 135 branches in /branches as of r105925, the last cvs2svn-generated commit), I then look for an alternative non-suspicious branchpoint, which might be either on the same parent branch currently used, or on a different one chosen by some heuristics. Because pretty much all normal GCC commits change file contents (modifying a ChangeLog file, if nothing else), any candidate parent that is non-suspicious, and thus does not involve any file content differences when compared with the branchpoint tag or first commit on the branch, should be very close to being the right parent commit.

Here is a list of reparentings I suggest for 16 of those 54 branches, including in particular the cases of egcs_1_00_branch and gcc-3_2-branch that were noted on IRC to have bad branchpoints at present; some are only small changes, some are much more major fixes. I expect I can find reparentings for some of the rest with more investigation and improved heuristics or hints for those heuristics, while others may well already have essentially the right branchpoint despite file content changes being present in the first commit. (Two of the rest do have reparentings suggested by my script, but they need more careful investigation because of file content mismatches between the branchpoint tags and the first commit on the branch.)

The first two columns after REPARENT: list the SVN path of the branch, and the revision number of the first commit on it (the one that should be reparented).
The next two list the suspicious parent (that is, the branch and revision from which cvs2svn generated the copy that created the top-level /branches/whatever directory for the branch, along with further changes in the commit to fix up files and subdirectories in that copy to have the right tree contents). The final two columns list the proposed new parent branch and revision on that branch. In all cases, the tree content is expected to be left as generated by cvs2svn; it's simply the commit parent that should be changed in git.

REPARENT: /branches/GC_5_0_ALPHA_1 27860 /trunk 27852 /trunk 27855
REPARENT: /branches/csl-3_3_1-branch 70143 /trunk 60111 /branches/gcc-3_3-branch 70142
REPARENT: /branches/csl-3_4-linux-branch 90110 /trunk 75991 /branches/gcc-3_4-branch 90109
REPARENT: /branches/csl-3_4_0-hp-branch 80843 /trunk 75991 /branches/gcc-3_4-branch 80842
REPARENT: /branches/csl-sol210-3_4-branch 87927 /trunk 75991 /branches/gcc-3_4-branch 87903
REPARENT: /branches/cygming331 70683 /trunk 60111 /branches/gcc-3_3-branch 70142
REPARENT: /branches/cygming332 73014 /trunk 60111 /branches/cygming331 73013
REPARENT: /branches/cygwin-mingw-gcc-3_1-branch 53609 /trunk 50029 /branches/gcc-3_1-branch 53596
REPARENT: /branches/egcs_1_00_branch 16282 /branches/devo_gcc_testsuite 14842 /trunk 16272
REPARENT: /branches/gcc-2_95_2_1-branch 30162 /trunk 26993 /branches/gcc-2_95-branch 30160
REPARENT: /branches/gcc-3_2-branch 55785 /trunk 50029 /branches/gcc-3_1-branch 55783
REPARENT: /branches/gcc-3_3-rhl-branch 66998 /trunk 60111 /branches/gcc-3_3-branch 66832
REPARENT: /branches/gcc-3_4-e500-branch 89417 /trunk 75991 /branches/gcc-3_4-branch 89410
REPARENT: /branches/gcc-3_4-rhl-branch 81014 /trunk 75991 /branches/gcc-3_4-branch 80870
REPARENT: /branches/gcc-4_0-rhl-branch 95664 /trunk 95533 /branches/gcc-4_0-branch 95655
REPARENT: /branches/libgcj-2_95-branch 27730 /branches/CYGNUS 26267 /trunk 27727

-- 
Joseph S. Myers
joseph@codesourcery.com
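
For illustration only, here is a minimal Python sketch of the suspiciousness test and of reading the REPARENT: lines described in this message. It is not the script used to produce the list. Trees are modelled as a hypothetical mapping from file path to a content digest, and the names is_suspicious, Reparent and parse_reparent are invented for the example. The sketch also assumes that "no branchpoint commit" in the description of the heuristic means "no branchpoint tag", which is an interpretation rather than something stated outright.

```python
from typing import Dict, NamedTuple

# Hypothetical tree model: file path -> content digest.
Tree = Dict[str, str]


def is_suspicious(parent_tree: Tree, start_tree: Tree,
                  has_branchpoint_tag: bool) -> bool:
    """Return True if the branchpoint looks wrong under the stated heuristic.

    start_tree is the branchpoint tag's tree if a tag exists, otherwise
    the tree of the first commit on the branch.
    """
    for path, digest in start_tree.items():
        if path not in parent_tree:
            # File addition: tolerated only when there is no branchpoint
            # tag (assumed reading of "no branchpoint commit").
            if has_branchpoint_tag:
                return True
        elif parent_tree[path] != digest:
            # Any file content difference makes the branchpoint suspicious.
            return True
    # Paths present only in parent_tree are deletions, always tolerated.
    return False


class Reparent(NamedTuple):
    branch: str      # SVN path of the branch
    first_rev: int   # first commit on the branch (the one to reparent)
    old_parent: str  # parent branch chosen by cvs2svn
    old_rev: int     # revision on that parent
    new_parent: str  # proposed new parent branch
    new_rev: int     # proposed revision on the new parent


def parse_reparent(line: str) -> Reparent:
    """Parse one 'REPARENT: ...' line into its six columns."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "REPARENT:":
        raise ValueError(f"not a REPARENT line: {line!r}")
    _, branch, first_rev, old_parent, old_rev, new_parent, new_rev = fields
    return Reparent(branch, int(first_rev), old_parent, int(old_rev),
                    new_parent, int(new_rev))
```

For example, parse_reparent applied to the gcc-3_2-branch line yields old_parent "/trunk" at 50029 and new_parent "/branches/gcc-3_1-branch" at 55783. Only the parent recorded for commit 55785 would change in git; the tree stays as cvs2svn generated it.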