Date: Mon, 09 Dec 2019 20:45:00 -0000
From: Joseph Myers
To: Bernd Schmidt
CC: "Eric S. Raymond", Richard Biener, Maxim Kuvyrkov, "Richard Earnshaw (lists)"
Subject: Re: Proposal for the transition timetable for the move to GIT
In-Reply-To: <14a615e8-05e7-8c2f-f240-2726441d22e1@t-online.de>

On Mon, 9 Dec 2019, Bernd Schmidt wrote:

> On 12/9/19 7:19 PM, Joseph Myers wrote:
> >
> > For any conversion we're clearly going to need to run various validation
> > (comparing properties of the converted repository, such as contents at
> > branch tips, with expected values of those properties based on the SVN
> > repository) and fix issues shown up by that validation. reposurgeon has
> > its own tools for such validation; I also intend to write some validation
> > scripts myself.
>
> Would it be feasible to require that both conversions produce the same output
> repository to some degree? Can we just look at release tags and require that
> they have the same hash in both conversions, or are there good reasons why the
> two would produce different outputs?

The same hashes are not practical. There are several areas where two
perfectly correct conversions are still expected to have different
contents because of subjective decisions and heuristics involved in the
conversion.

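To illustrate why identical hashes are impractical: a git commit ID is a
SHA-1 over the commit object's text, which includes the tree ID, the parent
IDs, the author and committer lines (with timestamp and timezone) and the
message. The sketch below is only an illustration, not part of reposurgeon
or of any validation script; the identity, timestamp and message are made
up, and it ignores optional headers such as signatures. It shows that a
different timezone guess for one author is enough to change the hash of
that commit, and therefore of every descendant.

```python
import hashlib

# Well-known ID of git's empty tree, used here only as a placeholder tree.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_id(tree, parents, author, committer, message):
    """Return the SHA-1 object ID git assigns to a plain commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    data = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    # Git hashes "commit <size>\0" followed by the object body.
    header = b"commit " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Hypothetical identity and message; only the timezone field differs.
who_utc = "A U Thor <author@example.com> 1575924300 +0000"
who_cet = "A U Thor <author@example.com> 1575924300 +0100"
message = "Fix typo in comment.\n"

id_utc = commit_id(EMPTY_TREE, [], who_utc, who_utc, message)
id_cet = commit_id(EMPTY_TREE, [], who_cet, who_cet, message)
print(id_utc)
print(id_cet)
```

Since each parent ID is itself part of the hashed text, one such difference
early in history propagates to every later commit and tag.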
If some alternative heuristic is found to be clearly better than an
existing one in reposurgeon, so that it would be better for any project
converting with reposurgeon, or if some preference in the GCC case can
readily be represented as a configuration option to choose between
different approaches, it makes sense to implement the improvements in
reposurgeon so that any project with similar issues can benefit. For
example, see Richard's suggestions in reposurgeon issue 174 of two
possible improvements to ChangeLog handling: disregarding ChangeLog data
if a commit adds multiple ChangeLog entries by different authors, and
specifying a wildcard to allow ChangeLog processing on ChangeLog* files to
cover ChangeLog.. GCC is hardly the last project converting from SVN to
git, so we can benefit from the experiences of past conversions, and help
contribute to having useful features available for future conversions.

Here are some cases for differences between two correct conversions:

* Tree contents should mostly be identical at any given commit, but
  reposurgeon deliberately produces a .gitignore with contents based on
  svn:ignore if the SVN tree contents don't have a .gitignore (we use
  --user-ignores to prefer the .gitignore file in SVN if it exists), and
  removes any .cvsignore file.

* The first parent of a commit should typically be the same between
  conversions, but (a) might be corrected in some way for cvs2svn issues,
  (b) might skip SVN commits that would translate into empty git commits,
  depending on the choices made for handling of such commits.

* Cases that give rise to no tree changes in a commit (which thus might
  not become a git commit at all depending on the choices made and whether
  they also don't change any merge information properties) include
  (a) branch or tag creation as an exact copy of some revision of some
  branch, (b) branch recreation as a copy, e.g. when trunk was deleted
  accidentally, (c) commits that in SVN only add or remove empty
  directories, as git does not store empty directories, (d) commits that
  in SVN just remove some file or directory and replace it with a copy
  from some revision of some branch that happens to have identical
  contents to the file or directory removed (yes, we do have commits like
  that in GCC SVN).

* Subsequent parents of a commit based on merge info handling may well
  have subjective differences between correct conversions.

* Commit messages might differ, both because of heuristics to improve
  them, like Richard's work on that, and because of different choices for
  how to represent the SVN revision number information in commit messages.

* Author and committer identifications, and commit timestamps (especially
  timezones, something git has, SVN doesn't and reposurgeon has a
  per-author map for) may vary because of different heuristics or author
  maps used, especially when there is no ChangeLog entry for a commit or
  the ChangeLog entry is in some way malformed or the commit adds
  ChangeLog entries for multiple changes with different authors.

-- 
Joseph S. Myers
joseph@codesourcery.com
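The first case in the list above (tree contents identical apart from
ignore files) suggests a comparison weaker than hash equality. The
following is a minimal sketch of such a check, not one of the actual
validation scripts. It assumes each tree has already been flattened into a
mapping from path to blob ID (for instance from a recursive tree listing),
and the function names and sample data are hypothetical. It is also
cruder than the rule described above, since it skips every .gitignore
rather than only the generated ones.

```python
# File names whose presence or contents may legitimately differ.
IGNORE_NAMES = {".gitignore", ".cvsignore"}


def comparable(tree):
    """Drop ignore files so only converter-independent paths remain."""
    return {
        path: blob
        for path, blob in tree.items()
        if path.rsplit("/", 1)[-1] not in IGNORE_NAMES
    }


def tree_differences(tree_a, tree_b):
    """Return sorted paths missing from one side or with differing blobs."""
    a, b = comparable(tree_a), comparable(tree_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


# Hypothetical flattened trees from two conversions of the same revision.
conversion_a = {
    "gcc/toplev.c": "blob-1",
    ".gitignore": "blob-2",
    "gcc/.cvsignore": "blob-3",
}
conversion_b = {
    "gcc/toplev.c": "blob-1",
    ".gitignore": "blob-4",
}
print(tree_differences(conversion_a, conversion_b))
```

An empty result means the two trees agree on everything except ignore
files; any path that is reported would need a closer look.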