From: Alex Schuilenburg
To: ecos-maintainers@ecos.sourceware.org
Date: Mon, 12 Oct 2009 12:34:00 -0000
Subject: hg conversion notes and summary
Message-ID: <4AD32231.5060506@ecoscentric.com>

Hi

As per my message on ecos-discuss, here is a brief set of notes on my
conversion of anoncvs to hg.

Tags
====

First, since some of you may not be familiar with DRCS systems, a note on
tags. Tags in DRCS systems reflect the state of the repository or branch
at a particular point in time. They cover the *whole* repository, not
individual files as CVS tags do. This can and does lead to differences
between the "revisions" of files that are tagged in CVS and those tagged
in hg.
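
To make the difference concrete, here is a minimal Python sketch. It is
not the script used for the conversion. It models a CVS tag as a mapping
from file path to tagged RCS revision and places the hg tag using the
rule given in step 6 of the conversion process below: one second before
the first commit after the newest tagged revision. It also reports files
whose state at that point differs from what CVS tagged, which is the
mismatch described in the simple example below. The Commit class,
hg_tag_time(), the integer timestamps and the single linear branch are
all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One RCS file revision on a single linear branch (illustrative model)."""
    time: int        # commit time in seconds; integer timestamps are assumed
    path: str        # file path within the module
    revision: str    # RCS revision number, e.g. "1.14"


def hg_tag_time(history, cvs_tag):
    """Choose when an hg tag should be placed for a per-file CVS tag.

    history: list of Commit objects for one branch.
    cvs_tag: dict mapping path -> tagged RCS revision.
    Returns (tag_time, mismatches). tag_time is one second before the first
    commit after the newest tagged revision (or the newest tagged time if
    there is no later commit). mismatches lists the paths whose revision at
    tag_time differs from the CVS tag, i.e. what one changeset cannot express.
    """
    tagged_times = [c.time for c in history
                    if cvs_tag.get(c.path) == c.revision]
    if not tagged_times:
        raise ValueError("the CVS tag matches no revisions in this history")
    newest = max(tagged_times)

    # The hg tag goes just before the next commit after the newest tagged revision.
    later = [c.time for c in history if c.time > newest]
    tag_time = min(later) - 1 if later else newest

    # Replay the history up to tag_time to get each file's revision at that point.
    state = {}
    for c in sorted(history, key=lambda commit: commit.time):
        if c.time <= tag_time:
            state[c.path] = c.revision

    mismatches = sorted(path for path, rev in cvs_tag.items()
                        if state.get(path) != rev)
    return tag_time, mismatches


if __name__ == "__main__":
    # a.c is tagged at 1.1, then changed to 1.2 before c.c is tagged.
    history = [
        Commit(100, "a.c", "1.1"),
        Commit(100, "b.c", "1.1"),
        Commit(200, "a.c", "1.2"),
        Commit(300, "c.c", "1.1"),
        Commit(400, "d.c", "1.1"),
    ]
    print(hg_tag_time(history, {"a.c": "1.1", "b.c": "1.1", "c.c": "1.1"}))
```

Run as a script, this prints (399, ['a.c']): the tag lands just before
the commit at time 400, but by then a.c is already at 1.2, so no single
hg changeset matches the CVS tag exactly.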

As a simple example: since CVS tags are simply textual bookmarks of
specific file revisions, there is nothing stopping somebody in CVS from
tagging the revisions of one set of files with a given tag, committing a
bunch of changes to some of those files (so changing their revisions),
and then tagging a second set of files with the same tag. While you can
do this in CVS, you cannot do this in hg, because an hg tag applies to a
single changeset only.

Of course this is just a simple example. In fact there are two other,
more complicated ways in which our current usage of CVS results in
similar mismatches, but one is enough to prove my point. This one also
happened in real life: the ecos and ecos-net modules were tagged at
different times, and files that had already been tagged were updated
before the second tag was made.

So, since tags in DRCS systems are made against a single changeset, there
is no clean correspondence between CVS tags and DRCS tags, and in fact
the normal conversion processes to git/hg/bzr do not even attempt to
preserve tags. I have made a pretty reasonable guess, however.

Conversion Process Summary
==========================

1. Create a copy of the CVS repository, remove the modules file, and munge
   the directory structure in the copy to match that given by the modules
   file.

   Note: This step was necessary to bring the eCos network packages into
   the main tree rather than having them as a separate checkout. Also,
   naming a module 'ecos' after an existing directory 'ecos' confused the
   heck out of cvsps, as did some files having a revision tagged "ecos".

2. Use a modified cvsps to create a summary set of changes (i.e. without
   the actual patch changes). This effectively creates sets of "atomic"
   checkins which can be used to match hg changesets.

   Notes:
   1. Patchsets are created by cvsps.
   2. Changesets are created by hg.
   3. CVS checkins of a group of files or directories are not atomic: the
      times recorded by RCS for the files changed by a single CVS commit
      often differ, sometimes by as much as 12 minutes for large commits.
   4. CVS diff, cvsps diff and the standard diff are all unable to cope
      with certain changes to binary files, and either crash or create
      patches that cause patch to crash.
   5. Some CVS log entries were not UTF-8 and came from different
      character sets. These needed special attention (e.g. Roland
      Caßebohm, Daniel Néri and, the best one... soft spaces).
   6. CVS locking was broken at some point: I found an instance of a tag
      being performed midway through a checkin by another user. The CVS
      history file confirms this. The tag was made by jifl, the checkin
      by gthomas.

3. Loop through all the patchsets, using RCS to create new or updated
   files within the corresponding hg repository (or to delete files from
   it), and commit each one using the patchset's log message.

4. Some patchsets were applicable to multiple branches. That is, some or
   all of the changes made on the trunk or on a branch propagated to other
   branches or to the trunk. Such propagation was restricted to direct
   descendants or ancestors. Thus, for every checkin, checks were made to
   ensure that changes made on one branch (or the trunk) were propagated,
   where necessary, to the trunk (or the branch). There were 198 such
   changes between the trunk and the branches.

   Simple example:
   http://hg-pub.ecoscentric.com/ecos/rev/007ddf4d1979
   http://hg-pub.ecoscentric.com/ecos-v2_0-branch/rev/f30bba77e433

       PatchSet 593
       Date: 2003/02/24 14:04:35
       Author: jlarmour
       Branch: HEAD
       Tag: (none)
       Branches:
       Log:
       * sgml/doclist: Reorder in a slightly more logical order with
       related bits grouped together. Add docs for power management, USB
       (slave, eth slave, and SA11x0 and NEC uPD8985xx drivers), and
       synthetic target HAL, eth and watchdog drivers.
       * sgml/.cvsignore: Add gifs and rename ecos.* to ecos-ref.*
       Members:
           doc/ChangeLog:1.13->1.14
           doc/sgml/.cvsignore:1.2->1.3
           doc/sgml/doclist:1.7->1.8
           doc/sgml/makemakefile:1.9->1.10

   Notes:
   1. The CVS history file and CVS checkouts against a date confirmed
      that revisions checked in after a branch was made also appeared in
      directly related branches.
   2. Changesets could not be transferred wholesale between branches,
      because in a fair number of them only *some* of the changes
      propagated to other branches. Hence individual commits were made to
      propagate only those changes that CVS reported.
   3. When propagating changes, some changes appear in files on a branch
      *before* their actual commit on another branch, while other changes
      magically appear on other branches some time *after* the checkin.
      That is, CVS appears to have invented time travel in both
      directions. This is fixed in the conversion by propagating the
      revision to the other branches only at the same time as the
      original commit.

5. Create new branches as clones of their ancestor. Ignore the "Branches"
   label from cvsps as it is too unreliable. This is not cvsps's fault;
   CVS is just broken. Branches were instead cloned manually to more
   closely match the code. This was done to ensure that the actual
   changes of the first commit were preserved.

   Notes:
   1. The "Branches" and "Tag" labels within cvsps patchsets were only
      used as indicators that a branch or tag had been made.
   2. The CVS history file was used as the first reference to determine
      the time of the branch.
   3. When the history file was not forthcoming (yes, it did not store
      every tag/branch, and occasionally even gave totally bogus
      information, which looked like timezone bugs), the time of the
      branch was taken to be one second before the first commit to the
      branch.
   4. Some files only appear in a CVS checkout of a branch at their first
      change on the branch.
      These files existed when the branch was made, so they should
      appear when the branch is checked out at a time prior to the
      change, but they did not.
   5. Some files suddenly appear on a branch, with the same revision as
      at the branchpoint on their parent, at some arbitrary checkout
      time.

6. hg tags are placed after the last RCS commit of all the files that
   carry the tag, just before the next commit. In our simple example,
   which tags two sets of files at different times, the tag is made after
   the newest (greatest) commit time of all files carrying the tag, one
   second before the next CVS commit.

7. Some revisions of files or patchsets are orphaned: they do not have a
   branch and do not belong on the trunk. These are summarised below.

8. At periodic intervals, do a full CVS checkout of all active branches
   and compare these files against an hg "checkout" at the same time. If
   the CVS and hg checkouts of every branch match, create a checkpoint (a
   clone of the hg repositories). If there are any differences (ignoring
   files that appear in hg but not in CVS), revert to a previous
   checkpoint and do a binary-chop search to find which CVS checkin
   introduced the difference, then make the corresponding hg commit to
   bring the hg files in line with CVS. This was termed "Normalising",
   and the hg commit message reflected this process. More than
   occasionally (e.g. merges with anoncvs) such differences occurred at
   every small commit. In these instances, with many CVS checkins forming
   part of the merge, a single "Normalisation" was done after the
   multiple CVS commits.

9. At the end of the conversion process, do a full current checkout of
   both the CVS and hg repositories and compare them. All files matched,
   except on the flash_v2 branch, which had additional files in hg due to
   normalisation.
   These were removed in one final commit:
   http://hg-pub.ecoscentric.com/flash_v2/rev/3d4d52f8a035

Orphaned changes
================

    PatchSet 582
    Date: 2003/02/21 09:09:49
    Author: bartv
    Log:
    Merge from trunk - tweak CDL testcase definitions to refer to the
    executables rather than the source
    Members:
        packages/io/fileio/current/ChangeLog:1.26->1.26.2.1
        packages/io/fileio/current/cdl/fileio.cdl:1.10->1.10.2.1
        packages/services/cpuload/current/ChangeLog:1.1->1.1.2.1
        packages/services/cpuload/current/cdl/cpuload.cdl:1.2->1.2.2.1
        packages/services/crc/current/ChangeLog:1.4->1.4.2.1
        packages/services/crc/current/cdl/crc.cdl:1.1->1.1.2.1

Odd usage notes
===============

Finally, I imported the hg ecos repository into git, for the hell of it,
just to see, and it worked very smoothly. However, I was surprised to see
that the repository was around 10% bigger than the hg one, whereupon I
was informed about git-pack. It seems busy git repositories do need
regular manual maintenance to stay efficient and small.

As a sub-note, when manually messing with changesets between hg and git,
I was disappointed to find that git requires the SHA-1 to refer to a
changeset, while hg allows you to use either the SHA-1 or a local
revision number. I ended up doing a lot less cutting and pasting with hg
than I had to do with git.

Summary
=======

My preference is obvious: hg. Hence my continual lobbying :-)

However, more practically, my choice is based mainly on usage and
usability experience. I would encourage you to use both hg and git on
different host platforms before making a decision, so you can make up
your own mind. Regular git users should also try to be a bit more
open-minded and remember that not every eCos developer uses Linux and is
a command-line expert.

Neither eCosCentric nor I have any commercial interest in any of the DRCS
options, nor do we stand to gain or lose commercially if you choose one
over the other.
In particular, I do not want hg to be disadvantaged just because
eCosCentric and I have recommended it. I am also sure the community would
appreciate seeing a summary of how and why you reached your decision if
you decide to hold the decision process behind closed doors.

Finally, if you have any questions or concerns, or would like some help
setting up your own hg repository, I have done all of the following and
will happily share my experiences or set it up for you on sourceware:
push privileges over https for maintainers (assuming you don't want to
use ssh exclusively), automated email when changes are pushed, and
automatic updates of checked-out copies (e.g. web pages). It is pretty
much a no-brainer: all you need is a web server (though not even that)
with suitable local privileges. OTOH, all of this is well documented and
each option involves only a couple of lines in your hgrc file, so you
should easily be able to set it up yourself.

-- 
Alex Schuilenburg
Managing Director/CEO
eCosCentric Limited
www.ecoscentric.com