* Re: on reputation and lines and putting things places (Re: gcc branches?)
@ 2002-12-08  7:13 Robert Dewar
  2002-12-08 14:18 ` source mgt. requirements solicitation Tom Lord
  0 siblings, 1 reply; 60+ messages in thread

From: Robert Dewar @ 2002-12-08 7:13 UTC (permalink / raw)
To: dewar, lord; +Cc: gcc

> That's pretty much what I'd guessed.  I'll reiterate: you go girl!
> That's cool.  I admire you.  Human scaled, competent, successful:
> neat!  Sheesh.  Are you just flipping out over my use of the word
> "dinky"?

No, it is just the entire style of your presentation.

> I've started to believe that there is no variation on advocacy that
> could possibly succeed given presumptions such as you have exhibited.
> It is interesting to try to trace those presumptions back to their
> origins (*cough*cygnus).  Yet another "bash on Tom" day, I guess.

I would tend to agree if it is you doing the advocacy.  My best
advice: find someone who knows how to approach other people
successfully.

> I don't know much at all about ACT

So I see :-)

> I'm "not talking to" ACT because, at your scale, my R&D funding needs
> are too big for you and not central enough to your mission.

Well, how do you know, given the previous quote?  In fact, CM and
revision control systems are quite critical to many of our customers.
We have several customers managing systems with tens of thousands of
files and millions of lines of code.  Remember that the niche Ada
occupies is large-scale mission-critical systems.

Perhaps you are missing an opportunity here, though I must say the
phrase "my R&D funding needs" is worryingly personal, and, as I said
earlier, if the intent of this thread was to encourage people to look
at arch, it has not worked with me.

^ permalink raw reply [flat|nested] 60+ messages in thread
* source mgt. requirements solicitation
  2002-12-08  7:13 on reputation and lines and putting things places (Re: gcc branches?) Robert Dewar
@ 2002-12-08 14:18 ` Tom Lord
  2002-12-08 14:56   ` DJ Delorie
  ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread

From: Tom Lord @ 2002-12-08 14:18 UTC (permalink / raw)
To: dewar; +Cc: gcc

dewar:

> No, it is just the entire style of your presentation.

Ok, here's a patch:

> In fact CM and revision control systems are quite critical to
> many of our customers.  We have several customers managing
> systems with tens of thousands of files and millions of lines
> of code.
[...]
> Perhaps you are missing an opportunity here
[...]
> if the intent of this thread was to encourage people to look
> at arch, it has not worked with me.

I'm inexperienced in sales, but from what I read, the right thing here
is for me to solicit from you much more information about what you
think your (or your customers') needs are -- then, if `arch' fits, I
can state why in your terms (and if not, thank you for your time and
take my leave).  Ok?

So, I'm listening.  For both the GCC project and ACT's customers, what
do you (and others on this list) initially think is important in
source management technology -- especially, but not limited to,
revision control and adjacent tools?

I said "initially" because I'm wondering how to proceed if you list
requirements that I think are buggy in one way or another.  Is it
"good style" to point that out if it occurs?

I encourage you to spend a little time answering these questions.
There are currently three or four serious revision control projects
in the free software world (OpenCM, svn, arch, and metacvs), all in
the later stages of initial development.  A lot of people, besides
just me, can probably benefit from your (and other GCC developers')
input -- and your input can help make sure you get better tools down
the road.

I have some observations that I hope your answers might begin to
address.
These are observations of facts I think are relevant; I'm assuming
it's "good style" to stop there rather than to try to turn these into
leading questions.  These observations include (in no particular
order):

1) There are frequent reports on this list of glitches with the
   current CVS repository.

2) GCC, more than many projects, relies on a distributed testing
   effort, which mostly applies to the HEAD revision and to release
   candidates.  Most of this testing is done by hand.

3) Judging by the messages on this list, there is some tension
   between the release cycle and feature development -- some issues
   around what is merged when, and around the impact of freezes.

4) GCC, more than many projects, makes use of a formal review process
   for incoming patches.

5) Mark and the CodeSourcery crew seem to do a lot of fairly
   mechanical work by hand to operate the release cycle.

6) People often do some archaeology to understand how performance and
   quality of generated code are evolving: they work up experiments
   comparing older releases to newer, and comparing various
   combinations of patches.

7) Questions about which patches relate to which issues in the issue
   database are fairly common.

8) There have been a few controversies from GCC "customers" arising
   out of whether they can use the latest release, or whether they
   should release non-official versions.

9) Distributed testing occurs mostly on the HEAD -- which means that
   the HEAD breaks on various targets, fairly frequently.

10) The utility of the existing revision control set up to people who
    lack write access is distinctly less than the utility to people
    with write access.

11) Some efforts, such as overhauling the build process, will
    probably benefit from a switch to rev ctl. systems that support
    tree rearrangements.

12) The GCC project is heavily invested in a particular testing
    framework.

13) GCC, more than many projects, makes very heavy use of development
    on branches.
-t ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation
  2002-12-08 14:18 ` source mgt. requirements solicitation Tom Lord
@ 2002-12-08 14:56 ` DJ Delorie
  2002-12-08 15:02   ` David S. Miller
  2002-12-08 15:11   ` Bruce Stephens
  2002-12-08 16:09 ` Phil Edwards
  2002-12-08 18:32 ` Joseph S. Myers
  2 siblings, 2 replies; 60+ messages in thread

From: DJ Delorie @ 2002-12-08 14:56 UTC (permalink / raw)
To: lord; +Cc: dewar, gcc

> There are currently three or four serious revision control projects
> in the free software world (OpenCM, svn, arch, and metacvs),

You forgot to list RCS and CVS.

> 2) GCC, more than many projects, relies on a distributed
>    testing effort, which mostly applies to the HEAD revision
>    and to release candidates.  Most of this testing is done
>    by hand.

All my testing is automated.

> 3) Judging by the messages on this list, there is some tension
>    between the release cycle and feature development -- some
>    issues around what is merged when, and around the impact of
>    freezes.

I don't see how any revision management system can fix this.  This is
a people problem.

> 9) Distributed testing occurs mostly on the HEAD -- which
>    means that the HEAD breaks on various targets, fairly
>    frequently.

No, more testing on the HEAD means that the HEAD *works* more often.
The other branches are just as broken; we just don't know about it
yet.

> 10) The utility of the existing revision control set up to
>     people who lack write access is distinctly less than
>     the utility to people with write access.

This is a good thing.  We don't want them to be able to do all the
things write-access people can do.  That's the whole point.

> 11) Some efforts, such as overhauling the build process, will
>     probably benefit from a switch to rev ctl. systems that
>     support tree rearrangements.

Like CVS?  It supports trees.

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation
  2002-12-08 14:56 ` DJ Delorie
@ 2002-12-08 15:02 ` David S. Miller
  2002-12-08 15:45   ` Bruce Stephens
  2002-12-08 15:11 ` Bruce Stephens
  1 sibling, 1 reply; 60+ messages in thread

From: David S. Miller @ 2002-12-08 15:02 UTC (permalink / raw)
To: dj; +Cc: lord, dewar, gcc

If one is going to try to promote a source management system, I'm
pretty sure performance alone would be enough to convince a lot of
people.

After using bitkeeper for just a week or two, I nearly stopped doing
much GCC development simply because CVS is such a dinosaur.  It's like
driving a Model T on a US interstate highway or the autobahn.  It's
truly that painful to use.

So if arch can provide the same kind of improvement, promote that part
of it.

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation
  2002-12-08 15:02 ` David S. Miller
@ 2002-12-08 15:45 ` Bruce Stephens
  2002-12-08 16:52   ` David S. Miller
  0 siblings, 1 reply; 60+ messages in thread

From: Bruce Stephens @ 2002-12-08 15:45 UTC (permalink / raw)
To: gcc

"David S. Miller" <davem@redhat.com> writes:

> I think if one is going to try and promote a source management system,
> I'm pretty sure performance alone would be enough to convince a lot of
> people.
>
> After using bitkeeper for just a week or two, I nearly stopped doing
> much GCC development simply because CVS is such a dinosaur.  It's like
> driving a model-T on a US interstate highway or the autobahn.  It's
> truly that painful to use.
>
> So if arch can provide the same kind of improvement, promote that part
> of it.

I think it can't, at the moment.

However, that's an interesting point: what do you do with CVS and with
BitKeeper?  What operations are performance-critical for you?

(My intuition is that arch has concentrated on operations which are
relatively uncommon, such as branch merging and the like, relying on a
revision library for operations which seem to me more common -- like
"cvs log", "cvs diff", and the like (or rather their moral equivalents
in a configuration-based CM).  The catch is that the revision library
is expensive in disk terms -- arguably not a problem, since disk space
is cheap, but even so.  But my intuition may be wrong, so what about
CVS seems slow to you, compared with BitKeeper?)

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation
  2002-12-08 15:45 ` Bruce Stephens
@ 2002-12-08 16:52 ` David S. Miller
  0 siblings, 0 replies; 60+ messages in thread

From: David S. Miller @ 2002-12-08 16:52 UTC (permalink / raw)
To: bruce; +Cc: gcc

   From: Bruce Stephens <bruce@cenderis.demon.co.uk>
   Date: Sun, 08 Dec 2002 23:11:14 +0000

   However, that's an interesting point: what do you do with CVS and
   with BitKeeper?  What operations are performance-critical for you?

I think CVS's weak performance points are so well understood by other
people that they can comment as well as or better than I can :-)

Operations on a branch are painful, so someone can start there :-)

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation
  2002-12-08 14:56 ` DJ Delorie
  2002-12-08 15:02   ` David S. Miller
@ 2002-12-08 15:11 ` Bruce Stephens
  2002-12-08 16:24   ` Joseph S. Myers
  1 sibling, 1 reply; 60+ messages in thread

From: Bruce Stephens @ 2002-12-08 15:11 UTC (permalink / raw)
To: gcc

DJ Delorie <dj@redhat.com> writes:

[...]

>> 10) The utility of the existing revision control set up to
>>     people who lack write access is distinctly less than
>>     the utility to people with write access.
>
> This is a good thing.  We don't want them to be able to do all the
> things write-access people can do.  That's the whole point.

Not on the central repository, no.  But it might be that people
(people without write access to the main repository) could usefully
keep branches in their own repository (perhaps merging the patches in
at some stage).  With CVS, that's not possible, but with a distributed
CM system it would be.

>> 11) Some efforts, such as overhauling the build process, will
>>     probably benefit from a switch to rev ctl. systems that
>>     support tree rearrangements.
>
> Like CVS?  It supports trees.

It doesn't handle renaming files or directories.  There are ways to do
both, but you lose something, whatever you choose to do.

^ permalink raw reply [flat|nested] 60+ messages in thread
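[Editorial aside: Bruce's point about renames can be made concrete.  A
minimal sketch of why rename support needs file *identity* separate
from file *path* -- this is a hypothetical design, not arch's, svn's,
or any real tool's, and the file paths are invented for illustration.
CVS keys history by path, so the "cvs rm old; cvs add new" workaround
starts a fresh history; a rename-aware system keys history by a stable
id and records the path per revision:]

```python
# Hypothetical rename-aware store: history is keyed by a stable file id,
# and each revision records the path the file had at that point.

class Repo:
    def __init__(self):
        self.history = {}   # file-id -> list of (path, content) revisions

    def commit(self, file_id, path, content):
        self.history.setdefault(file_id, []).append((path, content))

    def log(self, file_id):
        """Full history of one logical file, across renames."""
        return self.history[file_id]

repo = Repo()
repo.commit("f1", "lib/parse.c", "v1")
repo.commit("f1", "lib/parse.c", "v2")
# Rename: same id, new path -- prior history is preserved.
repo.commit("f1", "src/parse.c", "v3")

assert [p for p, _ in repo.log("f1")] == [
    "lib/parse.c", "lib/parse.c", "src/parse.c"]
```

Under the CVS remove-and-add workaround, the equivalent of `log` on the
new path would show only "v3"; that is the "you lose something" Bruce
describes.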
* Re: source mgt. requirements solicitation
  2002-12-08 15:11 ` Bruce Stephens
@ 2002-12-08 16:24 ` Joseph S. Myers
  2002-12-08 16:47   ` Tom Lord
  0 siblings, 1 reply; 60+ messages in thread

From: Joseph S. Myers @ 2002-12-08 16:24 UTC (permalink / raw)
To: Bruce Stephens; +Cc: gcc

On Sun, 8 Dec 2002, Bruce Stephens wrote:

> Not on the central repository, no.  But it might be that people
> (people without write access to the main repository) could usefully
> keep branches on their own repository (perhaps merging the patches in
> at some stage).  With CVS, that's not possible, but with a distributed
> CM system it would be.

Distributed CM could be a mixed blessing.  Sometimes when people merge
development from a branch to mainline, the mainline ChangeLog just
says "See ChangeLog.foobar on foobar-branch for details."  (Though I
don't think this is a proper form of ChangeLog for such changes; the
ChangeLog should describe the changes made to mainline, following the
usual standards.)  If the branch sat on someone's machine elsewhere,
there's then a lot of potential for losing this information later if
the machine goes away, fails, etc. -- whereas the main repository is
at least rsyncable and rsynced by various people.

(Such problems could be avoided if there were a mechanism by which
branches of interest -- probably including any discussed on the list
-- could be "adopted" into the main repository, so that their history
(maintained on some other machine) is regularly made available from
the main rsyncable repository and isn't lost if the originating
machine goes away.  This applies even to branches that don't get
merged to mainline (superseded by other branches, etc.) but which are
of relevance to historical discussions on the lists.)

There is one notable problem with CVS's handling of users without
write access: they can't do "cvs add" to generate diffs with added
files, though they can fake its local effects.  I don't know whether
svn fixes this.

-- 
Joseph S. Myers
jsm28@cam.ac.uk

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation
  2002-12-08 16:24 ` Joseph S. Myers
@ 2002-12-08 16:47 ` Tom Lord
  2002-12-08 22:20   ` Craig Rodrigues
  0 siblings, 1 reply; 60+ messages in thread

From: Tom Lord @ 2002-12-08 16:47 UTC (permalink / raw)
To: gcc

Thanks for the replies so far.  These are helpful.

My intention is to read these over, take lots of notes, and make a
succinct-as-possible, coalesced reply.  I'll also (so far) reply
individually to shebs' issue with merging (since it sounds like an
interesting and relevant technical problem).  If there's some other
issue you'd like to see pulled out from a coalesced reply, please say
so explicitly.

One quick request: someone said "Hey, testing is already automated."
Can I please see a slight elaboration on the form and function of that
automation?

(I have some idea, but maybe there's something I've overlooked.  What
I _think_ I know already is that `make test' works, and that there's
some infrastructure for mailing in `make test' output and having it
show up on a web site.  Presumably individual testers have their own
scripts for that.  I'm not aware that there is any infrastructure for
easily testing arbitrary combinations of patches, but one comment
implied that there is.  Someone mentioned QMtest, which can really
tighten up that automation -- but last I heard, prospects for its
adoption by GCC were slim: has that changed?)

Still listening,
-t

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation
  2002-12-08 16:47 ` Tom Lord
@ 2002-12-08 22:20 ` Craig Rodrigues
  0 siblings, 0 replies; 60+ messages in thread

From: Craig Rodrigues @ 2002-12-08 22:20 UTC (permalink / raw)
To: Tom Lord; +Cc: gcc

On Sun, Dec 08, 2002 at 04:22:22PM -0800, Tom Lord wrote:
> One quick request: someone said "Hey, testing is already automated."
> Can I please see a slight elaboration on the form and function of
> that automation?

As far as I can tell, there are a number of people who run daily (or
frequent) builds of GCC on a few platforms.  They use the output of
"make test", which kicks off some tests which use the DejaGNU testing
framework, and post their output to the gcc-testresults mailing list:
http://gcc.gnu.org/ml/gcc-testresults/

CodeSourcery has been working on converting the GCC testsuite over to
QMTest: http://gcc.gnu.org/ml/gcc/2002-05/msg01978.html

While the existing GCC testing process has its benefits, I don't think
it is perfect.  It would be great if someone had some positive ideas
towards improving the GCC testing process.

To give you some ideas of some of the problems in the current process,
Wolfgang Bangerth has informed me that he has identified 71 current
C++ regressions from GCC 3.2 in the mainline branch of GCC, based on
reading reports in GNATS.  Granted, many of these regressions might be
related and duplicates, but still, that is quite a number of
regressions to track down and fix.

I'd be very interested in any ideas which could improve this process.

-- 
Craig Rodrigues
http://www.gis.net/~craigr
rodrigc@attbi.com

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation
  2002-12-08 14:18 ` source mgt. requirements solicitation Tom Lord
  2002-12-08 14:56   ` DJ Delorie
@ 2002-12-08 16:09 ` Phil Edwards
  2002-12-08 19:13   ` Zack Weinberg
  2002-12-09 15:10   ` Walter Landry
  2002-12-08 18:32 ` Joseph S. Myers
  2 siblings, 2 replies; 60+ messages in thread

From: Phil Edwards @ 2002-12-08 16:09 UTC (permalink / raw)
To: Tom Lord; +Cc: dewar, gcc

On Sun, Dec 08, 2002 at 02:06:31PM -0800, Tom Lord wrote:
> I said "initially" because I'm wondering how to proceed if you list
> requirements that I think are buggy in one way or another.  Is it
> "good style" to point that out if it occurs?

It's more likely that they understand the requirements better than you
do, so it would be /better/ style if you said, "could you elaborate on
this, here are my questions," rather than, "no, /your/ requirements
are buggy."

> 1) There are frequent reports on this list of glitches with
>    the current CVS repository.

IIRC, these have all been caused by non-CVS problems.  (E.g., disks
filled up, mail server getting hammered and DoS'ing the other
services, etc.)

> 2) GCC, more than many projects, relies on a distributed
>    testing effort, which mostly applies to the HEAD revision
>    and to release candidates.  Most of this testing is done
>    by hand.

I'll borrow one of your choice phrases and call this a bullshit rumor.
It's nearly all automated.

> 3) Judging by the messages on this list, there is some tension
>    between the release cycle and feature development -- some
>    issues around what is merged when, and around the impact of
>    freezes.

Yes.  I don't see how the choice of revision control software makes a
difference here.  The limiting resource here is people-hours.

> 4) GCC, more than many projects, makes use of a formal review
>    process for incoming patches.

Yes.

> 5) Mark and the CodeSourcery crew seem to do a lot of fairly
>    mechanical work by hand to operate the release cycle.

Perhaps you haven't looked at contrib/* and maintainer-scripts/*
lately?  Releases and weekly snapshots are all done with those.

> 6) People often do some archaeology to understand how
>    performance and quality of generated code are evolving:
>    they work up experiments comparing older releases to newer,
>    and comparing various combinations of patches.

Yes.  This is also automated, e.g., Diego's SPEC2000 pages.

> 7) Questions about which patches relate to which issues in the
>    issue database are fairly common.

*shrug*  When a patch is committed with a PR number in the log, the
issue database takes notice of it.  That's something that we added
with a CVS plugin.

> 8) There have been a few controversies from GCC "customers"
>    arising out of whether they can use the latest release, or
>    whether they should release non-official versions.

Yes.  What does this have to do with revision control software?
Anybody using open source can make this same decision.

> 9) Distributed testing occurs mostly on the HEAD -- which
>    means that the HEAD breaks on various targets, fairly
>    frequently.

Uh, no.  Exactly backwards.

> 10) The utility of the existing revision control set up to
>     people who lack write access is distinctly less than
>     the utility to people with write access.

Well, duh.

> 11) Some efforts, such as overhauling the build process, will
>     probably benefit from a switch to rev ctl. systems that
>     support tree rearrangements.

Probably.

> 12) The GCC project is heavily invested in a particular
>     testing framework.

Yes.  Well, that plus the new QMtest, which looks to be far superior.

> 13) GCC, more than many projects, makes very heavy use of
>     development on branches.

Yes.

-- 
I would therefore like to posit that computing's central challenge,
viz. "How not to make a mess of it," has /not/ been met.
                                      - Edsger Dijkstra, 1930-2002

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation
  2002-12-08 16:09 ` Phil Edwards
@ 2002-12-08 19:13 ` Zack Weinberg
  2002-12-09 10:33   ` Phil Edwards
  2002-12-09 11:06   ` Joseph S. Myers
  1 sibling, 2 replies; 60+ messages in thread

From: Zack Weinberg @ 2002-12-08 19:13 UTC (permalink / raw)
To: Phil Edwards; +Cc: Tom Lord, dewar, gcc

Phil Edwards <phil@jaj.com> writes:

> On Sun, Dec 08, 2002 at 02:06:31PM -0800, Tom Lord wrote:
>> 1) There are frequent reports on this list of glitches with
>>    the current CVS repository.
>
> IIRC, these have all been caused by non-CVS problems.  (E.g., disks
> filled up, mail server getting hammered and DoS'ing the other
> services, etc.)

There is one situation that used to come up a lot which is CVS's
fault: a 'cvs server' process dies without removing its lock files,
wedging that directory for everyone else until the lock is manually
removed.  I believe this has been dealt with by some patches to the
server plus a cron job that looks for stale locks; however, a version
control system that could not get into a wedged state like that would
be useful.

>> 3) Judging by the messages on this list, there is some tension
>>    between the release cycle and feature development -- some
>>    issues around what is merged when, and around the impact of
>>    freezes.
>
> Yes.  I don't see how the choice of revision control software makes a
> difference here.  The limiting resource here is people-hours.

CVS makes working on branches quite difficult.  I suspect that a
system that made it easier would mean that people were a bit more
comfortable about doing development on branches for long periods of
time.

>> 4) GCC, more than many projects, makes use of a formal review
>>    process for incoming patches.
>
> Yes.

This is a strength, but with a downside -- patches can and do get
lost.  We advise people to re-send patches at intervals, but some sort
of automated patch-tracker would probably be helpful.  I don't think
the version control system can help much here (but see below).

>> 5) Mark and the CodeSourcery crew seem to do a lot of fairly
>>    mechanical work by hand to operate the release cycle.
>
> Perhaps you haven't looked at contrib/* and maintainer-scripts/*
> lately?  Releases and weekly snapshots are all done with those.

I do a fair amount of by-hand work merging the trunk into the
basic-improvements-branch.  Some, but not all, of that work could be
facilitated with a better version control system.  See below.

>> 11) Some efforts, such as overhauling the build process, will
>>     probably benefit from a switch to rev ctl. systems that
>>     support tree rearrangements.
>
> Probably.

I have several changes in mind which I have not done largely because
CVS lacks the ability to version renames.  To be specific: move cpplib
to the top level; move gcc/intl to the top level and sync it with the
version of that directory in the src repository; move the C front end
to a language subdirectory like the others; move the Ada runtime
library to the top level.  I'm not saying that I would definitely have
done all of these changes by now if we were using a version control
system that handled renames; only that the lack of rename support is a
major barrier to them.

* * *

I'm now going to list the requirements which I would place on a
replacement for CVS, in rough decreasing order of importance.  I
haven't done any research to back them up -- this is just off the top
of my head (but having thought about the issue quite a bit).

0. Must be at least as reliable and at least as portable as CVS.  GCC
   is a very large development effort.  We can't afford to lose
   contributors because their preferred platform is shut out, nor can
   we afford to lose work due to bugs, and we *especially* cannot risk
   a system which has not been audited for security exposures.

   It would be relatively easy to give much stronger data integrity
   guarantees than CVS currently manages:

   0a. All data stored in the repository is under an end-to-end
       checksum.  All data transmitted over the network is
       independently checksummed (yes, redundant with TCP-layer
       checksums).  CVS does no checksumming at all.

   0b. Anonymous repository access is done under a user ID that has
       only OS-level read privileges on the repository's files.  This
       cannot be done with (unpatched) CVS.

   0c. Remote write operations on the repository intrinsically
       require the use of a protocol which makes strong cryptographic
       integrity and authority guarantees.  CVS can be set up like
       this, but it's not built into the design.

   0d. The data stored in the repository cannot be modified by
       unprivileged local users except by going through the version
       control system.  Presently I could take 'vi' to one of the ,v
       files in /cvs/gcc and break it thoroughly, or sneak something
       into the file content, and leave no trace.

1. Must be at least as fast as CVS for all operations, and should be
   substantially faster for all operations where CVS uses a braindead
   algorithm.  I would venture to guess that everyone's #1 complaint
   about CVS is the amount of time we waste waiting for it to
   complete this or that request.  To be more specific:

   1a. Efficient network protocol.  Specifically, a network protocol
       that, for *all* operations, transmits a volume of data
       proportional -- with a small constant! -- to the size of the
       diff involved, *not* the total size of all the files touched
       by the diff involved, as CVS does.

   1b. Efficient tags and branches.  It should be possible to create
       either by creating *one* metadata record, rather than touching
       every single file in the repository.

   1c. Efficient delta storage algorithm, such that checking in a
       change on the tip of a branch is not orders of magnitude
       slower than checking in a change on the tip of the trunk.
       There are several sane ways to do this.

   1d. Efficient method for extracting a logical change after the
       fact, no matter how many files it touched.  (Currently the
       easiest way to do this is: hunt through the gcc-cvs archive
       until you find the message describing the checkin you care
       about, then use wget on all of the per-file diff URLs in the
       list and glue them all together.  Slow, painful, doesn't
       always work.)

2. Should support this laundry list of features, none of which is
   known to CVS.  Most of them would be useful independent of the
   others, though there's not much point to 2b without 2a, nor 2e
   without 2d.

   2a. Atomic application of a logical change that touches many
       files, possibly not all in the same directory.  (This is
       commonly known as a "change set".)  One checkin log per change
       set is adequate.

   2b. Ability to back out an entire change set just as atomically as
       it went in.

   2c. Ability to rename a file, including the ability for a file to
       have different names on different branches.

   2d. Automatically remember that a merge occurred from branch A to
       branch B; later, when a second merge occurs from A to B, don't
       apply those changes again.

   2e. Understand the notion of a single-delta merge, either applying
       just one change from branch A to branch B, or removing just
       one change formerly on branch A ("subtractive merge").

   2f. Perform conflict resolution by automatic formation of
       microbranches.

3. Should allow a user without commit privileges to generate a change
   set, making arbitrary changes to the repository (none of this "you
   can edit files and generate diffs but you can't add or delete
   files" nonsense), which can be applied by a user who does have
   commit privileges, and when the original author does an update
   he/she doesn't get spurious conflicts.

4. The repository's on-disk data should be stored in a highly compact
   format, to the maximum extent possible and consonant with being
   fast.  Being fast is much more important; however, GCC's CVS
   repository is ~800MB in size and compresses down to ~100MB.  You
   can do interesting things (like keep a copy of the entire
   repository on every developer's personal hard disk, as Bitkeeper
   does) with a 100MB repository that are not so practical when it's
   closer to a gigabyte.

5. Should have the ability to generate ChangeLog files automagically
   from the checkin comments.  (When merging to basic-improvements I
   normally spend more time fixing up the ChangeLogs than anything
   else.  Except maybe waiting for 'cvs tag' and 'cvs update -j...'.)

zw

^ permalink raw reply [flat|nested] 60+ messages in thread
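[Editorial aside: items 2a/2b above can be illustrated with a small
sketch.  This is a hypothetical, toy design -- not svn's or arch's
actual implementation, and the file names are invented -- showing the
two properties Zack asks for: a change set touching many files applies
all-or-nothing, and the most recent one can be backed out as a unit:]

```python
# Toy working tree with atomic change sets.  A change set maps
# path -> (old_content, new_content); None means absent (add/remove).

class Tree:
    def __init__(self):
        self.files = {}
        self.changesets = []        # applied change sets, newest last

    def apply(self, changeset):
        # Validate everything first, so a conflict anywhere leaves the
        # tree completely untouched (atomicity, item 2a).
        for path, (old, _new) in changeset.items():
            if self.files.get(path) != old:
                raise ValueError("conflict at %s" % path)
        for path, (_old, new) in changeset.items():
            if new is None:
                del self.files[path]    # file removed by this change set
            else:
                self.files[path] = new
        self.changesets.append(changeset)

    def backout(self):
        # Reverse the most recent change set as a unit (item 2b).
        changeset = self.changesets.pop()
        for path, (old, _new) in changeset.items():
            if old is None:
                del self.files[path]    # file was added; remove it again
            else:
                self.files[path] = old

tree = Tree()
tree.apply({"gcc/c-typeck.c": (None, "v1"), "gcc/ChangeLog": (None, "entry1")})
tree.apply({"gcc/c-typeck.c": ("v1", "v2")})
tree.backout()
assert tree.files["gcc/c-typeck.c"] == "v1"
```

With per-file checkins, as in CVS, the equivalent of `backout` must be
reconstructed file by file from mailing-list archives, which is exactly
the pain described in 1d.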
* Re: source mgt. requirements solicitation
  2002-12-08 19:13 ` Zack Weinberg
@ 2002-12-09 10:33 ` Phil Edwards
  2002-12-09 11:06 ` Joseph S. Myers
  1 sibling, 0 replies; 60+ messages in thread

From: Phil Edwards @ 2002-12-09 10:33 UTC (permalink / raw)
To: Zack Weinberg; +Cc: Tom Lord, dewar, gcc

On Sun, Dec 08, 2002 at 04:55:14PM -0800, Zack Weinberg wrote:
> I'm now going to list the requirements which I would place on a
> replacement for CVS, in rough decreasing order of importance.  I
> haven't done any research to back them up -- this is just off the top
> of my head (but having thought about the issue quite a bit).

With the exception of 2[def] and 5, I believe subversion does all of
those.  I handle 5 with a wrapper script for checkins.

Phil

-- 
I would therefore like to posit that computing's central challenge,
viz. "How not to make a mess of it," has /not/ been met.
                                      - Edsger Dijkstra, 1930-2002

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation
  2002-12-08 19:13 ` Zack Weinberg
  2002-12-09 10:33   ` Phil Edwards
@ 2002-12-09 11:06 ` Joseph S. Myers
  2002-12-09  9:42   ` Zack Weinberg
  1 sibling, 1 reply; 60+ messages in thread

From: Joseph S. Myers @ 2002-12-09 11:06 UTC (permalink / raw)
To: Zack Weinberg; +Cc: gcc

On Sun, 8 Dec 2002, Zack Weinberg wrote:

> 0a. All data stored in the repository is under an end-to-end
>     checksum.  All data transmitted over the network is independently
>     checksummed (yes, redundant with TCP-layer checksums).  CVS does
>     no checksumming at all.

Doesn't SSH?

(And CVS does checksum checkouts/updates: if, after applying a diff in
cvs update, the file checksum doesn't match, it warns and re-gets the
whole file, which can indicate something was broken in the latest
checkin to the file (yielding a bogus delta).  This is, however,
highly suboptimal -- it should be an error, not a warning (with a
warning sent to the repository maintainers), and lots more
checksumming should be done.  In addition:

0aa. Checksums stored in the repository format for all file
     revisions, deltas, log messages, etc., with an easy way to
     verify them -- to detect corruption early.)

> 5. Should have the ability to generate ChangeLog files automagically
>    from the checkin comments.  (When merging to basic-improvements I
>    normally spend more time fixing up the ChangeLogs than anything
>    else.  Except maybe waiting for 'cvs tag' and 'cvs update -j...'.)

The normal current practice here is for branch ChangeLogs to be kept
in a separate file, not the ChangeLogs that need merging from
mainline.  (In the case of BIB the branch ChangeLog then goes on the
top of the mainline one (with an overall "merge from BIB" comment)
when the merge back to mainline is done.  For branches developing new
features, a new ChangeLog entry describing the overall logical effect
of the branch changes, not the details of how that state was reached,
is more appropriate.)

-- 
Joseph S. Myers
jsm28@cam.ac.uk

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-09 11:06 ` Joseph S. Myers @ 2002-12-09 9:42 ` Zack Weinberg 2002-12-09 11:00 ` Jack Lloyd 0 siblings, 1 reply; 60+ messages in thread From: Zack Weinberg @ 2002-12-09 9:42 UTC (permalink / raw) To: Joseph S. Myers; +Cc: gcc "Joseph S. Myers" <jsm28@cam.ac.uk> writes: > On Sun, 8 Dec 2002, Zack Weinberg wrote: > >> 0a. All data stored in the repository is under an end-to-end >> checksum. All data transmitted over the network is independently >> checksummed (yes, redundant with TCP-layer checksums). CVS does >> no checksumming at all. > > Doesn't SSH? I assume it has to, since cryptography usually requires that. > (And CVS does checksum checkouts/updates: if after applying a diff in cvs > update the file checksum doesn't match, it warns and regets the whole > file, which can indicate something was broken in the latest checkin to the > file (yielding a bogus delta). I didn't know that. But, as you say, it's not nearly enough. (When was the last time we got a block of binary zeroes in a ,v file and nobody noticed for months?) > 0aa. Checksums stored in the repository format for all file > revisions, deltas, log messages etc., with an easy way to verify > them - to detect corruption early.) Worth pointing out that subversion doesn't do as much checksumming as we'd like, either. > The normal current practice here is for branch ChangeLogs to be kept > in a separate file, not the ChangeLogs that need merging from > mainline. (In the case of BIB the branch ChangeLog then goes on the > top of the mainline one (with an overall "merge from BIB" comment) > when the merge back to mainline is done. For branches developing > new features a new ChangeLog entry describing the overall logical > effect of the branch changes, not the details of how that state was > reached, is more appropriate.)
Unfortunately, this is not how BIB was done, and I'm stuck with the way it is being done now (the normal ChangeLog files are used, and I resolve the conflict on every merge). Next time around, it would certainly be easier to use a separate file -- but better still to avoid maintaining the files at all. zw ^ permalink raw reply [flat|nested] 60+ messages in thread
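Requirement 5, generating ChangeLog files automatically from checkin comments, is mechanical once commit metadata is structured. A minimal sketch, assuming a hypothetical commit-record schema (the dict keys and the example email address are illustrative, not the format of CVS, svn, or arch):

```python
def format_changelog(commits):
    """Render GNU-style ChangeLog entries from commit records.

    Each commit is a dict with 'date' (YYYY-MM-DD), 'author',
    'email', and 'message' keys -- a hypothetical schema chosen
    for this sketch, not any real tool's storage format.
    """
    entries = []
    for c in commits:
        header = "%s  %s  <%s>" % (c["date"], c["author"], c["email"])
        # GNU ChangeLog convention: entry body is tab-indented.
        body = "\n".join(
            "\t" + line if line else ""
            for line in c["message"].splitlines()
        )
        entries.append(header + "\n\n" + body + "\n")
    return "\n".join(entries)

log = format_changelog([
    {"date": "2002-12-09", "author": "Zack Weinberg",
     "email": "zack@example.net",   # illustrative address
     "message": "* gcc.c (main): Fix a leak."},
])
print(log)
```

With records like these kept per changeset, the per-file ChangeLog conflicts described above disappear: the file is a derived artifact, regenerated rather than merged.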
* Re: source mgt. requirements solicitation 2002-12-09 9:42 ` Zack Weinberg @ 2002-12-09 11:00 ` Jack Lloyd 0 siblings, 0 replies; 60+ messages in thread From: Jack Lloyd @ 2002-12-09 11:00 UTC (permalink / raw) To: Zack Weinberg; +Cc: gcc On Mon, 9 Dec 2002, Zack Weinberg wrote: > > 0aa. Checksums stored in the repository format for all file > > revisions, deltas, log messages etc., with an easy way to verify > > them - to detect corruption early.) > > Worth pointing out that subversion doesn't do as much checksumming as > we'd like, either. OpenCM (opencm.org) does really good checksumming; everything is based off of strong hashes (and RSA signatures where needed). In particular, your 0d requirement is met in a way that no other CM system (that I've heard about) can do. Nobody (even root) can substitute one file for another or similar nastiness. Well, unless they can break SHA-1 in a really serious way. I'll mention that I work on OpenCM (it's my day job), and additionally I promise I won't go endlessly promoting it on the list. I'd be happy to answer any questions off list if you like, but this is the first and last time I'll bring it up here. -Jack ^ permalink raw reply [flat|nested] 60+ messages in thread
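The guarantee Jack describes, that nobody (even root) can silently substitute one file for another, falls out of naming every stored object by a strong hash of its contents: tampering changes the data but not the name, so the mismatch is caught on read. A toy sketch of the idea using SHA-1 (an illustration of the principle, not OpenCM's actual repository format):

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: an object's identifier *is* the
    SHA-1 of its bytes, so any in-place modification is detected the
    next time the object is read."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        oid = hashlib.sha1(data).hexdigest()
        self._objects[oid] = data
        return oid

    def get(self, oid: str) -> bytes:
        data = self._objects[oid]
        # Re-verify on every read: name and content must still agree.
        if hashlib.sha1(data).hexdigest() != oid:
            raise ValueError("object %s corrupted or tampered with" % oid)
        return data

store = ContentStore()
oid = store.put(b"int main(void) { return 0; }\n")

# Simulate an attacker (or bit rot) editing the stored bytes directly:
store._objects[oid] = b"int main(void) { return 1; }\n"
try:
    store.get(oid)
except ValueError:
    print("tampering detected")
```

Defeating this requires producing a different file with the same hash, i.e. breaking the hash function, which is exactly the caveat in the message above.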
* Re: source mgt. requirements solicitation 2002-12-08 16:09 ` Phil Edwards 2002-12-08 19:13 ` Zack Weinberg @ 2002-12-09 15:10 ` Walter Landry 2002-12-09 15:27 ` Joseph S. Myers 1 sibling, 1 reply; 60+ messages in thread From: Walter Landry @ 2002-12-09 15:10 UTC (permalink / raw) To: gcc Hi, > I'm now going to list the requirements which I would place on a > replacement for CVS, in rough decreasing order of importance. I > haven't done any research to back them up -- this is just off the top > of my head (but having thought about the issue quite a bit). > > 0. Must be at least as reliable To my knowledge, arch doesn't have any real reliability problems. Tom didn't make a fast implementation, but it is reliable. There is a bug database [1] with all of the bugs that I can think of, so you can decide for yourself whether it is reliable. > and at least as portable as CVS. Arch currently doesn't work on 64 bit machines. The problem is in the non-shell parts. It is not an insurmountable problem, it is just that no one has taken the time to try to fix the bugs. Someone got it running under cygwin once, but the patches have disappeared. It wasn't usable there. Too slow. Otherwise, it seems to work on posix machines. > GCC is a very large development effort. We can't afford to lose > contributors because their preferred platform is shut out, nor > can we afford to lose work due to bugs, and we *especially* > cannot risk a system which has not been audited for security > exposures. It would be relatively easy to give much stronger > data integrity guarantees than CVS currently manages: arch doesn't interact at all with root. The remote repositories are all done with sftp, ftp, and http, which is as secure as those servers are. > 0a. All data stored in the repository is under an end-to-end > checksum. All data transmitted over the network is independently > checksummed (yes, redundant with TCP-layer checksums). CVS does > no checksumming at all. Sort of. 
Patches are gzipped, and gzip has its own checksum, but there isn't any way to make sure that what you get is the same thing as what you put in. That is, there are some individual checksums, but no end-to-end checksum. > 0b. Anonymous repository access is done under a user ID that has only > OS-level read privileges on the repository's files. This cannot > be done with (unpatched) CVS. Is http access good enough? > 0c. Remote write operations on the repository intrinsically require > the use of a protocol which makes strong cryptographic integrity > and authority guarantees. CVS can be set up like this, but it's > not built into the design. Currently, we allow writeable ftp servers and sftp servers. If we disallowed writeable ftp servers, would that be good enough? (Don't tempt me. I've considered it in the past.) > 0d. The data stored in the repository cannot be modified by > unprivileged local users except by going through the version > control system. Presently I could take 'vi' to one of the ,v > files in /cvs/gcc and break it thoroughly, or sneak something into > the file content, and leave no trace. There is no interaction with root, so if you own the archive, you can always do what you want. To get anything approaching this, you have to deal with PGP signatures, SHA hashes, and the like. OpenCM is probably the only group (including BitKeeper) that even comes close to doing this right. > 1. Must be at least as fast as CVS for all operations, and should be > substantially faster for all operations where CVS uses a braindead > algorithm. I would venture to guess that everyone's #1 complaint > about CVS is the amount of time we waste waiting for it to complete > this or that request. To be more specific: Arch is slow, slow, slow. Don't let Tom beguile you into thinking that it is even reasonably fast right now. It isn't. It is a subject of great interest to the developers, but we're not there yet. Part of this is the shell implementation.
Once certain parts are rewritten in a compiled language, it should get _much_ better. > 1a. Efficient network protocol. Specifically, a network protocol that, > for *all* operations, transmits a volume of data proportional -- > with a small constant! -- to the size of the diff involved, *not* > the total size of all the files touched by the diff involved, as > CVS does. Arch has this, although some of the implementations could do with a little improvement (e.g. the mirroring script seems to take forever). > 1b. Efficient tags and branches. It should be possible to create > either by creating *one* metadata record, rather than touching > every single file in the repository. Don't know. I haven't looked at the actual implementation. There isn't a fundamental reason why not, though. > 1c. Efficient delta storage algorithm, such that checking in a change > on the tip of a branch is not orders of magnitude slower than > checking in a change on the tip of the trunk. There are several > sane ways to do this. Arch has this > 1d. Efficient method for extracting a logical change after the fact, > no matter how many files it touched. (Currently the easiest way > to do this is: hunt through the gcc-cvs archive until you find the > message describing the checkin you care about, then use wget on > all of the per-file diff URLs in the list and glue them all > together. Slow, painful, doesn't always work.) Arch has this > 2. Should support this laundry list of features, none of which is > known to CVS. Most of them would be useful independent of the > others, though there's not much point to 2b without 2a, nor 2e > without 2d. > > 2a. Atomic application of a logical change that touches many files, > possibly not all in the same directory. (This is commonly known as > a "change set".) One checkin log per change set is adequate. Arch has this. It's why I started using it. > 2b. Ability to back out an entire change set just as atomically as it > went in. 
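The change-set semantics of requirement 2a amount to a transaction over several files: either every edit (including adds and deletes) lands, or none does. A minimal in-memory sketch of that all-or-nothing behavior (a model of the semantics only, not CVS's, arch's, or svn's API):

```python
import copy

def apply_changeset(tree, changes):
    """Apply a multi-file change set atomically.

    'tree' maps path -> content; 'changes' maps
    path -> (expected_old, new), where expected_old is None for a
    new file and new is None for a deletion. All edits are staged
    on a copy and committed in a single swap, so a conflict on any
    file leaves the whole tree untouched.
    """
    staged = copy.deepcopy(tree)
    for path, (old, new) in changes.items():
        if staged.get(path) != old:
            raise ValueError("conflict on %s; nothing applied" % path)
        if new is None:
            del staged[path]      # deletion is part of the change set
        else:
            staged[path] = new
    tree.clear()
    tree.update(staged)           # commit point: single atomic swap

tree = {"a.c": "old a", "b.c": "old b"}
apply_changeset(tree, {"a.c": ("old a", "new a"),
                       "b.c": ("old b", None)})
print(sorted(tree.items()))   # [('a.c', 'new a')]
```

One checkin log then naturally attaches to the change set as a whole rather than to each file, which is what makes 2b (backing the whole thing out) and 1d (extracting a logical change) tractable.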
In theory, easy to do (just a few rm's and an mv). There are larger policy questions, though (Do we want to allow that?). Some day, I may just hack something together that does that. > 2c. Ability to rename a file, including the ability for a file to have > different names on different branches. Arch has this > 2d. Automatically remember that a merge occurred from branch A to > branch B; later, when a second merge occurs from A to B, don't > apply those changes again. Arch has this > 2e. Understand the notion of a single-delta merge, either applying > just one change from branch A to branch B, or removing just one > change formerly on branch A ("subtractive merge"). Single delta forward merges are no problem. Reverse merges are more difficult. This is one of those "lurking design issues" that I mentioned earlier. > 2f. Perform conflict resolution by automatic formation of > microbranches. I'm not quite sure what you mean here. > 3. Should allow a user without commit privileges to generate a change > set, making arbitrary changes to the repository (none of this "you > can edit files and generate diffs but you can't add or delete > files" nonsense), which can be applied by a user who does have > commit privileges, and when the original author does an update > he/she doesn't get spurious conflicts. Are you thinking of sending patches by email? Arch doesn't have that. > 4. The repository's on-disk data should be stored in a highly compact > format, to the maximum extent possible and consonant with being > fast. Being fast is much more important; however, GCC's CVS > repository is ~800MB in size and compresses down to ~100MB. You > can do interesting things (like keep a copy of the entire > repository on every developer's personal hard disk, as Bitkeeper > does) with a 100MB repository that are not so practical when it's > closer to a gigabyte. Arch stores the repository as tar.gz of the initial revision, plus tar.gz of the patches. 
This will be about as compact as anything. The problem comes when you want to get older revisions. If you're at patch-51, getting patch-48 means starting from patch-0 and applying all 48 patches. This can be sped up by saving entire trees along the way, but that kills the "highly compact format". > 5. Should have the ability to generate ChangeLog files automagically > from the checkin comments. (When merging to basic-improvements I > normally spend more time fixing up the ChangeLogs than anything > else. Except maybe waiting for 'cvs tag' and 'cvs update -j...'.) That apparently works, although I've never used it. By the way, I thought that your comments were quite illuminating, so I put them up on the arch web site [2]. I also think that Tom should stop telling everyone to work on arch. At this point, it just causes more trouble than any help I'll get. Regards, Walter Landry wlandry@ucsd.edu [1] http://bugs.fifthvision.net:8080/ [2] http://www.fifthvision.net/open/bin/view/Arch/GccHackers ^ permalink raw reply [flat|nested] 60+ messages in thread
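Walter's point about retrieving old revisions can be made concrete as a cost model: with only patch-0 stored in full, reconstructing revision N takes N patch applications, while saving a full tree every k revisions caps the walk at k patches, at the cost of the extra snapshots. A sketch of that trade-off (a cost model, not arch code):

```python
def applications_needed(rev, checkpoint_every=None):
    """Patches that must be applied to reconstruct revision 'rev'.

    With no checkpoints (arch's basic layout: tar.gz of revision 0
    plus one patch per revision), you walk forward from patch-0.
    With a full tree saved every 'checkpoint_every' revisions, you
    walk forward from the nearest earlier snapshot instead.
    """
    if checkpoint_every is None:
        return rev
    base = (rev // checkpoint_every) * checkpoint_every
    return rev - base

# Getting patch-48 from a plain archive vs. one with snapshots
# every 10 revisions:
print(applications_needed(48))                       # 48
print(applications_needed(48, checkpoint_every=10))  # 8
```

This is exactly the tension in the paragraph above: checkpoints bound retrieval time, but each one is a full tree, so they chip away at the "highly compact format" of requirement 4.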
* Re: source mgt. requirements solicitation 2002-12-09 15:10 ` Walter Landry @ 2002-12-09 15:27 ` Joseph S. Myers 2002-12-09 17:05 ` Walter Landry 2002-12-11 1:11 ` Branko Čibej 0 siblings, 2 replies; 60+ messages in thread From: Joseph S. Myers @ 2002-12-09 15:27 UTC (permalink / raw) To: Walter Landry; +Cc: gcc On Mon, 9 Dec 2002, Walter Landry wrote: > arch doesn't interact at all with root. The remote repositories are > all done with sftp, ftp, and http, which is as secure as those servers > are. Is this - for anonymous access - _plain_ HTTP, or HTTP + WebDAV + DeltaV which svn uses? One problem there was with SVN - it may have been fixed by now, and a fix would be necessary for it to be usable for GCC - was its use of HTTP and HTTPS (for write access); these tend to be heavily controlled by firewalls and the ability to tunnel over SSH (with just that one port needing to be open) would be necessary. "Transparent" proxies may pass plain HTTP OK, but not the WebDAV/DeltaV extensions SVN needs. > > 0d. The data stored in the repository cannot be modified by > > unprivileged local users except by going through the version > > control system. Presently I could take 'vi' to one of the ,v > > files in /cvs/gcc and break it thoroughly, or sneak something into > > the file content, and leave no trace. > > There is no interaction with root, so if you own the archive, you can > always do what you want. To get anything approaching this, you have > to deal with PGP signatures, SHA hashes, and the like. OpenCM is > probably the only group (including BitKeeper) that even comes close to > doing this right. This sort of thing has been done simply by a modified setuid (to a cvs user, not root) cvs binary so users can't access the repository directly, only through that binary. More generically, with a reasonable protocol for local repository access it should be possible to use GNU userv to separate the repository from the users. > > 2b. 
Ability to back out an entire change set just as atomically as it > > went in. > > In theory, easy to do (just a few rm's and an mv). There are larger > policy questions, though (Do we want to allow that?). Some day, I may > just hack something together that does that. A change set is applied. It turns out to have problems, so needs to be reverted - common enough. Of course the version history and ChangeLog shows both the original application and reversion. The reversion might in fact be of the original change set and a series of subsequent failed attempts at patching it up. But intermediate unrelated changes to the tree should not be backed out in the process. > > 3. Should allow a user without commit privileges to generate a change > > set, making arbitrary changes to the repository (none of this "you > > can edit files and generate diffs but you can't add or delete > > files" nonsense), which can be applied by a user who does have > > commit privileges, and when the original author does an update > > he/she doesn't get spurious conflicts. > > Are you thinking of sending patches by email? Arch doesn't have that. Patches by email (with distributed patch review by multiple people reading gcc-patches, including those who can't actually approve the patch) is the normal way GCC development works. Presume that most contributors will not want to deal with security issues of making any local repository accessible to other machines, even if it's on a permanently connected machine and local firewalls or policy don't prevent this. A patch for use with a better version control system would need to include some encoding for that system of renames / deletes / ... - but that needs to be just as human-readable as context diffs / unidiffs are. -- Joseph S. Myers jsm28@cam.ac.uk ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-09 15:27 ` Joseph S. Myers @ 2002-12-09 17:05 ` Walter Landry 2002-12-09 17:10 ` Joseph S. Myers 2002-12-09 17:50 ` Zack Weinberg 1 sibling, 2 replies; 60+ messages in thread From: Walter Landry @ 2002-12-09 17:05 UTC (permalink / raw) To: gcc "Joseph S. Myers" <jsm28@cam.ac.uk> wrote: > On Mon, 9 Dec 2002, Walter Landry wrote: > > > arch doesn't interact at all with root. The remote repositories are > > all done with sftp, ftp, and http, which is as secure as those servers > > are. > > Is this - for anonymous access - _plain_ HTTP, or HTTP + WebDAV + DeltaV > which svn uses? One problem there was with SVN - it may have been fixed > by now, and a fix would be necessary for it to be usable for GCC - was its > use of HTTP and HTTPS (for write access); these tend to be heavily > controlled by firewalls and the ability to tunnel over SSH (with just that > one port needing to be open) would be necessary. "Transparent" proxies > may pass plain HTTP OK, but not the WebDAV/DeltaV extensions SVN needs. Anonymous access requires HTTP + WebDAV (no DeltaV). However, the set of WebDAV commands needed is much smaller than what subversion needs. It just needs whatever anonymous ftp has that http doesn't (I believe PROPFIND is one). In particular, you can run a server using apache 1.3. > > > 0d. The data stored in the repository cannot be modified by > > > unprivileged local users except by going through the version > > > control system. Presently I could take 'vi' to one of the ,v > > > files in /cvs/gcc and break it thoroughly, or sneak something into > > > the file content, and leave no trace. > > > > There is no interaction with root, so if you own the archive, you can > > always do what you want. To get anything approaching this, you have > > to deal with PGP signatures, SHA hashes, and the like.
OpenCM is > > probably the only group (including BitKeeper) that even comes close to > > doing this right. > > This sort of thing has been done simply by a modified setuid (to a cvs > user, not root) cvs binary so users can't access the repository directly, > only through that binary. More generically, with a reasonable protocol > for local repository access it should be possible to use GNU userv to > separate the repository from the users. This is a different security model. Arch is secure because it doesn't depend on having privileged access. For example, there is an "rm -rf" command built into arch. I have a feeling that you are thinking of how CVS handles things, with a centralized server. Part of the whole point of arch is that there is no centralized server. So, for example, I can develop arch independently of whether Tom thinks that I am worthy enough to do so. I can screw up my archive as much as I want (and I have), and Tom can be blissfully unaware. Easy merging is what makes this possible. So you don't, in general, have a repository that is writeable by more than one person. > > > 2b. Ability to back out an entire change set just as atomically as it > > > went in. > > > > In theory, easy to do (just a few rm's and an mv). There are larger > > policy questions, though (Do we want to allow that?). Some day, I may > > just hack something together that does that. > > A change set is applied. It turns out to have problems, so needs to be > reverted - common enough. Of course the version history and ChangeLog > shows both the original application and reversion. The reversion might in > fact be of the original change set and a series of subsequent failed > attempts at patching it up. But intermediate unrelated changes to the > tree should not be backed out in the process. To get what you really want means that we can reverse our patches. Then you could simply unapply a patch. But that isn't possible right now, and is not going to be done real soon.
That would require someone who is actually working on the code to understand the current patch format. Regards, Walter Landry wlandry@ucsd.edu ^ permalink raw reply [flat|nested] 60+ messages in thread
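Reversing a patch, the ability Walter says arch lacks, is conceptually simple: if each hunk records both the old and the new text, the inverse patch just swaps them, and applying it backs the change out while leaving later, unrelated edits alone (Joseph's requirement). A toy line-level sketch (illustrative only, not arch's changeset format):

```python
def invert(patch):
    """Invert a patch given as a list of (line_no, old, new) hunks.
    Applying the inverse is a 'subtractive merge': it removes
    exactly that change and nothing else."""
    return [(n, new, old) for (n, old, new) in patch]

def apply_patch(lines, patch):
    """Apply hunks to a list of lines, refusing on any mismatch."""
    out = list(lines)
    for n, old, new in patch:
        if out[n] != old:
            raise ValueError("conflict at line %d" % n)
        out[n] = new
    return out

base = ["alpha", "beta", "gamma"]
p = [(1, "beta", "BETA")]
patched = apply_patch(base, p)
# An unrelated later change touches a different line...
patched[2] = "gamma2"
# ...and reverting p leaves that change intact:
reverted = apply_patch(patched, invert(p))
print(reverted)   # ['alpha', 'beta', 'gamma2']
```

The hard part in a real system is that deltas rarely store both sides so symmetrically, and renames and deletions need inverses too, which is presumably why understanding the current patch format is the prerequisite Walter mentions.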
* Re: source mgt. requirements solicitation 2002-12-09 17:05 ` Walter Landry @ 2002-12-09 17:10 ` Joseph S. Myers 2002-12-09 18:27 ` Walter Landry 2002-12-09 17:50 ` Zack Weinberg 1 sibling, 1 reply; 60+ messages in thread From: Joseph S. Myers @ 2002-12-09 17:10 UTC (permalink / raw) To: Walter Landry; +Cc: gcc On Mon, 9 Dec 2002, Walter Landry wrote: > Anonymous access requires for HTTP + WebDAV (no DeltaV). However, the > set of WebDAV commands needed are much smaller than what subversion > needs. It just needs whatever anonymous ftp has that http doesn't (I > believe PROPFIND is one). In particular, you can run a server using > apache 1.3. I'm sure some "transparent" proxies will fail to pass even that (though WebDAV may be better supported by them than DeltaV). This is similar to Zack's first point - just as any new system must be no less portable to running on different systems, it must be no less portable to working through networks restricted in different ways. > I have a feeling that you are thinking of how CVS handles things, with > a centralized server. Part of the whole point of arch is that there > is no centralized server. So, for example, I can develop arch > independently of whether Tom thinks that I am worthy enough to do so. > I can screw up my archive as much as I want (and I have), and Tom can > be blissfully unaware. Easy merging is what makes this possible. > > So you don't, in general, have a repository that is writeable by more > than one person. For GCC there clearly needs to be some server that has the mainline of development we advertise on our web pages for users, from which release branches are made, which has some vague notions of the machine being securely maintained, having adequate bandwidth, having some backup procedure, having maintainers for the server keeping it up reliably, having a reasonable expectation that the development lines in there will still be available in 20 years' time when current developers have lost interest. 
(gcc.gnu.org presents a remarkably good impression of this to the outside world, considering how it operates purely by volunteer effort.) There may be many other servers - private and public - but some server provides the line of development that gets branched into new releases, and inevitably multiple people may write to that line. (I'm also presuming - see <http://gcc.gnu.org/ml/gcc/2002-12/msg00436.html> - that all the developments in any third party repository that get discussed on the lists should be mirrored into this main one to give some hope of long term survival and availability. In developing GCC with list archives and version control we are simultaneously acting as curators of the history of GCC development, which means attempting to preserve that history for posterity (a period beyond the involvement of any one individual).) -- Joseph S. Myers jsm28@cam.ac.uk ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-09 17:10 ` Joseph S. Myers @ 2002-12-09 18:27 ` Walter Landry 2002-12-09 19:16 ` Joseph S. Myers 0 siblings, 1 reply; 60+ messages in thread From: Walter Landry @ 2002-12-09 18:27 UTC (permalink / raw) To: gcc "Joseph S. Myers" <jsm28@cam.ac.uk> wrote: > On Mon, 9 Dec 2002, Walter Landry wrote: > > > Anonymous access requires for HTTP + WebDAV (no DeltaV). However, the > > set of WebDAV commands needed are much smaller than what subversion > > needs. It just needs whatever anonymous ftp has that http doesn't (I > > believe PROPFIND is one). In particular, you can run a server using > > apache 1.3. > > I'm sure some "transparent" proxies will fail to pass even that (though > WebDAV may be better supported by them than DeltaV). This is similar to > Zack's first point - just as any new system must be no less portable to > running on different systems, it must be no less portable to working > through networks restricted in different ways. Well, there is anonymous ftp. But if all you have is plain http, I would think that you would have problems checking out from CVS as well. > > So you don't, in general, have a repository that is writeable by more > > than one person. > > For GCC there clearly needs to be some server that has the mainline of > development we advertise on our web pages for users, from which release > branches are made, which has some vague notions of the machine being > securely maintained, having adequate bandwidth, having some backup > procedure, having maintainers for the server keeping it up reliably, > having a reasonable expectation that the development lines in there will > still be available in 20 years' time when current developers have lost > interest. (gcc.gnu.org presents a remarkably good impression of this to > the outside world, considering how it operates purely by volunteer > effort.) That, presumably, would be the release manager's branch. 
Periodically, people would say, "feature X is implemented on branch Y". If the release manager trusts them, then he does a simple update. If there is no trust, then the release manager can review the patches. In any case, assuming the submitter knows what they are doing, the patch will apply cleanly. It would be very quick. If it doesn't apply cleanly, then the release manager sends a curt note to the submitter (perhaps automatically) or tries to resolve it himself. This is how the Linux kernel development works, although a release manager wouldn't have to do as much work as Linus does. Regards, Walter Landry wlandry@ucsd.edu ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-09 18:27 ` Walter Landry @ 2002-12-09 19:16 ` Joseph S. Myers 2002-12-10 0:27 ` Zack Weinberg 0 siblings, 1 reply; 60+ messages in thread From: Joseph S. Myers @ 2002-12-09 19:16 UTC (permalink / raw) To: Walter Landry; +Cc: gcc On Mon, 9 Dec 2002, Walter Landry wrote: > Well, there is anonymous ftp. But if all you have is plain http, I > would think that you would have problems checking out from CVS as > well. FTP is probably useful in most such cases (for arch, since I didn't think svn provided FTP transport; and I don't know about the other systems mentioned). The case I was thinking of is the common situation where most outgoing ports are free but port 80 is redirected through a "transparent" proxy to save ISP bandwidth. (In such situations, a few other outgoing ports such as 25 are probably proxied but are irrelevant here.) That's one common (consumer) situation, and FTP and HTTPS probably work there, but another (corporate) situation may well have HTTPS tied down more tightly. Where people have got pserver or ssh allowed through their firewall, there may be more problems with protocols used for other purposes and restricted or proxied for other reasons. (Some blocks might be avoided by choosing nonstandard ports, but then everyone is likely to choose different ports and create more confusion. Arranging that both write and anonymous access is tunnelled over ssh - as some sites do for anonymous CVS access - simplifies things.) > That, presumably, would be the release manager's branch. > Periodically, people would say, "feature X is implemented on branch > Y". If the release manager trusts them, then he does a simple update. > If there is no trust, then the release manager can review the patches. > In any case, assuming the submitter knows what they are doing, the > patch will apply cleanly. It would be very quick.
If it doesn't > apply cleanly, then the release manager sends a curt note to the > submitter (perhaps automatically) or tries to resolve it himself. > This is how the Linux kernel development works, although a release > manager wouldn't have to do as much work as Linus does. There are about 100 people applying patches to the mainline (half maintainers of some of the code who can apply some patches without review, half needing review for all nonobvious patches). Having the release manager manually handle the patches from all 100 people is not a sensible scalable solution for GCC; the expectation is that anyone producing a reasonable number of good patches will get write access which reduces the reviewers' effort (to needing only to review the patch, not apply it) and means that the version control logs clearly show which user was responsible for a patch by who checked it in (the case of someone else, named in the log message, being responsible, being the exceptional case). Note that the 50 or so maintainers all do some patch review; it's only at a late stage on the actual release branches that the review is concentrated in the release manager. (You might then say that the release manager could have a bot automatically applying patches from developers who now have write access, but this has no real advantages over them all having write access and a lot of fragility added in.) The Linux model of one person controlling everything going into the mainline is exceptional; GCC, *BSD, etc., have many committers to mainline (the rules for who commits where with what review varying) and as Zack explains <http://gcc.gnu.org/ml/gcc/2002-12/msg00492.html> (albeit missing the footnote [1] on where releases are made from) this mainline on a master server will remain central, with new developments normally going there rapidly except for various major work on longer-term surrounding branches. 
Zack notes that practically the main use of a distributed system would be for individual developers to do their work offline, not to move the main repository for communication between developers off a single machine (though depending on the system, developers may naturally have repository mirrors) - it is not in general the case that development takes place on always-online systems or systems which can allow remote access to their repositories. I expect for most branches it will also be most convenient for the master server to host them. The exceptions are most likely to be for developments that aren't considered politically appropriate for mainstream GCC, or those that aren't assigned to the FSF or may have other legal problems, or those done under NDA (albeit legally dodgy), or corporate developments whose public visibility too early would give away sensitive information or which a customer would like to have before they go public (i.e., work eventually destined for public GCC unless too specialised or ugly, where the customer and company would be free under the GPL to release the work early but choose not to). In general, I expect most development would be on the central server, except for small-scale individual development (often offline) on personal servers and corporate development on internal systems that definitely will not be accessible to the public. This is, of course, all just hypothesis about how GCC development would work with distributed CM, but it seems a reasonable extrapolation supposing we start from wanting to preserve security, accessibility and long-term survival of all the version history of developments that presently go in the public repository (mainline and branches). -- Joseph S. Myers jsm28@cam.ac.uk ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-09 19:16 ` Joseph S. Myers @ 2002-12-10 0:27 ` Zack Weinberg 2002-12-10 0:41 ` Tom Lord ` (3 more replies) 0 siblings, 4 replies; 60+ messages in thread From: Zack Weinberg @ 2002-12-10 0:27 UTC (permalink / raw) To: Joseph S. Myers; +Cc: Walter Landry, gcc "Joseph S. Myers" <jsm28@cam.ac.uk> writes: > The Linux model of one person controlling everything going into the > mainline is exceptional; GCC, *BSD, etc., have many committers to > mainline (the rules for who commits where with what review varying) > and as Zack explains <http://gcc.gnu.org/ml/gcc/2002-12/msg00492.html> > (albeit missing the footnote [1] on where releases are made from) > this mainline on a master server will remain central, with new > developments normally going there rapidly except for various major > work on longer-term surrounding branches. The missing footnote was going to be an argument that the Linux model is not just exceptional, but pathological. Not something I think we should emulate with GCC, and not something I consider worth designing a version control system to support. Linux is a large project - 4.3 million lines of code - but only one person has commit privileges on the official tree, for any given release branch. No matter how good their tools are, this cannot be expected to scale, and indeed it does not. I have not actually measured it, but the appearance of the traffic on linux-kernel is that Linus drops patches on the floor just as often as he did before he started using Bitkeeper. However, Bitkeeper facilitates other people maintaining their own semi-official versions of the tree, in which some of these patches get sucked up. That is bad. 
It means users have to choose between N different variants; as time goes by it becomes increasingly difficult to put them all back together again; eventually will come a point where critical feature A is available only in tree A, critical feature B is available only in tree B, and the implementations conflict, because no one's exerting adequate centripetal force. Possibly I am too pessimistic. zw ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-10 0:27 ` Zack Weinberg @ 2002-12-10 0:41 ` Tom Lord 2002-12-10 12:05 ` Phil Edwards ` (2 subsequent siblings) 3 siblings, 0 replies; 60+ messages in thread From: Tom Lord @ 2002-12-10 0:41 UTC (permalink / raw) To: zack; +Cc: gcc Zack: Linux is a large project - 4.3 million lines of code - but only one person has commit privileges on the official tree, for any given release branch. No matter how good their tools are, this cannot be expected to scale, and indeed it does not. I hope you'll have a look at the process automation scenario in my reply to Joseph S. Myers ("new patch of replies (B)"). -t ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-10 0:27 ` Zack Weinberg 2002-12-10 0:41 ` Tom Lord @ 2002-12-10 12:05 ` Phil Edwards 2002-12-10 19:44 ` Mark Mielke 2002-12-14 13:43 ` Linus Torvalds 3 siblings, 0 replies; 60+ messages in thread From: Phil Edwards @ 2002-12-10 12:05 UTC (permalink / raw) To: Zack Weinberg; +Cc: Joseph S. Myers, Walter Landry, gcc On Mon, Dec 09, 2002 at 11:26:01PM -0800, Zack Weinberg wrote: > the implementations conflict, because no one's exerting adequate > centripetal force. Heh. I never thought I'd hear that term applied to software development. I like it. Phil -- I would therefore like to posit that computing's central challenge, viz. "How not to make a mess of it," has /not/ been met. - Edsger Dijkstra, 1930-2002 ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-10 0:27 ` Zack Weinberg 2002-12-10 0:41 ` Tom Lord 2002-12-10 12:05 ` Phil Edwards @ 2002-12-10 19:44 ` Mark Mielke 2002-12-10 19:57 ` David S. Miller 2002-12-14 13:43 ` Linus Torvalds 3 siblings, 1 reply; 60+ messages in thread From: Mark Mielke @ 2002-12-10 19:44 UTC (permalink / raw) To: Zack Weinberg; +Cc: Joseph S. Myers, Walter Landry, gcc On Mon, Dec 09, 2002 at 11:26:01PM -0800, Zack Weinberg wrote: > Linux is a large project - 4.3 million lines of code - but only one > person has commit privileges on the official tree, for any given > release branch. No matter how good their tools are, this cannot be > expected to scale, and indeed it does not. I have not actually > measured it, but the appearance of the traffic on linux-kernel is that > Linus drops patches on the floor just as often as he did before he > started using Bitkeeper. However, Bitkeeper facilitates other people > maintaining their own semi-official versions of the tree, in which > some of these patches get sucked up. That is bad. It means users > have to choose between N different variants; as time goes by it > becomes increasingly difficult to put them all back together again; > eventually will come a point where critical feature A is available > only in tree A, critical feature B is available only in tree B, and > the implementations conflict, because no one's exerting adequate > centripetal force. > Possibly I am too pessimistic. Actually, the model used for Linux provides substantial freedom. Since no single site is the 'central' site, development can be fully distributed. Changes can be merged back and forth on demand, and remote users require no resources to run, other than the resources to periodically synchronize the data. Unfortunately -- this freedom (as always) comes with a price. The price is that the fully distributed model means that there is no enforced regulation. 
There is no control, and the same freedom that allows anybody to create a variant, allows them to keep a variant. The models are substantially different; however, I would suggest that neither is wrong in the generic sense. The only questions that really matter are: 1) are you more comfortable in a regulated environment, and if so, then 2) are you willing to live with the limitations that a regulated environment gives you? Some of these limitations include the need to maintain contact with a central repository of some sort, and the need for processing at a central repository of some sort. Personally, I'm with you in that I prefer regulation and enforcement. It keeps me from fsck'ing up my own data. mark -- mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-10 19:44 ` Mark Mielke @ 2002-12-10 19:57 ` David S. Miller 2002-12-10 20:02 ` Phil Edwards 0 siblings, 1 reply; 60+ messages in thread From: David S. Miller @ 2002-12-10 19:57 UTC (permalink / raw) To: mark; +Cc: zack, jsm28, wlandry, gcc From: Mark Mielke <mark@mark.mielke.cc> Date: Tue, 10 Dec 2002 22:42:24 -0500 On Mon, Dec 09, 2002 at 11:26:01PM -0800, Zack Weinberg wrote: > Linux is a large project - 4.3 million lines of code - but only one > person has commit privileges on the official tree, for any given > release branch. No matter how good their tools are, this cannot be > expected to scale, and indeed it does not. I have not actually > measured it, but the appearance of the traffic on linux-kernel is that > Linus drops patches on the floor just as often as he did before he > started using Bitkeeper. However, Bitkeeper facilitates other people > maintaining their own semi-official versions of the tree, in which > some of these patches get sucked up. That is bad. It means users > have to choose between N different variants; as time goes by it > becomes increasingly difficult to put them all back together again; > eventually will come a point where critical feature A is available > only in tree A, critical feature B is available only in tree B, and > the implementations conflict, because no one's exerting adequate > centripetal force. > Possibly I am too pessimistic. Actually, the model used for Linux provides substantial freedom. Since no single site is the 'central' site, development can be fully distributed. Changes can be merged back and forth on demand, and remote users require no resources to run, other than the resources to periodically synchronize the data. I think some assessments are wrong here. Linus does get more patches applied these days, and less gets dropped on the floor. 
Near the end of November, as we were approaching the feature freeze deadline, he was merging on the order of 4MB of code per day if not more. What really ends up happening also is that Linus begins to trust people with entire subsystems. So when Linus pulls changes from their BK tree, he can see if they touch any files outside of their areas of responsibility. Linus used to drop my work often, and I would just retransmit until he took it. Now with BitKeeper, I honestly can't remember the last time he silently dropped a code push I sent to him. The big win with BitKeeper is the whole disconnected operation bit. When the net goes down, I can't check RCS history and make diffs against older versions of files in the gcc tree. With Bitkeeper I have all the revision history in my cloned tree so there is zero need for me to ever go out onto the network to do work until I want to share my changes with other people. This also decreases the load on the machine with the "master" repository. There is nothing about this which makes it incompatible with how GCC works today. So if arch and/or Subversion can support the kind of model BitKeeper can, we'd set it up like so: 1) gcc.gnu.org would still hold the "master" repository 2) there would be trusted people with write permission who could thusly push their changes into the master tree Releases and tagging would still be done by someone like Mark except it hopefully wouldn't take several hours to do it :-) ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-10 19:57 ` David S. Miller @ 2002-12-10 20:02 ` Phil Edwards 2002-12-10 23:07 ` David S. Miller 0 siblings, 1 reply; 60+ messages in thread From: Phil Edwards @ 2002-12-10 20:02 UTC (permalink / raw) To: David S. Miller; +Cc: mark, zack, jsm28, wlandry, gcc On Tue, Dec 10, 2002 at 07:41:05PM -0800, David S. Miller wrote: > When the net goes down, I can't check RCS history and make diffs > against older versions of files in the gcc tree. I just rsync the repository and do everything but checkins locally. Very very fast. > With Bitkeeper I have all the revision history in my cloned tree so > there is zero need for me to every go out onto the network to do work > until I want to share my changes with other people. This also > decreases the load on the machine with the "master" repository. So does the rsync-repo technique. Phil -- I would therefore like to posit that computing's central challenge, viz. "How not to make a mess of it," has /not/ been met. - Edsger Dijkstra, 1930-2002 ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-10 20:02 ` Phil Edwards @ 2002-12-10 23:07 ` David S. Miller 2002-12-11 6:31 ` Phil Edwards 0 siblings, 1 reply; 60+ messages in thread From: David S. Miller @ 2002-12-10 23:07 UTC (permalink / raw) To: phil; +Cc: mark, zack, jsm28, wlandry, gcc From: Phil Edwards <phil@jaj.com> Date: Tue, 10 Dec 2002 22:57:24 -0500 > With Bitkeeper I have all the revision history in my cloned tree so > there is zero need for me to ever go out onto the network to do work > until I want to share my changes with other people. This also > decreases the load on the machine with the "master" repository. So does the rsync-repo technique. That's not distributed source management, that's "I copy the entire master tree onto my computer." If you make modifications to your local rsync'd master tree, you can't transparently push those changes to other people unless you set up anoncvs on your computer and tell them "use this as your master repo instead of gcc.gnu.org to get my changes". That's bolted onto the side, not part of the design. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-10 23:07 ` David S. Miller @ 2002-12-11 6:31 ` Phil Edwards 0 siblings, 0 replies; 60+ messages in thread From: Phil Edwards @ 2002-12-11 6:31 UTC (permalink / raw) To: David S. Miller; +Cc: mark, zack, jsm28, wlandry, gcc On Tue, Dec 10, 2002 at 07:58:36PM -0800, David S. Miller wrote: > From: Phil Edwards <phil@jaj.com> > Date: Tue, 10 Dec 2002 22:57:24 -0500 > > > With Bitkeeper I have all the revision history in my cloned tree so > > there is zero need for me to every go out onto the network to do work > > until I want to share my changes with other people. This also > > decreases the load on the machine with the "master" repository. > > So does the rsync-repo technique. > > That's not distributed source management, that's "I copy the entire > master tree onto my computer." I'm not claiming otherwise. I'm simply offering a tip to make life easier for current users in the current situation. What I said is still true with regards to the paragraph I quoted. Phil -- I would therefore like to posit that computing's central challenge, viz. "How not to make a mess of it," has /not/ been met. - Edsger Dijkstra, 1930-2002 ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-10 0:27 ` Zack Weinberg ` (2 preceding siblings ...) 2002-12-10 19:44 ` Mark Mielke @ 2002-12-14 13:43 ` Linus Torvalds 2002-12-14 14:06 ` Tom Lord ` (2 more replies) 3 siblings, 3 replies; 60+ messages in thread From: Linus Torvalds @ 2002-12-14 13:43 UTC (permalink / raw) To: zack, gcc [ See the blurb about OpenCM at the end. ] In article <87isy2mj1y.fsf@egil.codesourcery.com>, Zack Weinberg <zack@codesourcery.com> wrote: > >Linux is a large project - 4.3 million lines of code - but only one >person has commit privileges on the official tree, for any given >release branch. No. That's not how it works. Linux, unlike _every_ other project I know of, has always actively encouraged "personal/vendor branches", and that is in fact how 99% of all development has happened. Most development happens in trees that have _nothing_ to do with the official tree. To me, the whole CVS model (many branches in one centralized repository) is just incredibly broken, and you should realize that that isn't how Linux has ever worked. My tree is often called the "official" tree, but what it really is is just a base tree that many people maintain their own forks from. This is fundamentally _more_ scalable than the CVS mess that is gcc, since it much more easily allows for very radical branches that do not need any centralized permission from me. Think of it this way: in gcc, the egcs split was a very painful thing. In Linux, those kinds of splits (people doing what they think is right, _without_ support from the official maintainers) are how _everything_ gets done. Linux is a "constantly forking" project, and that's how development very fundamentally happens. And a fork is a lot more scalable than a branch. It's also a lot more powerful: it gives _full_ rights to the forker. That implies that a forked source tree should be a first-class citizen, not just something that was copied off somebody else's CVS tree. 
The BitKeeper "clone" thing is a beautiful implementation of the Linux development model. > No matter how good their tools are, this cannot be >expected to scale, and indeed it does not. Sorry, but you're wrong. Probably simply because you're too used to the broken CVS model. I would like to point out that Linux development has scaled a lot better than gcc, to a larger source base (it's 5+ M lines) with much more fundamental programming issues (concurrency etc). I will bet you that the Linux kernel merges are a lot bigger than the gcc ones, that development happens faster, and that there are more independent developers working on their own versions of Linux than there are of gcc. There aren't just a handful of branches, there are _hundreds_. Many of them end up not being interesting, or ever necessarily merged back. And _none_ of them required write access to my tree. I'd also like to point out that Linux has _never_ had a flap like the gcc/egcs/emacs/xemacs splits. Exactly because of the _much_ more scalable approach of just fundamentally always having had a distributed development model that allows _anybody_ to contribute easily, instead of having a model that makes certain people have "special powers". In short, _my_ tree is _not_ the same thing as the gcc CVS sources. > I have not actually >measured it, but the appearance of the traffic on linux-kernel is that >Linus drops patches on the floor just as often as he did before he >started using Bitkeeper. Measure the number of changes accepted, and I bet the Linux kernel approach had an order of magnitude more changes than gcc has _ever_ had. Even before using Bitkeeper. The proof is in the pudding - care to compare real numbers, and compare sizes of weekly merged patches? I bet gcc will come in _far_ behind. > However, Bitkeeper facilitates other people >maintaining their own semi-official versions of the tree, in which >some of these patches get sucked up. That is bad. No. Have you ever used Bitkeeper? 
Really _used_ it? I've used both bitkeeper and CVS (I refuse to touch CVS with a ten-foot pole for my "fun" projects, but I've used CVS for big projects at work), and I can tell you, CVS doesn't even come _close_. Not even with various wrapper helper tools to make things like CVS branches look even remotely palatable. The part that you're missing, simply because you've probably used CVS for too long, is the _distributed_ nature of Bitkeeper, and of Linux development. Repeat after me: "There is no single tree". Everything is distributed. Any source control system that has "write access" issues is fundamentally broken in my opinion. Your repository should be _yours_, and nobody else's. There is no "write access". There is only some way to expedite a merge between two repositories. The source control management should make it easy for you to export your changes to other repositories. In fact, it should make it easy for you to have many different repositories - for different things you're working on. Bitkeeper does this very well. It's _the_ reason I use bitkeeper. BK does other things too, but they all pale to the _fundamental_ idea of each repository being a thing unto itself, and having no stupid "branches", but simply having truly distributed repositories. Some people think that is an "offline" feature, but nothing could be further from the truth. The _real_ issue about independent repositories is that it makes it possible to do truly independent development, and makes notions like branches such an outdated idea. Projects like Subversion never seem to have really _understood_ the notion of true distributed repositories. And by not understanding them, like you they miss the whole point of truly scalable development. Development that scales _past_ the notion of one central repository. >Possibly I am too pessimistic. No. You're not pessimistic, you just don't _understand_. You don't have to believe me. Believe the numbers. Look at which project gets more done. 
And realize that even before Linux used Bitkeeper, it used the truly distributed _model_. The model is independent from what SCM you use, although some SCMs obviously cannot support some models (and CVS in particular forces its users to use a particularly broken model). Btw, I realize that there's no way in hell gcc will use bitkeeper. I'm not trying to push that. I'm just hoping that if gcc does change to something smarter than CVS, it would change to something that is truly distributed, and doesn't have that broken "branch" notion, or the notion of needing write permissions to some stupid central repository in order to enjoy the idea of SCM. Looking at the current projects out there, the only one that looks like it has more than half a clue is "OpenCM". It doesn't seem to really do the distributed thing right, but at least from what I've seen it looks like they have the right notions about doing it. The OpenCM project seems to still believe that distribution is just about "disconnected commits" rather than understanding that if you do distributed repositories right you shouldn't have branches at all (instead of a branch, you should just have a _different_ repository), but they at least seem to understand the importance of true distribution. I hope gcc developers are giving it a look. Linus ^ permalink raw reply [flat|nested] 60+ messages in thread
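[Editorial note: the "there is no single tree" model Linus describes can be caricatured in a few lines. The sketch below is a toy abstraction, not BitKeeper's actual data model: each repository is just a named bag of changesets with no special status, and a "pull" is a read-only merge of someone else's changesets into your own. The repository names and changeset labels are invented for illustration.]

```python
class Repo:
    """Toy model of a distributed repository: a named set of changesets."""

    def __init__(self, name):
        self.name = name
        self.changesets = set()

    def commit(self, changeset):
        # A local commit needs nobody's permission.
        self.changesets.add(changeset)

    def pull(self, other):
        # "Merging" is pulling: it reads the other repository and never
        # mutates it, so there is no notion of remote write access.
        self.changesets |= other.changesets


official = Repo("linus")
official.commit("base")

davem = Repo("davem")
davem.pull(official)        # clone: start from someone else's tree
davem.commit("net-fix")     # independent development, offline if need be

official.pull(davem)        # the merge happens when the *puller* decides
print(sorted(official.changesets))  # → ['base', 'net-fix']
```

The point of the sketch: no repository is structurally "the" tree — the one called official is only official by social convention.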
* Re: source mgt. requirements solicitation 2002-12-14 13:43 ` Linus Torvalds @ 2002-12-14 14:06 ` Tom Lord 2002-12-14 17:44 ` Linus Torvalds 2002-12-14 14:41 ` Neil Booth 2002-12-14 15:33 ` Momchil Velikov 2 siblings, 1 reply; 60+ messages in thread From: Tom Lord @ 2002-12-14 14:06 UTC (permalink / raw) To: gcc; +Cc: torvalds The OpenCM project seems to still believe that distribution is just about "disconnected commits" rather than understanding that if you do distributed repositories right you shouldn't have branches at all (instead of a branch, you should just have a _different_ repository), Branches can span repository boundaries just fine, and that's a nice way to keep useful track of the history that relates the two forks. Distribution is orthogonal to branching. Two repositories can be separately administered and fully severable, yet branches can usefully exist between them. For the "branched from" repository, this is a passive, read-only operation. Looking at the current projects out there, the only one that looks like it has more than half a clue is "OpenCM". It doesn't seem to really do the distributed thing right, but at least from what I've seen it looks like they have the right notions about doing it. What aspect of arch has you confused? or, alternatively, what flaw do you see in arch's approach to distribution? The _real_ issue about independent repositories is that it makes it possible to do truly independent development, and makes notions like branches such an outdated idea. Arranging that one line is a branch of another (even when they are in two independent, severable repositories) facilitates simpler and more accurate queries about how their histories relate. Among other things, such queries can (a) take more of the drudgery out of some common merging tasks, (b) better facilitate process automation when the forks are, in fact, contributing to a common line of development. 
GCC development faces a problem which Linux kernel development, you seem to have said elsewhere, avoids by social means: it has direct and appreciated contributors to the mainline who, nevertheless, are asked to contribute their changes indirectly through a formal review and testing process (rather than through, say, a "trusted lieutenant" -- in other words, in GCC, the work pool of the "lieutenants" ("maintainers", actually) is collected and shared among them in flexible, fine-grained ways that are performed with considerable discipline). Distribution _with_ branches can be a boon to those contributors and the maintainers. Overall -- I don't think there can be _that_ much contrast between the GCC and LK development processes. GCC is a bit like LK, except that instead of a Linus, GCC has a team. That team needs (and has) tools to make them effective as the "virtual linus". (Some of us have ideas for even better tools :-) That there is less of a tendency for 3rd parties to throw up their arms and make their own forks may not have quite the implications you assert: the natures of the two systems and the uses they are put to make comparison very difficult. -t ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-14 14:06 ` Tom Lord @ 2002-12-14 17:44 ` Linus Torvalds 2002-12-14 19:45 ` Tom Lord 0 siblings, 1 reply; 60+ messages in thread From: Linus Torvalds @ 2002-12-14 17:44 UTC (permalink / raw) To: Tom Lord; +Cc: gcc On Sat, 14 Dec 2002, Tom Lord wrote: > > What aspect of arch has you confused? or, alternatively, what flaw > do you see in arch's approach to distribution? To be honest, I tried arch back when I was testing different SCM's for the kernel, and even just the setup confused me enough that I never got past that phase. I suspect I just tried it too early in the development cycle, and that turned me off it. Also, the oft-repeated performance issues have kept me wary about arch. Bitkeeper is quite fast, but even so Larry and friends actually ended up having to make some major performance improvements to the bk-3 release simply because they were taken by surprise at just _how_ much data the kernel SCM ends up needing to process. I realize that there are a lot of advantages to keeping to high-level scripting languages for the SCM, but it's also quite important to try to avoid making the SCM itself be a distraction from a performance standpoint. However, since I never got very far with arch, I really only parrot what I've heard from others about its performance, so this may be unfair. Linus ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-14 17:44 ` Linus Torvalds @ 2002-12-14 19:45 ` Tom Lord 0 siblings, 0 replies; 60+ messages in thread From: Tom Lord @ 2002-12-14 19:45 UTC (permalink / raw) To: torvalds; +Cc: gcc > Also, the oft-repeated performance issues have kept me wary > about arch. Fair if you're evaluating it from the "should I start using this tomorrow" perspective (don't). I think most of us who are fairly deep into arch think these problems have straightforward solutions, and my goal is to try to find a solution to the resource crisis that keeps me from finishing the work. I realize that there are a lot of advantages to keeping to high-level scripting languages for the SCM, but it's also quite important to try to avoid making the SCM itself be a distraction from a performance standpoint. However, since I never got very far with arch, I really only parrot what I've heard from others about its performance, so this may be unfair. The prototype/reference implementation of arch _is_ a mixture of shell scripts and small C programs. I think the enforced simplicity is very good for the architecture and I'm quite optimistic about the future performance potential of this code. arch is tiny, and I'm encouraging alternative implementations for a variety of purposes. I hear that (have some salt grains with this) someone is working on one in C++, and someone else on one in Python. A Perl translation was made, but work on it seems to have stopped (perhaps because the author changed work contexts) around the time it was starting to function. It is not quite accurate to say "the current implementation is slow because it uses sh" -- some sh parts need recasting in C, many don't, some of the admin tweaks that improve performance need to be made more automatic....things like that. It's an optimizable prototype that has not been prematurely optimized. 
Just reading what you say here: the arch design has everything you like about BK and probably a bit more to boot. It's just a resource problem to get it to a 1.0 that is as comfortable to adopt as you've found BK. Rumours that that will require $12M are exaggerated by, in my estimate, about a factor of 10. To be honest, I tried arch back when I was testing different SCM's for the kernel, and even just the setup confused me enough that I never got past that phase. I suspect I just tried it too early in the development cycle, and that turned me off it. Perhaps. The currently active developers seem to be giving a lot of attention to encapsulating matters such as that in convenience commands layered over the core. -t ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-14 13:43 ` Linus Torvalds 2002-12-14 14:06 ` Tom Lord @ 2002-12-14 14:41 ` Neil Booth 2002-12-14 15:47 ` Zack Weinberg 2002-12-14 15:33 ` Momchil Velikov 2 siblings, 1 reply; 60+ messages in thread From: Neil Booth @ 2002-12-14 14:41 UTC (permalink / raw) To: Linus Torvalds; +Cc: zack, gcc Linus Torvalds wrote:- > The part that you're missing, simply because you've probably used CVS > for too long, is the _distributed_ nature of Bitkeeper, and of Linux > development. Repeat after me: "There is no single tree". Everything is > distributed. Uh, careful, Zack wrote parts of Bitkeeper, including designing the network protocols IIRC. Neil. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-14 14:41 ` Neil Booth @ 2002-12-14 15:47 ` Zack Weinberg 0 siblings, 0 replies; 60+ messages in thread From: Zack Weinberg @ 2002-12-14 15:47 UTC (permalink / raw) To: Neil Booth; +Cc: Linus Torvalds, gcc Neil Booth <neil@daikokuya.co.uk> writes: > Linus Torvalds wrote:- > >> The part that you're missing, simply because you've probably used CVS >> for too long, is the _distributed_ nature of Bitkeeper, and of Linux >> development. Repeat after me: "There is no single tree". Everything is >> distributed. > > Uh, careful, Zack wrote parts of Bitkeeper, including designing the network > protocols IIRC. It is my understanding that the network protocol I designed is no longer in use, and good riddance, it was my first try at such things and I didn't know what I was doing. But yes, I worked on Bitkeeper for about six months in 2000, so I do know what its architecture is like. zw ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-14 13:43 ` Linus Torvalds 2002-12-14 14:06 ` Tom Lord 2002-12-14 14:41 ` Neil Booth @ 2002-12-14 15:33 ` Momchil Velikov 2002-12-14 16:06 ` Linus Torvalds 2 siblings, 1 reply; 60+ messages in thread From: Momchil Velikov @ 2002-12-14 15:33 UTC (permalink / raw) To: Linus Torvalds; +Cc: zack, gcc >>>>> "Linus" == Linus Torvalds <torvalds@transmeta.com> writes: Linus> My tree is often called the "official" tree, but what it Linus> really is is just a base tree that many people maintain Linus> their own forks from. This is fundamentally _more_ Err, haven't you noticed that this is the tree that many (all) people want to merge their forks into? I think this is "what it really is". When evaluating an SCM tool, IMHO, the most important thing is the ease of merges - remove the need for later merges and any sophisticated "fork" tool boils down to a "cp -R". Linus> scalable than the CVS mess that is gcc, since it much more Linus> easily allows for very radical branches that do not need Linus> any centralized permission from me. I surely have a "fork" of GCC and I ain't got nobody's permission. Permission is needed not when forking, but when merging. Linus> Think of it this way: in gcc, the egcs split was a very Linus> painful thing. In Linux, those kinds of splits (people Linus> doing what they think is right, _without_ support from the Linus> official maintainers) is how _everything_ gets done. Linux Linus> is a "constantly forking" project, and that's how Linus> development very fundamentally happens. Linus> And a fork is a lot more scalable than a branch. It's also There's no difference, unless by "branch" and "fork" you mean the corresponding implementations in CVS and BK of one and the same development model. Linus> I would like to point out that Linux development has scaled Linus> a lot better than gcc, to a larger source base (it's 5+ M Linus> lines) with much more fundamental programming issues Linus> (concurrency etc). 
I will bet you that the Linux kernel Linus> merges are a lot bigger than the gcc ones, that development Linus> happens faster, and that there are more independent Linus> developers working on their own versions of Linux than Linus> there are of gcc. How about a different view on the subject? IMHO a good metric of the complexity of a particular problem/domain is the overall ability of mankind to cope with it. Thus, what you describe may be due to the fact that people capable of kernel programming are far more numerous than people capable of compiler programming, IOW, that most kernel programming requires rather basic programming knowledge, compared to most compilers programming. No? Linus> Any source control system that has "write access" issues is Linus> fundamentally broken in my opinion. Your repository should Linus> be _yours_, and nobody elses. There is no "write access". Linus> There is only some way to expedite a merge between two Linus> repositories. The source control management should make it Linus> easy for you to export your changes to other repositories. An SCM should facilitate collaboration. Any SCM that requires single person's permission for modifications to the source base (e.g. by having only private repositories) is broken beyond repair and scalable exactly like a BitKeeper^WBKL. ~velco ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-14 15:33 ` Momchil Velikov @ 2002-12-14 16:06 ` Linus Torvalds 2002-12-15 3:59 ` Momchil Velikov ` (2 more replies) 0 siblings, 3 replies; 60+ messages in thread From: Linus Torvalds @ 2002-12-14 16:06 UTC (permalink / raw) To: Momchil Velikov; +Cc: zack, gcc On 15 Dec 2002, Momchil Velikov wrote: > > I surely have a "fork" of GCC and I ain't got nobody's permission. > Permission is needed not when forking, but when merging. But the point is, the "CVS mentality" means that a fork is harder to merge than a branch, and you often lose all development history when you merge a fork as a result of this (yeah, you can do a _lot_ of work, and try to also merge the SCM information on a fork merge, but it's almost never done because it is so painful). That's why I think the CVS mentality sucks. You have only branches that are "first-class" citizens, and they need write permission to create and are very expensive to create. Note: I'm not saying they are slow - that's just a particular CVS implementation detail. By "expensive" I mean that they cannot easily be created and thrown away, so with the "CVS mentality" those branches only get created for "big" things. And the "cheap" branches (private check-outs that don't need write permissions and can be done by others) lose all access to real source control except the ability to track the original. Two of the cheap branches cannot track each other in any sane way. And they have no revision history at all even internally. Yet it is the _cheap_ branches that should be the true first-class citizen. Potentially throw-away code that may end up being really really useful, but might just be a crazy pipe-dream. The experimental stuff that would _really_ want to have nice source control. And the "CVS mentality" totally makes that impossible. Subversion seems to be only a "better CVS", and hasn't gotten away from that mentality, which is sad. 
> Linus> And a fork is a lot more scalable than a branch. It's also
>
> There's no difference, unless by "branch" and "fork" you mean the
> corresponding implementations in CVS and BK of one and the same
> development model.

Basically, by "branch" I mean something that fundamentally is part
of the "official site". If a branch has to be part of the official
site, then a branch is BY DEFINITION useless for 99% of developers.
Such branches SHOULD NOT EXIST, since they are fundamentally against
the notion of open development!

A "fork" is something where people can just take the tree and do
their own thing to it. Forking simply doesn't work with the CVS
mentality, yet forking is clearly what true open development
requires.

> IMHO a good metric of the complexity of a particular problem/domain is
> the overall ability of the mankind to cope with it.
>
> Thus, what you describe, may be due to the fact that people capable of
> kernel programming are a lot more than people capable of compiler
> programming, IOW, that most kernel programming requires rather basic
> programming knowledge, compared to most compilers programming.
>
> No ?

No.

Sure, you can want to live in your own world, and try to keep the
riff-raff out. That's the argument I hear from a lot of commercial
developers ("we don't want random hackers playing with our code, we
don't believe they can do as good a job as our highly paid
professionals").

The argument is crap. It was crap for the kernel, it's crap for gcc.
The only reason you think "anybody" can program kernels is the fact
that Linux has _shown_ that anybody can do so. If gcc had less of a
restrictive model for accepting patches, you'd have a lot more
random people who would do them, I bet. But gcc development not only
has the "CVS mentality", it has the "FSF disease" with the paperwork
crap and copyright assignment crap. So you keep outsiders out, and
then you say it's because they couldn't do what you can do anyway.

Crap crap crap arguments.
Trust me, there are more intelligent people out there than you
believe, and they can do a hell of a lot better work than you
currently allow them to do. Often with very little formal schooling.

> A SCM should facilitate collaboration. Any SCM that requires single
> person's permission for modifications to the source base (e.g. by
> having only private repositories) is broken beyond repair and
> scalable exactly like a BitKeeper^WBKL.

But you don't _understand_. BK allows hundreds of people to work on
the same repository, if you want to. You just give them BK accounts
on the machine, the same way you do with CVS.

But that's not the scalable way to do things. The _scalable_ thing
is to let everybody have their own tree, and _not_ have that "one
common point" disease. You have the networking people working on
their networking trees _without_ merging back to me, because they
have their own development branches that simply aren't ready yet,
for example. Having a single point for work like that is WRONG, and
it's definitely _not_ scalable.

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-14 16:06 ` Linus Torvalds @ 2002-12-15 3:59 ` Momchil Velikov 2002-12-15 8:26 ` Momchil Velikov 2002-12-15 17:09 ` Stan Shebs 2 siblings, 0 replies; 60+ messages in thread From: Momchil Velikov @ 2002-12-15 3:59 UTC (permalink / raw) To: Linus Torvalds; +Cc: zack, gcc

>>>>> "Linus" == Linus Torvalds <torvalds@transmeta.com> writes:

>> A SCM should facilitate collaboration. Any SCM that requires
>> single person's permission for modifications to the source base
>> (e.g. by having only private repositories) is broken beyond
>> repair and scalable exactly like a BitKeeper^WBKL.

Linus> But you don't _understand_. BK allows hundreds of people to
Linus> work on the same repository, if you want to. You just give
Linus> them BK accounts on the machine, the same way you do with
Linus> CVS.

Ah, I _do_ understand that this is possible. I _do_ understand very
well that there's no "Linux Kernel Project", but there is "Linus's
kernel tree". You seem to not understand that there _is_ a "GCC
Project" as well as a "GNU Project".

Linus> But that's not the scalable way to do things. The
Linus> _scalable_ thing is to let everybody have their own tree,

In that case you don't need an SCM at all - you can do pretty well
with a few simple utilities to maintain a number of hardlinked
trees:

  "cp -Rl"                    - branch, tag, fork, whatever
  "share <src> <dst>"         - make identical files hardlinks
  "unshare <path>"            - make <path> a file with one link (recursively)
  "dmerge <old> <new> <mine>" - same as merge(1), but for trees
  "diff -r"                   - make a changeset
  "mail"                      - send a changeset
  "patch"                     - apply a changeset
  "rm -rf"                    - transaction rollback, so we have atomicity, see :)

Linus> and _not_ have that "one common point" disease. You have

This is not a disease, it is a _natural_ consequence of
_collaboration_. And collaboration is an _absolute necessity_ when
you are above a certain degree of coupling between the components.
A change in the network stack can hardly affect the operation of the
ATA driver - however, this is not the case in GCC [1]. Changes in a
particular phase _do_ affect other phases, and this is not a
coincidence - it is a consequence of the fact that GCC components
are tightly coupled by virtue of working on a common data structure.

The degree of module coupling can be characterized as follows [2]
(from loose to tight):

Degree  Description
------  -----------
0       Independent - no coupling
1       Data coupling - interaction between the modules is with
        simple, unstructured data types, via interface functions.
3       Template coupling - interface function parameters include
        structured data types.
4       Common data - when modules share a common data structure.
5       Control - when one module controls others with flags,
        switches, command codes, etc.

The Linux kernel tends to be in {0, 1, 3}. GCC tends to be in
{4, 5}. IOW, GCC components are roughly from three to five times
more tightly coupled than Linux kernel components.

My point is that the Linux kernel development model [3], while
obviously successful, is not necessarily adequate for other
projects, particularly for GCC.

~velco

[1] AFAICT, I'm not a GCC developer.
[2] There may be other metrics; I've found that one adequate, though YMMV.
[3] And, yes, I claim I fully understand it, at least I fully
understand what _you_ want it to be.

^ permalink raw reply	[flat|nested] 60+ messages in thread
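[Editor's note: the hardlinked-tree workflow Momchil lists above can be exercised with stock GNU coreutils. The sketch below is illustrative only; his `share`/`unshare`/`dmerge` helpers are hypothetical, so "unshare" is emulated here with a copy-and-rename before editing.]

```shell
#!/bin/sh
# Illustrative "poor man's branching" with hardlinked trees, per the
# list above. Requires GNU cp for the -Rl (recursive hardlink) flags.
set -e
work=$(mktemp -d); cd "$work"

mkdir trunk
printf 'int main(void) { return 0; }\n' > trunk/main.c

# "cp -Rl" - branch/tag/fork: a cheap copy sharing storage via hardlinks
cp -Rl trunk branch
[ trunk/main.c -ef branch/main.c ]   # same inode: the "branch" was free

# emulate "unshare" - break the link before editing, so trunk is untouched
cp branch/main.c tmp && mv tmp branch/main.c
printf '/* branch-only change */\n' >> branch/main.c

# "diff -r" - produce a changeset ("patch"/"mail" would move it around)
diff -ru trunk branch > changes.patch || true

# "rm -rf" - transaction rollback
rm -rf branch
```

The `-ef` test is the whole point of the argument: the "branch" costs one directory entry per file rather than a copy of the data, which is exactly the cheap-branch property being debated in this thread.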
* Re: source mgt. requirements solicitation 2002-12-14 16:06 ` Linus Torvalds 2002-12-15 3:59 ` Momchil Velikov @ 2002-12-15 8:26 ` Momchil Velikov 2002-12-15 12:02 ` Linus Torvalds 2002-12-15 17:09 ` Stan Shebs 2 siblings, 1 reply; 60+ messages in thread From: Momchil Velikov @ 2002-12-15 8:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: zack, gcc

>>>>> "Linus" == Linus Torvalds <torvalds@transmeta.com> writes:

Linus> Crap crap crap arguments. Trust me, there are more
Linus> intelligent people out there than you believe, and they can
Linus> do a hell of a lot better work than you currently allow
Linus> them to do. Often with very little formal schooling.

Yes, there are lots of intelligent people out there, but while
intelligence is usually sufficient for working on a kernel, working
on a compiler requires _knowledge_ (whether formal or not).

~velco

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-15 8:26 ` Momchil Velikov @ 2002-12-15 12:02 ` Linus Torvalds 2002-12-15 14:16 ` Momchil Velikov 0 siblings, 1 reply; 60+ messages in thread From: Linus Torvalds @ 2002-12-15 12:02 UTC (permalink / raw) To: Momchil Velikov; +Cc: zack, gcc

On 15 Dec 2002, Momchil Velikov wrote:
> >>>>> "Linus" == Linus Torvalds <torvalds@transmeta.com> writes:
>
> Linus> Crap crap crap arguments. Trust me, there are more
> Linus> intelligent people out there than you believe, and they can
> Linus> do a hell of a lot better work than you currently allow
> Linus> them to do. Often with very little formal schooling.
>
> But yes, there are lots of intelligent people out there, but while
> intelligence is usually sufficient for working on a kernel, working on
> a compiler requires _knowledge_ (no matter formal or not).

Blaah. I _bet_ that is not true.

I actually had my own gcc tree for Linux kernel development back
when I started, mostly because I just enjoyed it and found the
compiler interesting. I added builtins for things like memcpy() etc
because I cared (and it was more fun than writing assembly language
library routines), and because gcc at that time had hardly any
support for things like that.

I didn't understand the whole compiler, BUT THAT DID NOT MATTER. The
same way that most Linux kernel developers don't understand the
whole kernel, and do not even need to.

Sure, you need people with specialized knowledge for specialized
areas (designing the big picture etc), but that's a small small part
of it. To paraphrase, programming is often 1% inspiration and 99%
perspiration.

In short, your argument is elitist and simply _wrong_. It's true
that to create a whole compiler you need a whole lot of knowledge,
but that's true of any project - including operating systems. But
that doesn't matter, because there isn't "one" person who needs to
know everything.

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-15 12:02 ` Linus Torvalds @ 2002-12-15 14:16 ` Momchil Velikov 2002-12-15 15:20 ` Pop Sébastian 0 siblings, 1 reply; 60+ messages in thread From: Momchil Velikov @ 2002-12-15 14:16 UTC (permalink / raw) To: Linus Torvalds; +Cc: zack, gcc

>>>>> "Linus" == Linus Torvalds <torvalds@transmeta.com> writes:

Linus> In short, your argument is elitist and simply _wrong_.

*shrug* That's my explanation for what I observe - more people
develop kernels than compilers. A particular compiler's development
model or patch review and acceptance policy does not matter at all -
if it is an obstacle, people's creativity is redirected somewhere
else.

I may be wrong. But I've yet to hear a more credible explanation for
this simple fact.

~velco

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-15 14:16 ` Momchil Velikov @ 2002-12-15 15:20 ` Pop Sébastian 2002-12-15 16:09 ` Linus Torvalds 0 siblings, 1 reply; 60+ messages in thread From: Pop Sébastian @ 2002-12-15 15:20 UTC (permalink / raw) To: Momchil Velikov; +Cc: Linus Torvalds, zack, gcc

On Sun, Dec 15, 2002 at 11:41:04PM +0200, Momchil Velikov wrote:
> >>>>> "Linus" == Linus Torvalds <torvalds@transmeta.com> writes:
>
> Linus> In short, your argument is elitist and simply _wrong_.
>
> *shrug* That's my explanation to what I observe - more people develop
> kernels than compilers.
>
> I may be wrong. But I'm yet to hear a more credible explanation for
> this simple fact.
>
Maybe it's true because for writing compiler optimizations one
should have some knowledge of mathematics. Most of the new
techniques developed for optimizing compilers use abstract
representations based on mathematical objects (such as graphs,
lattices, vector spaces, polyhedra, ...)

Maybe we're wrong, but the percentage of mathematicians who
contribute to GCC could be slightly bigger than for LK.

Sebastian

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-15 15:20 ` Pop Sébastian @ 2002-12-15 16:09 ` Linus Torvalds 2002-12-15 16:49 ` Bruce Stephens 2002-12-16 17:22 ` source mgt. requirements solicitation Mike Stump 0 siblings, 2 replies; 60+ messages in thread From: Linus Torvalds @ 2002-12-15 16:09 UTC (permalink / raw) To: Pop Sébastian; +Cc: Momchil Velikov, zack, gcc

On Sun, 15 Dec 2002, Pop Sébastian wrote:
> >
> > I may be wrong. But I'm yet to hear a more credible explanation for
> > this simple fact.
> >
> Maybe it's true because for writing compiler optimizations one
> should have some knowledge in mathematics.

Naah. It's simple - kernels are just sexier.

Seriously, I think it's just that a kernel tends to have more
different _kinds_ of problems, and thus tends to attract different
kinds of people, and more of them. Compilers are complicated, no
doubt about that, but the complicated stuff tends to be mostly of
the same type (ie largely fairly algorithmic transformations for the
different optimization passes).

In kernels, you have many _different_ kinds of issues, and as a
result you'll find more people who are interested in one of them. So
you'll find people who care about filesystems, or people who care
about memory management, or people who find it interesting to do
concurrency work or IO paths.

That is obviously also why the kernel ends up being a lot of lines
of code. I think it's about an order of magnitude bigger in size
than all of gcc - not because it is an order of magnitude more
complex, obviously, but simply because it has many more parts to it.
And that directly translates to more pieces that people can cut
their teeth on.

> Maybe we're wrong but the percentage of mathematicians who contribute
> to GCC could be slightly bigger than for LK.

I don't think you're wrong per se. The "transformation" kind of code
is just much more common in a compiler, and the kind of people who
work on it are more likely to be the mathematical kind of people.
It's not the only part of gcc, obviously (I think parsing is
underrated, and I'm happy that the preprocessing front-end has
gotten so much attention in the last few years), but it's one of the
bigger parts. And people clearly seek out projects that satisfy
their interests.

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-15 16:09 ` Linus Torvalds @ 2002-12-15 16:49 ` Bruce Stephens 2002-12-15 16:59 ` Linus Torvalds 2002-12-16 17:22 ` source mgt. requirements solicitation Mike Stump 1 sibling, 1 reply; 60+ messages in thread From: Bruce Stephens @ 2002-12-15 16:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pop Sébastian, Momchil Velikov, zack, gcc

Linus Torvalds <torvalds@transmeta.com> writes:

[...]

> That is obviously also why the kernel ends up being a lot of lines of
> code. I think it's about an order of magnitude bigger in size than all of
> gcc - not because it is an order of magnitude more complex, obviously, but
> simply because it has many more parts to it. And that directly translates
> to more pieces that people can cut their teeth on.

The gcc tree I have seems to have 4145483 lines, whereas the 2.4.20
kernel seems to have 4841227 lines. (Not lines of code; this
includes all files in the unbuilt tree (including CVS directories
for gcc, although this is probably trivial), and it includes
comments and files which are not code. In the gcc case, it may
include some generated files; I'm not sure how Ada builds nowadays.)
Excluding the gcc testsuites, gcc has 3848080 lines.

So gcc (the whole of gcc, with all its languages) seems to be a bit
smaller than the kernel, but probably not by an order of magnitude.
This is reinforced by "du -s": the gcc tree takes up 187144K, the
kernel takes up 170676K.

None of this is particularly precise, obviously, but it points to
the two projects (with all their combined bits) being not too
dissimilar in size. Which is a possibly interesting coincidence.
(The 2.5 kernel may be much bigger; I haven't looked. The tarballs
don't look *that* much bigger, however.)

[...]

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-15 16:49 ` Bruce Stephens @ 2002-12-15 16:59 ` Linus Torvalds 2002-12-15 18:10 ` Bruce Stephens 2002-12-16 8:32 ` Diego Novillo 0 siblings, 2 replies; 60+ messages in thread From: Linus Torvalds @ 2002-12-15 16:59 UTC (permalink / raw) To: Bruce Stephens; +Cc: Pop Sébastian, Momchil Velikov, zack, gcc

On Mon, 16 Dec 2002, Bruce Stephens wrote:
>
> The gcc tree I have seems to have 4145483 lines

Hmm, might be my mistake. I only have an old and possibly pared-down
tree online. However, I also counted lines differently: I only
counted *.[chS] files, and you may have counted everything (the gcc
changelogs and .texi files in particular are _huge_ if you have a
full complement of them there).

What does "find . -name '*.[chS]' | xargs cat | wc" say?

(But you're right - I should _at_least_ count the .md files too, so
my count was at least as bogus as I suspect yours was)

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-15 16:59 ` Linus Torvalds @ 2002-12-15 18:10 ` Bruce Stephens 2002-12-16 8:32 ` Diego Novillo 1 sibling, 0 replies; 60+ messages in thread From: Bruce Stephens @ 2002-12-15 18:10 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pop Sébastian, Momchil Velikov, zack, gcc

Linus Torvalds <torvalds@transmeta.com> writes:

> On Mon, 16 Dec 2002, Bruce Stephens wrote:
>>
>> The gcc tree I have seems to have 4145483 lines
>
> Hmm, might be my mistake. I only have an old and possibly pared-down tree
> online. However, I also counted lines differently: I only counted *.[chS]
> files, and you may have counted everything (the gcc changelogs and .texi
> files in particular are _huge_ if you have a full complement of them
> there).

The ChangeLog files give a total of 306201 lines. texi files (and
info files) add another 426482. So that's a lot, yes. (In terms of
the size of the project, probably the texi files at least ought to
be counted, just as the stuff in Documentation ought to be counted
in some way for the Linux kernel. But not the generated .info
files.)

> What does "find . -name '*.[chS]' | xargs cat | wc" say?

1445809 5700810 43690421

But that doesn't include the ada or java files (or the C++ standard
library). Quite possibly it doesn't include some Objective C runtime
source, too.

> (But you're right - I should _at_least_ count the .md files too, so
> my count was at least as bogus as I suspect yours was)

Sure. It's all pretty meaningless---I think the two projects happen
to be approximately the same size (with the Linux kernel bigger),
but I don't think it's anything other than coincidence. gcc/ada
accounts for about 800K lines, for example, and that's relatively
recent, IIRC.

^ permalink raw reply	[flat|nested] 60+ messages in thread
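[Editor's note: the ad-hoc counts traded back and forth above are easy to skew (CVS metadata, generated files, filenames with spaces). The sketch below is a slightly more careful variant, assuming GNU findutils; the extension list is an illustrative guess, not what either poster actually counted.]

```shell
#!/bin/sh
# Rough line counter in the spirit of the find|xargs|wc one-liners
# above: prune CVS directories and count C/header/asm sources plus
# gcc-style .md machine descriptions. Requires GNU find/xargs for
# -print0/-0, which keeps odd filenames from splitting.
count_lines() {
    find "$1" -name CVS -prune -o \
         -type f \( -name '*.[chS]' -o -name '*.cc' -o -name '*.md' \) \
         -print0 | xargs -0 cat | wc -l
}

# tiny demo tree with known contents
demo=$(mktemp -d)
mkdir -p "$demo/CVS"
printf 'line1\nline2\nline3\n' > "$demo/a.c"
printf 'one\ntwo\n'            > "$demo/machine.md"
printf 'not counted\n'         > "$demo/CVS/junk.c"
printf 'not counted\n'         > "$demo/README"
count_lines "$demo"
```

Piping everything through one `cat | wc -l` (instead of `wc` per file) avoids the per-file totals lines that inflate naive counts.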
* Re: source mgt. requirements solicitation 2002-12-15 16:59 ` Linus Torvalds 2002-12-15 18:10 ` Bruce Stephens @ 2002-12-16 8:32 ` Diego Novillo 2002-12-17 3:36 ` Pop Sébastian 1 sibling, 1 reply; 60+ messages in thread From: Diego Novillo @ 2002-12-16 8:32 UTC (permalink / raw) To: Linus Torvalds Cc: Bruce Stephens, Pop Sébastian, Momchil Velikov, zack, gcc

On Sun, 15 Dec 2002, Linus Torvalds wrote:

> On Mon, 16 Dec 2002, Bruce Stephens wrote:
> >
> > The gcc tree I have seems to have 4145483 lines
>
> Hmm, might be my mistake. I only have an old and possibly pared-down tree
> online. However, I also counted lines differently: I only counted *.[chS]
> files, and you may have counted everything (the gcc changelogs and .texi
> files in particular are _huge_ if you have a full complement of them
> there).
>
Output of sloccount on a relatively recent snapshot:

-----------------------------------------------------------------------------
SLOC     Directory           SLOC-by-Language (Sorted)
1274221  gcc                 ansic=839349,ada=298101,cpp=73596,yacc=23251,
                             asm=20244,fortran=6934,exp=4706,sh=4430,
                             objc=2751,lex=559,perl=189,awk=111
225571   libjava             java=131300,cpp=65054,ansic=27198,exp=1213,
                             perl=782,awk=24
67452    libstdc++-v3        cpp=49425,ansic=17270,sh=525,exp=193,awk=39
34729    boehm-gc            ansic=25682,sh=7631,cpp=972,asm=444
21798    libiberty           ansic=21495,perl=283,sed=20
11657    top_dir             sh=11657
10376    libbanshee          ansic=10376
10358    libf2c              ansic=10037,fortran=321
9581     zlib                ansic=8309,asm=712,cpp=560
8904     libffi              ansic=5545,asm=3359
8002     libobjc             ansic=7233,objc=397,cpp=372
3721     contrib             cpp=2306,sh=935,perl=324,awk=67,lisp=59,ansic=30
3074     libmudflap          ansic=3074
2506     fastjar             ansic=2325,sh=181
1463     include             ansic=1432,cpp=31
667      maintainer-scripts  sh=667
0        config              (none)
0        CVS                 (none)
0        INSTALL             (none)

Totals grouped by language (dominant language first):
ansic:    979355 (57.81%)
ada:      298101 (17.60%)
cpp:      192316 (11.35%)
java:     131300 (7.75%)
sh:        26026 (1.54%)
asm:       24759 (1.46%)
yacc:      23251 (1.37%)
fortran:    7255 (0.43%)
exp:        6112 (0.36%)
objc:       3148 (0.19%)
perl:       1578 (0.09%)
lex:         559 (0.03%)
awk:         241 (0.01%)
lisp:         59 (0.00%)
sed:          20 (0.00%)

Total Physical Source Lines of Code (SLOC)                = 1,694,080
Development Effort Estimate, Person-Years (Person-Months) = 491.37 (5,896.47)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 5.64 (67.73)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 87.06
Total Estimated Cost to Develop                           = $ 66,377,705
 (average salary = $56,286/year, overhead = 2.40).
SLOCCount is Open Source Software/Free Software, licensed under the FSF GPL.
Please credit this data as "generated using 'SLOCCount' by David A. Wheeler."
-----------------------------------------------------------------------------

Diego.

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-16 8:32 ` Diego Novillo @ 2002-12-17 3:36 ` Pop Sébastian 2002-12-17 13:14 ` Tom Lord 0 siblings, 1 reply; 60+ messages in thread From: Pop Sébastian @ 2002-12-17 3:36 UTC (permalink / raw) To: Diego Novillo; +Cc: Linus Torvalds, Bruce Stephens, Momchil Velikov, zack, gcc

For comparison I've run sloccount on LK:

$ sloccount ./linux-2.5.52
[...]
SLOC     Directory       SLOC-by-Language (Sorted)
1664092  drivers         ansic=1659643,asm=1949,yacc=1177,perl=813,
                         lex=352,sh=158
678895   arch            ansic=507796,asm=170311,sh=624,awk=119,perl=45
365490   include         ansic=364696,cpp=794
340797   fs              ansic=340797
261122   sound           ansic=260940,asm=182
193052   net             ansic=193052
14814    kernel          ansic=14814
13523    mm              ansic=13523
11086    scripts         ansic=6830,perl=1339,cpp=1218,yacc=531,tcl=509,
                         lex=359,sh=285,awk=15
6988     crypto          ansic=6988
6083     lib             ansic=6083
2740     ipc             ansic=2740
1787     init            ansic=1787
1748     Documentation   sh=898,ansic=567,lisp=218,perl=65
1081     security        ansic=1081
119      usr             ansic=119
0        top_dir         (none)

Totals grouped by language (dominant language first):
ansic:   3381456 (94.89%)
asm:      172442 (4.84%)
perl:       2262 (0.06%)
cpp:        2012 (0.06%)
sh:         1965 (0.06%)
yacc:       1708 (0.05%)
lex:         711 (0.02%)
tcl:         509 (0.01%)
lisp:        218 (0.01%)
awk:         134 (0.00%)

Total Physical Source Lines of Code (SLOC)                = 3,563,417
Development Effort Estimate, Person-Years (Person-Months) = 1,072.73 (12,872.75)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 7.59 (91.12)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 141.27
Total Estimated Cost to Develop                           = $ 144,911,083
 (average salary = $56,286/year, overhead = 2.40).
SLOCCount is Open Source Software/Free Software, licensed under the FSF GPL.
Please credit this data as "generated using 'SLOCCount' by David A. Wheeler."

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-17 3:36 ` Pop Sébastian @ 2002-12-17 13:14 ` Tom Lord 2002-12-17 15:28 ` Itching and scratching (Re: source mgt. requirements solicitation) Stan Shebs 0 siblings, 1 reply; 60+ messages in thread From: Tom Lord @ 2002-12-17 13:14 UTC (permalink / raw) To: pop; +Cc: dnovillo, torvalds, bruce, velco, zack, gcc

GCC: Total Estimated Cost to Develop = $ 66,377,705
LK:  Total Estimated Cost to Develop = $ 144,911,083

and: (average salary = $56,286/year, overhead = 2.40).

(That's an appallingly low average salary, btw., and a needlessly
large overhead. If we're thinking of a nearly 20 year average, maybe
it's not _too_ badly removed from reality, but it's not a realistic
basis for planning moving forward.)

Someone did a sloccount run on a bunch of my 1-man-effort software,
developed over about 10 calendar years, and the person-years count
was surprisingly accurate.

In general, there is something of a business crisis in the free
software world. It's particularly noticeable around businesses based
on linux distributions. Those distributions represent a huge amount
of unpaid work. Businesses using them got some free help
bootstrapping themselves into now favorable positions. So, not only
did they get the unpaid work for free (as in beer), they traded that
for favorable market positions that raise the barrier of entry to
new competitors. While in theory "anyone" can start selling their
own distro, in reality, there's only a few established companies and
investors with deep pockets who have any chance in this area.

So what's the crisis? Well, those freeloaders aren't exactly being
aggressive about figuring out how to sustain the free software
movement with R&D investment.
Companies spend a little on public projects, sure, but the number of
employees participating, industry-wide, amounts to a few 10s of
people, and (total, industry-wide) corporate donations to
code-generating individuals and NPOs come to no more than 7
significant digits per year.

When they do spend on public projects, it is most often for very
narrow tactical purposes - not to make the ecology of projects
healthier overall. In significant proportions, they spend R&D money
on entirely in-house projects that, while rooted in free software,
benefit nobody but the companies themselves. You know, it's easy to
make a few quarters for your business unit when you, in essence,
cheat.

So the crisis is that in the medium term, as engineering businesses
go, these aren't sustainable models. And when they start leading
volunteers and soaking up volunteer work for their own aims, and
capturing mind-share in the press, one has to start to wonder
whether they aren't, overall, doing more harm than good. And then
there's some social justice and labor issues....

Bill Gates, when he says that free software is a threat to
innovation, is currently correct. UnAmerican? You bet!

And, btw, surprise!: In the free software world, corporate GCC
hackers are the relative fat cats. Go figure.

-t

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Itching and scratching (Re: source mgt. requirements solicitation) 2002-12-17 13:14 ` Tom Lord @ 2002-12-17 15:28 ` Stan Shebs 2002-12-17 16:07 ` Tom Lord 0 siblings, 1 reply; 60+ messages in thread From: Stan Shebs @ 2002-12-17 15:28 UTC (permalink / raw) To: Tom Lord; +Cc: gcc

Tom Lord wrote:
>
> And, btw, surprise!: In the free software world, corporate GCC hackers
> are the relative fat cats. Go figure.
>
That's because GCC hackers are doing things that are worth serious
amounts of money to people that have it to spend. Apple has signed
up with GCC because it solves more of Apple's problems more cheaply
than the several proprietary possibilities, and having made it part
of Mac OS X, Apple's overall corporate health is now partly
dependent on GCC continuing to be a good compiler, and on fixing
remaining problems, such as slowness.

If you were able to convince Apple mgmt that you could make GCC 10x
faster not using precompiled headers, I think you could name your
price and get hired the same day; that's how important the problem
is to Apple. (You're going to have to be really convincing though;
our mgmt has listened to a hundred pitches already.)

Speaking more generally, the folks that get paid to do free software
are the ones who are solving the problems of people with the money.
It's up to us to be clever enough to figure out how to solve the
specific problems in a way that improves architecture and
infrastructure. That was a key but underappreciated aspect of
Cygnus' development contracts; we would always try to go after
projects that included infrastructure improvement, but if necessary
we would do something that was random but lucrative and use the
profits to pay for generic work.

To put it more simply, find a rich person with an itch, and offer to
scratch it for them. :-)

Stan

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: Itching and scratching (Re: source mgt. requirements solicitation) 2002-12-17 15:28 ` Itching and scratching (Re: source mgt. requirements solicitation) Stan Shebs @ 2002-12-17 16:07 ` Tom Lord 2002-12-17 15:46 ` Stan Shebs 0 siblings, 1 reply; 60+ messages in thread From: Tom Lord @ 2002-12-17 16:07 UTC (permalink / raw) To: shebs; +Cc: gcc

	> And, btw, surprise!: In the free software world, corporate
	> GCC hackers are the relative fat cats. Go figure.

	That's because GCC hackers are doing things that are worth
	serious amounts of money to people that have it to spend.

I didn't mean that it is wrong for you to be well paid. I meant that
you have a lot of clout.

	Speaking more generally, the folks that get paid to do free
	software are the ones who are solving the problems of people
	with the money. It's up to us to be clever enough to figure
	out to solve the specific problems in a way that improves
	architecture and infrastructure. That was a key but
	underappreciated aspect of Cygnus' development contracts; we
	would always try to go after projects that included
	infrastructure improvement, but if necessary we would do
	something that was random but lucrative and use the profits
	to pay for generic work.

Was it customers who underappreciated that? or was that a selling
point?

	To put it more simply, find a rich person with an itch, and
	offer to scratch it for them. :-)

Ah, well, "The Cabots speak only to Lodges and the Lodges speak only
to God."

-t

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: Itching and scratching (Re: source mgt. requirements solicitation) 2002-12-17 16:07 ` Tom Lord @ 2002-12-17 15:46 ` Stan Shebs 0 siblings, 0 replies; 60+ messages in thread From: Stan Shebs @ 2002-12-17 15:46 UTC (permalink / raw) To: Tom Lord; +Cc: gcc

Tom Lord wrote:
>
> Speaking more generally, the folks that get paid to do free
> software are the ones who are solving the problems of people
> with the money. It's up to us to be clever enough to figure
> out to solve the specific problems in a way that improves
> architecture and infrastructure. That was a key but
> underappreciated aspect of Cygnus' development contracts; we
> would always try to go after projects that included
> infrastructure improvement, but if necessary we would do
> something that was random but lucrative and use the profits to
> pay for generic work.
>
> Was it customers who underappreciated that? or was that a selling
> point?
>
Sometimes it was a selling point, sometimes the concept was too
subtle for the customer to grasp. In the mid-90s, a good percentage
of time still had to be spent explaining free software, reassuring
people that GCC didn't cause its output to be GPLed, etc. It was
interesting to see how much variation there was among customers, and
also how important it was to have actual sales people in the process
- engineers left to themselves would rathole on side issues and
never get around to the actual dealmaking.

Stan

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-15 16:09 ` Linus Torvalds 2002-12-15 16:49 ` Bruce Stephens @ 2002-12-16 17:22 ` Mike Stump 1 sibling, 0 replies; 60+ messages in thread From: Mike Stump @ 2002-12-16 17:22 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pop Sébastian, Momchil Velikov, zack, gcc

On Sunday, December 15, 2002, at 03:45 PM, Linus Torvalds wrote:
> That is obviously also why the kernel ends up being a lot of lines of
> code. I think it's about an order of magnitude bigger in size than all
> of gcc

bash-2.05a$ find gcc -type f -print | xargs cat | wc -l
 4084979

[ ducking ]

^ permalink raw reply	[flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-14 16:06 ` Linus Torvalds 2002-12-15 3:59 ` Momchil Velikov 2002-12-15 8:26 ` Momchil Velikov @ 2002-12-15 17:09 ` Stan Shebs 2 siblings, 0 replies; 60+ messages in thread From: Stan Shebs @ 2002-12-15 17:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: Momchil Velikov, zack, gcc Linus Torvalds wrote: >[...] > > If gcc had less of a restrictive model >for accepting patches, you'd have a lot more random people who would do >them, I bet. > I can assure you that there are lots of random GCC patches and forks out there, some of them drastically divergent from the main version. (I myself have been responsible for a few of them.) Nobody is being stopped from forking GCC and promoting their own versions. A large number of GCC developers have chosen to cooperate more closely on a single tree because we've empirically determined that we get a better quality compiler that way. Choice of source management systems is a minor detail, not a make-or-break issue. >But gcc development not only has the "CVS mentality", it has >the "FSF disease" with the paperwork crap and copyright assignment crap. > If AT&T had come down on GNU in the 80s the way that they did on BSD in the early 90s, you wouldn't have had any software to go with your kernel. RMS is much smarter than you seem to think. Stan ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-09 17:05 ` Walter Landry 2002-12-09 17:10 ` Joseph S. Myers @ 2002-12-09 17:50 ` Zack Weinberg 1 sibling, 0 replies; 60+ messages in thread From: Zack Weinberg @ 2002-12-09 17:50 UTC (permalink / raw) To: gcc I'm keeping around a lot of context. Scroll down. >> > > 0d. The data stored in the repository cannot be modified by >> > > unprivileged local users except by going through the version >> > > control system. Presently I could take 'vi' to one of the ,v >> > > files in /cvs/gcc and break it thoroughly, or sneak something into >> > > the file content, and leave no trace. >> > >> > There is no interaction with root, so if you own the archive, you can >> > always do what you want. To get anything approaching this, you have >> > to deal with PGP signatures, SHA hashes, and the like. OpenCM is >> > probably the only group (including BitKeeper) that even comes close to >> > doing this right. >> >> This sort of thing has been done simply by a modified setuid (to a cvs >> user, not root) cvs binary so users can't access the repository directly, >> only through that binary. More generically, with a reasonable protocol >> for local repository access it should be possible to use GNU userv to >> separate the repository from the users. > > This is a different security model. Arch is secure because it doesn't > depend on having privileged access. For example, there is an "rm > -rf" command built into arch. > > I have a feeling that you are thinking of how CVS handles things, with > a centralized server. Part of the whole point of arch is that there > is no centralized server. So, for example, I can develop arch > independently of whether Tom thinks that I am worthy enough to do so. > I can screw up my archive as much as I want (and I have), and Tom can > be blissfully unaware. Easy merging is what makes this possible. > > So you don't, in general, have a repository that is writeable by more > than one person. 
Let me be specific about the problem I'm worried about. As Joseph pointed out, GCC development is and will be centered around a 'master' server. If we wind up using a distributed system, individual developers will take advantage of it to do offline work, but the master repository will still act as a communication nexus between us all, and official releases will be cut from there. I doubt anyone will do releases except from there.[1] The security of this master server is mission-critical. The present situation, with CVS pserver enabled for read-only anonymous access, and write privilege available via CVS-over-ssh, has two potentially exploitable vulnerabilities that should be easy to address in a new system. _Imprimis_, the CVS pserver requires write privileges on the CVS repository directories, even if it is providing only read access. Therefore, if the 'anoncvs' user is somehow compromised -- for instance, by a buffer overflow bug in the pserver itself -- the attacker could potentially modify any of the ,v files stored in the repository. This was what I was talking about with my point 0c. It sounds like all the replacements for CVS have addressed this, by allowing the anoncvs-equivalent server process to run as a user that doesn't have OS-level write privileges on the repository. _Secundus_, CVS-over-ssh operates by invoking 'cvs server' on the repository host -- running under the user ID of the invoker, who must have an account on the repository host. It can't perform any operations that the invoking user can't. Which means that the invoking user must also have OS-level write privileges on the repository. Now, such users are _supposed_ to be able to check in changes to the repository, but they _aren't_ supposed to be able to modify the ,v files with a text editor. The distinction is crucial. 
If the account of a user with write privileges is compromised, and used to check in a malicious change, the version history is intact, the change will be easily detected, and we can simply back out the malice. If the account of a user with write privileges is compromised and used to hand-edit a malicious change into a ,v file, it's quite possible that this will go undetected until after half the binaries on the planet are untrustworthy. It is this latter scenario I would like to be impossible. There are several possible ways to do that. One way is the way Perforce does it: _all_ access, even local access, goes through p4d, and p4d can run under its own user ID and be the only user ID with write access to the repository. Another way, and perhaps a cleverer one, is OpenCM's way, where the SHA of the file content is the file's identity, so a malicious change will not even be picked up. (Please correct me if I misunderstand.) Of course, that provides no insulation against an attacker using a compromised account to execute "rm -fr /path/to/repository", but *that* problem is best solved with backups, because a disk failure could have the same effect and there's nothing software can do about that. zw ^ permalink raw reply [flat|nested] 60+ messages in thread
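The content-addressed scheme attributed to OpenCM above can be sketched in a few lines. This is a hypothetical toy store (a dict standing in for the repository, not OpenCM's actual on-disk format): a blob's name is the SHA-1 of its content, so a hand-edited blob simply stops matching its own name.

```python
import hashlib

def put(store, data):
    """Store a blob under the SHA-1 of its content; the key IS the identity."""
    key = hashlib.sha1(data).hexdigest()
    store[key] = data
    return key

def verify(store, key):
    """A blob is authentic iff its content still hashes to its name."""
    return hashlib.sha1(store[key]).hexdigest() == key

store = {}
key = put(store, b"int main(void) { return 0; }\n")
assert verify(store, key)       # checked-in content matches its identity

# A malicious hand edit behind the tool's back is immediately detectable:
store[key] = b"int main(void) { evil(); return 0; }\n"
assert not verify(store, key)   # the blob no longer matches its own name
```

As noted above, this makes silent substitution detectable but offers no protection against outright deletion; backups still cover that case.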
* Re: source mgt. requirements solicitation 2002-12-09 15:27 ` Joseph S. Myers 2002-12-09 17:05 ` Walter Landry @ 2002-12-11 1:11 ` Branko Čibej 1 sibling, 0 replies; 60+ messages in thread From: Branko Čibej @ 2002-12-11 1:11 UTC (permalink / raw) To: Joseph S. Myers; +Cc: gcc Joseph S. Myers wrote: >On Mon, 9 Dec 2002, Walter Landry wrote: > > > >>arch doesn't interact at all with root. The remote repositories are >>all done with sftp, ftp, and http, which is as secure as those servers >>are. >> >> > >Is this - for anonymous access - _plain_ HTTP, or HTTP + WebDAV + DeltaV >which svn uses? One problem there was with SVN - it may have been fixed >by now, and a fix would be necessary for it to be usable for GCC - was its >use of HTTP and HTTPS (for write access); these tend to be heavily >controlled by firewalls and the ability to tunnel over SSH (with just that >one port needing to be open) would be necessary. "Transparent" proxies >may pass plain HTTP OK, but not the WebDAV/DeltaV extensions SVN needs. > There is now a new repository access layer in place that can be easily piped over SSH and doesn't require Apache on the server side. It's not as well tested yet, of course. -- Brane Čibej <brane@xbc.nu> http://www.xbc.nu/brane/ ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: source mgt. requirements solicitation 2002-12-08 14:18 ` source mgt. requirements solicitation Tom Lord 2002-12-08 14:56 ` DJ Delorie 2002-12-08 16:09 ` Phil Edwards @ 2002-12-08 18:32 ` Joseph S. Myers 2002-12-11 2:48 ` Branko Čibej 2 siblings, 1 reply; 60+ messages in thread From: Joseph S. Myers @ 2002-12-08 18:32 UTC (permalink / raw) To: Tom Lord; +Cc: gcc On Sun, 8 Dec 2002, Tom Lord wrote: > 1) There are frequent reports on this list of glitches with > the current CVS repository. The most common problem relates to the fileattr performance optimization. There are known causes, and a known workaround (remove the cache files when the problem occurs). Other problems (occasional repository corruption) may often relate to hardware problems. BK uses extensive checksumming to detect such failures early (since early detection means backups can more easily be found); the RCS format has no checksums. I don't know what svn or arch do here. There are particular issues that are relevant to GCC (and other CVS users) that SVN addresses or intends to address as a "better CVS": * Proper file renaming support. * Atomic checkins across multiple files (rarely a problem). * O(1) performance of tag and branch operations. (A major issue for the snapshot script; when the machine is loaded it can take hours to tag the tree with the per-snapshot tag, remove the old gcc_latest snapshot tag and apply the new one (writing to every ,v file several times). Part of the problem, however, is waiting on locks in each directory, and reducing the extent to which locks are needed (e.g. avoiding them for anonymous checkouts) and the time for which they are held would help.) * Performance of operations (checkout, update, ...) on branches (reading every file in the tree; the cache mentioned above avoids this problem for HEAD only). * cvs update -d and modules (more an issue with merged gcc and src trees) (I don't know whether svn does modules yet). 
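The checksumming the RCS ,v format lacks can be approximated externally. A minimal sketch (plain Python with invented paths, not anything BK or CVS actually ships): keep a digest manifest for every ,v file and recheck it periodically, so silent corruption from a failing disk or a hand edit is caught early enough that backups still exist.

```python
import hashlib
import os
import tempfile

def manifest(repo):
    """Map each ,v file under repo (by relative path) to a SHA-256 digest."""
    digests = {}
    for root, _, files in os.walk(repo):
        for name in files:
            if name.endswith(",v"):
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    rel = os.path.relpath(path, repo)
                    digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests

def damaged(repo, saved):
    """Return files whose current digest no longer matches the manifest."""
    current = manifest(repo)
    return sorted(p for p in saved if current.get(p) != saved[p])

repo = tempfile.mkdtemp()
with open(os.path.join(repo, "foo.c,v"), "w") as f:
    f.write("head 1.1;\n")
saved = manifest(repo)
assert damaged(repo, saved) == []            # repository intact

# Simulate silent corruption (failing disk, or a hand edit with vi):
with open(os.path.join(repo, "foo.c,v"), "a") as f:
    f.write("garbage\n")
assert damaged(repo, saved) == ["foo.c,v"]   # caught by the manifest
```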
I haven't seen an obvious need for major changes in branch merging or distributed repositories, but people making heavy use of branches may well have a use for better tools there. It's just that something (a) supporting file renames and (b) having much better performance (including on branches) and (c) having better reliability would solve most of the problems for most of the users. (Not all problems for all users, better tools aiming towards that are still useful if they don't cause more trouble in the common case. Checkout, update, diff, annotate, commit shouldn't be made any more complicated.) > 2) GCC, more than many projects, relies on a distributed > testing effort, which mostly applies to the HEAD revision > and to release candidates. Most of this testing is done > by hand. Better tools are useful here (I always want more testing and more testcases) but it isn't much to do with version control, rather with processing the test results into a coherent form (there used to be a database driven from gcc-testresults) and getting people to fix regressions they cause (not a problem lately, but there have been long periods with the regression tester showing regressions staying for weeks). > 7) Questions about which patches relate to which issues in the > issue database are fairly common. Better tools may help if they encourage volunteers to do the boring task of going through incoming bug reports and checking they include enough information to reproduce them and can be reproduced. But that's a matter of the long-delayed Bugzilla transition (delayed by human time to set up a new machine, not by lack of better version control) possibly linked with some system for bug reports to have enough well-defined fields for automatic testing. > 9) Distributed testing occurs mostly on the HEAD -- which > means that the HEAD breaks on various targets, fairly > frequently. It means that HEAD breakage is frequently detected. 
> 11) Some efforts, such as overhauling the build process, will > probably benefit from a switch to rev ctl. systems that > support tree rearrangements. I think it's better to just do renames the CVS way (delete and add) now, rather than waiting; then, when we do change systems, make the repository conversion tool smart enough to handle most of the renames that have taken place in the GCC repository. Better tools such as svn or arch may be useful, but we're not CM developers so it's just a matter of evaluating such tools when they are ready (do all the common things CVS does just as easily, are reliable enough, have good enough (preferably better than CVS) performance for what we do, solve some of the problems with CVS). Indications (such as above) of problems with CVS for GCC aren't particularly important, since the main problems with CVS are well known and affect GCC much as they affect other projects. -- Joseph S. Myers jsm28@cam.ac.uk ^ permalink raw reply [flat|nested] 60+ messages in thread
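The conversion-tool idea above — recover most of the delete-and-add renames after the fact — reduces, in the simplest case, to pairing deleted and added files by content digest. A hypothetical sketch (file names and contents invented; real converters also score partial similarity rather than requiring exact matches):

```python
import hashlib

def digest(data):
    return hashlib.sha1(data).hexdigest()

# CVS history records these only as independent delete and add events:
deleted = {"frontend.c": digest(b"int parse(void);\n")}
added = {
    "gcc/frontend.c": digest(b"int parse(void);\n"),
    "gcc/brand-new.c": digest(b"/* new file */\n"),
}

# Pair old and new paths with identical content to recover the renames:
renames = {old: new
           for old, h in deleted.items()
           for new, new_h in added.items()
           if h == new_h}
assert renames == {"frontend.c": "gcc/frontend.c"}
```

Genuinely new files (no matching deletion) are left alone, which is the behavior a conversion tool would want.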
* Re: source mgt. requirements solicitation 2002-12-08 18:32 ` Joseph S. Myers @ 2002-12-11 2:48 ` Branko Čibej 0 siblings, 0 replies; 60+ messages in thread From: Branko Čibej @ 2002-12-11 2:48 UTC (permalink / raw) To: Joseph S. Myers; +Cc: gcc Joseph S. Myers wrote: >* cvs update -d and modules (more an issue with merged gcc and src trees) >(I don't know whether svn does modules yet). > > Subversion does modules a lot better than CVS, if I do say so myself. See http://svnbook.red-bean.com/book.html#svn-ch-6-sect-3 -- Brane Čibej <brane@xbc.nu> http://www.xbc.nu/brane/ ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Itching and scratching (Re: source mgt. requirements solicitation)
@ 2002-12-18 21:44 Robert Dewar
0 siblings, 0 replies; 60+ messages in thread
From: Robert Dewar @ 2002-12-18 21:44 UTC (permalink / raw)
To: lord, shebs; +Cc: gcc
> Speaking more generally, the folks that get paid to do free software
> are the ones who are solving the problems of people with the money.
> It's up to us to be clever enough to figure out to solve the specific
> problems in a way that improves architecture and infrastructure.
> That was a key but underappreciated aspect of Cygnus' development
> contracts; we would always try to go after projects that included
> infrastructure improvement, but if necessary we would do something
> that was random but lucrative and use the profits to pay for
> generic work.
For the record, this is very similar to ACT's approach to development
contracts.
^ permalink raw reply [flat|nested] 60+ messages in thread
end of thread, other threads:[~2002-12-19 2:39 UTC | newest] Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2002-12-08 7:13 on reputation and lines and putting things places (Re: gcc branches?) Robert Dewar 2002-12-08 14:18 ` source mgt. requirements solicitation Tom Lord 2002-12-08 14:56 ` DJ Delorie 2002-12-08 15:02 ` David S. Miller 2002-12-08 15:45 ` Bruce Stephens 2002-12-08 16:52 ` David S. Miller 2002-12-08 15:11 ` Bruce Stephens 2002-12-08 16:24 ` Joseph S. Myers 2002-12-08 16:47 ` Tom Lord 2002-12-08 22:20 ` Craig Rodrigues 2002-12-08 16:09 ` Phil Edwards 2002-12-08 19:13 ` Zack Weinberg 2002-12-09 10:33 ` Phil Edwards 2002-12-09 11:06 ` Joseph S. Myers 2002-12-09 9:42 ` Zack Weinberg 2002-12-09 11:00 ` Jack Lloyd 2002-12-09 15:10 ` Walter Landry 2002-12-09 15:27 ` Joseph S. Myers 2002-12-09 17:05 ` Walter Landry 2002-12-09 17:10 ` Joseph S. Myers 2002-12-09 18:27 ` Walter Landry 2002-12-09 19:16 ` Joseph S. Myers 2002-12-10 0:27 ` Zack Weinberg 2002-12-10 0:41 ` Tom Lord 2002-12-10 12:05 ` Phil Edwards 2002-12-10 19:44 ` Mark Mielke 2002-12-10 19:57 ` David S. Miller 2002-12-10 20:02 ` Phil Edwards 2002-12-10 23:07 ` David S. Miller 2002-12-11 6:31 ` Phil Edwards 2002-12-14 13:43 ` Linus Torvalds 2002-12-14 14:06 ` Tom Lord 2002-12-14 17:44 ` Linus Torvalds 2002-12-14 19:45 ` Tom Lord 2002-12-14 14:41 ` Neil Booth 2002-12-14 15:47 ` Zack Weinberg 2002-12-14 15:33 ` Momchil Velikov 2002-12-14 16:06 ` Linus Torvalds 2002-12-15 3:59 ` Momchil Velikov 2002-12-15 8:26 ` Momchil Velikov 2002-12-15 12:02 ` Linus Torvalds 2002-12-15 14:16 ` Momchil Velikov 2002-12-15 15:20 ` Pop Sébastian 2002-12-15 16:09 ` Linus Torvalds 2002-12-15 16:49 ` Bruce Stephens 2002-12-15 16:59 ` Linus Torvalds 2002-12-15 18:10 ` Bruce Stephens 2002-12-16 8:32 ` Diego Novillo 2002-12-17 3:36 ` Pop Sébastian 2002-12-17 13:14 ` Tom Lord 2002-12-17 15:28 ` Itching and scratching (Re: source mgt. 
requirements solicitation) Stan Shebs 2002-12-17 16:07 ` Tom Lord 2002-12-17 15:46 ` Stan Shebs 2002-12-16 17:22 ` source mgt. requirements solicitation Mike Stump 2002-12-15 17:09 ` Stan Shebs 2002-12-09 17:50 ` Zack Weinberg 2002-12-11 1:11 ` Branko Čibej 2002-12-08 18:32 ` Joseph S. Myers 2002-12-11 2:48 ` Branko Čibej 2002-12-18 21:44 Itching and scratching (Re: source mgt. requirements solicitation) Robert Dewar