From: Tom Lord
To: gcc@gcc.gnu.org
Date: Tue, 10 Dec 2002 00:31:00 -0000
Message-Id: <200212100828.AAA17798@emf.net>
Subject: new batch of replies (D)

Replies in this message:

    Joseph S. Myers: There are Constraints on the Protocols We can use
    Joseph S. Myers: The Ability to Revert Patches is Important
    Joseph S. Myers: Sending Patches by Mail is Important
    Walter Landry: Some Slight Misstatements About Distributed Repositories
    Zack Weinberg: Server Security is Mission Critical

================================================================
Joseph S. Myers: There are Constraints on the Protocols We can use

    One problem there was with SVN - it may have been fixed by now,
    and a fix would be necessary for it to be usable for GCC - was
    its use of HTTP and HTTPS (for write access); these tend to be
    heavily controlled by firewalls, and the ability to tunnel over
    SSH (with just that one port needing to be open) would be
    necessary.  "Transparent" proxies may pass plain HTTP OK, but not
    the WebDAV/DeltaV extensions SVN needs.

I think Walter has mostly answered this.  I would like to add that
communication with an `arch' server consists of commands that are
analogous to a few FTP operations -- it's just a very minimal set of
filesystem primitives.  Support for new or additional protocols can
be added quickly (as Walter has demonstrated).
================================================================
Joseph S. Myers: The Ability to Revert Patches is Important

    A change set is applied.  It turns out to have problems, so it
    needs to be reverted - common enough.  Of course the version
    history and ChangeLog show both the original application and the
    reversion.  The reversion might in fact be of the original change
    set and a series of subsequent failed attempts at patching it up.
    But intermediate unrelated changes to the tree should not be
    backed out in the process.

This can be accomplished with the `delta-patch' command, which takes
the changes between any two revisions and applies them to your tree.
Once you have identified the changeset you want to back out: if it is
in revision N, you apply the changes from N to N-1 to your tree.

This is not currently recorded in the log as an official "reversion"
-- just as an ordinary change that happens to be a reversion.  That
is perhaps something that should be added prior to 1.0.

================================================================
Joseph S. Myers: Sending Patches by Mail is Important

    Patches by email (with distributed patch review by multiple
    people reading gcc-patches, including those who can't actually
    approve the patch) is the normal way GCC development works.
    Presume that most contributors will not want to deal with the
    security issues of making any local repository accessible to
    other machines, even if it's on a permanently connected machine
    and local firewalls or policy don't prevent this.  A patch for
    use with a better version control system would need to include
    some encoding for that system of renames / deletes / ... - but
    that needs to be just as human-readable as context diffs /
    unidiffs are.

It is a well-known bug in the design of the reference implementation
that changesets are not convenient for email use.  Fixing this is, in
fact, one of the most important tasks to complete before 1.0.
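To illustrate what an email-friendly changeset might look like, here
is a sketch.  The header syntax (`Renamed-file:`, `Removed-file:`)
is invented for illustration -- it is not the format that mkpatch and
dopatch actually emit -- but it shows how renames and deletes can be
encoded in plain text that is as readable as a unidiff:

```python
def format_changeset(renames, deletes, patches):
    """Render a changeset as one human-readable text block fit for
    mailing.  `renames` is a list of (old, new) path pairs, `deletes`
    a list of paths, `patches` a list of (path, diff-text) pairs.
    Hypothetical encoding, for illustration only."""
    out = []
    for old, new in renames:
        out.append("Renamed-file: %s => %s" % (old, new))
    for path in deletes:
        out.append("Removed-file: %s" % path)
    out.append("")                      # blank line before the diffs
    for path, diff in patches:
        out.append("--- %s" % path)
        out.append(diff.rstrip("\n"))
    return "\n".join(out) + "\n"
```

The whole changeset -- tree rearrangements included -- then travels
as ordinary message text that reviewers on gcc-patches can read
without any special tool.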
I am in favor of fixing this slowly and carefully: taking a few weeks
and a few people to design the new changeset format, because the
semantics of the format have many long-term, wide-ranging
implications.  You can see the notes I made when I left off on this
project on the fifthvision.net web site.  Recently, a volunteer has
prototyped a very email-friendly format and has an implementation of
`mkpatch' and `dopatch' up and crawling.

================================================================
Walter Landry: Some Slight Misstatements About Distributed Repositories

    So, for example, I can develop arch independently of whether Tom
    thinks that I am worthy enough to do so. :-)

We disagree a bit, but that's all.

    So you don't, in general, have a repository that is writeable by
    more than one person.

This is the practice that has emerged among early adopters, but
multiple writers are fully supported, and I would expect the GCC
mainline repository to (continue to) have multiple writers.  Even if
the procedures are slightly changed so that all the hard merging and
changeset vetting is done in other repositories before being moved to
the mainline, I'd predict that release-manager duties (moving in
those changes) will be shared.

================================================================
Zack Weinberg: Server Security is Mission Critical

    As Joseph pointed out, GCC development is and will be centered
    around a 'master' server.  If we wind up using a distributed
    system, individual developers will take advantage of it to do
    offline work, but the master repository will still act as a
    communication nexus between us all, and official releases will be
    cut from there.  I doubt anyone will do releases except from
    there.[1]  The security of this master server is mission-critical.
    The present situation, with CVS pserver enabled for read-only
    anonymous access, and write privilege available via CVS-over-ssh,
    has two potentially exploitable vulnerabilities that should be
    easy to address in a new system.

    _Imprimis_, the CVS pserver requires write privileges on the CVS
    repository directories, even if it is providing only read access.
    Therefore, if the 'anoncvs' user is somehow compromised -- for
    instance, by a buffer overflow bug in the pserver itself -- the
    attacker could potentially modify any of the ,v files stored in
    the repository.  [....]

You do not need a write-privileged server to provide read-only access
to an `arch' repository.

    _Secundus_, CVS-over-ssh operates by invoking 'cvs server' on the
    repository host -- running under the user ID of the invoker, who
    must have an account on the repository host.  [...]  It can't
    perform any operations that the invoking user can't.  Which means
    that the invoking user must also have OS-level write privileges
    on the repository.  Now, such users are _supposed_ to be able to
    check in changes to the repository, but they _aren't_ supposed to
    be able to modify the ,v files with a text editor.  The
    distinction is crucial.  If the account of a user with write
    privileges is compromised, and used to check in a malicious
    change, the version history is intact, the change will be easily
    detected, and we can simply back out the malice.  If the account
    of a user with write privileges is compromised and used to
    hand-edit a malicious change into a ,v file, it's quite possible
    that this will go undetected until after half the binaries on the
    planet are untrustworthy.  It is this latter scenario I would
    like to be impossible.

I think your security analysis is incorrect, for all revision control
systems.  The account used to check in changes must be able to write
the repository files.  Thus, if an attacker obtains the ability to
run arbitrary code as that user, the files may be compromised.  "Hand
editing" is immaterial here.
Still, `arch' does have a _theoretical_ advantage in this area.  Let
us assume that we have a security-enhanced OS in which we can place
very fine-grained restrictions on the set of system calls available
to particular users.  For example, we might forbid users from opening
files for writing except with the O_CREAT | O_EXCL flags, and we
might forbid them from removing files other than those whose names
match a particular regexp.  On such a system, and with those
particular restrictions, an `arch'-based server can indeed be
hardened against attacks that result in the execution of arbitrary
malicious code.  I don't know of systems that have such fine-grained
controls, but they are certainly doable -- and I wouldn't be too
surprised to hear of someone providing them.

    There are several possible ways to do that.  One way is the way
    Perforce does it: _all_ access, even local access, goes through
    p4d, and p4d can run under its own user ID and be the only user
    ID with write access to the repository.

Yes, but any compromise that can run arbitrary code under the p4d
user ID can corrupt the repository.

Another way, and perhaps a cleverer one, is OpenCM's way, where (the
SHA of) the file content is the file's identity, so a malicious
change will not even be picked up.  (Please correct me if I
misunderstand.)

    Of course, that provides no insulation against an attacker using
    a compromised account to execute "rm -fr /path/to/repository",
    but *that* problem is best solved with backups, because a disk
    failure could have the same effect and there's nothing software
    can do about that.

It also provides no help against an attacker whose arbitrary code is
careful to update the hash records.  Again, I think the fine-grained
access controls I described are your only help here, and if you find
some, you'll be pleased by the narrow and benign access needs of an
arch write-transaction server.

================================================================
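As an aside, the OpenCM-style scheme mentioned above is easy to
sketch.  This is a toy model of the general content-addressing idea,
not OpenCM's actual on-disk format: the hash of a file's bytes *is*
its name, so a hand-edited file no longer matches its own name and
the tampering is detected on every read:

```python
import hashlib

def store(repo, content):
    """Content-addressed store: name each blob by the SHA-1 of its
    bytes.  (Toy model of the idea, not OpenCM's real format.)"""
    name = hashlib.sha1(content).hexdigest()
    repo[name] = content
    return name

def fetch(repo, name):
    """Re-verify the hash on read; a hand-edited blob no longer
    matches its name, so silent tampering is caught here."""
    content = repo[name]
    if hashlib.sha1(content).hexdigest() != name:
        raise ValueError("repository file does not match its hash")
    return content
```

Note that this catches only in-place edits; as said above, it is no
help against arbitrary code that rewrites the hash records (that is,
re-stores the malicious content under its own correct name and
updates every reference to it).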