From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 10303 invoked by alias); 23 Jul 2009 18:04:36 -0000
Received: (qmail 10077 invoked by alias); 23 Jul 2009 18:04:35 -0000
X-SWARE-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS
X-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS
X-Spam-Check-By: sourceware.org
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bastion2.fedora.phx.redhat.com
Subject: cluster: STABLE3 - doc: remove old gfs docs
To: cluster-cvs-relay@redhat.com
X-Project: Cluster Project
X-Git-Module: cluster.git
X-Git-Refname: refs/heads/STABLE3
X-Git-Reftype: branch
X-Git-Oldrev: ab181b7303ccd66ab4bd67a08ed06136b4a20a93
X-Git-Newrev: 6f83ab05ad47c5fe242d7840388db955c20c4ece
From: David Teigland
Message-Id: <20090723175222.2F98712022F@lists.fedorahosted.org>
Date: Thu, 23 Jul 2009 18:04:00 -0000
X-Scanned-By: MIMEDefang 2.58 on 172.16.52.254
Mailing-List: contact cluster-cvs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help:
Sender: cluster-cvs-owner@sourceware.org
X-SW-Source: 2009-q3/txt/msg00090.txt.bz2

Gitweb:        http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=6f83ab05ad47c5fe242d7840388db955c20c4ece
Commit:        6f83ab05ad47c5fe242d7840388db955c20c4ece
Parent:        ab181b7303ccd66ab4bd67a08ed06136b4a20a93
Author:        David Teigland
AuthorDate:    Thu Jul 23 12:42:43 2009 -0500
Committer:     David Teigland
CommitterDate: Thu Jul 23 12:44:03 2009 -0500

doc: remove old gfs docs

Signed-off-by: David Teigland
---
 doc/Makefile       |    5 +-
 doc/gfs2.txt       |   45 ---------------
 doc/journaling.txt |  155 --------------------------------------------------
 doc/min-gfs.txt    |  159 ----------------------------------------------------
 4 files changed, 1 insertions(+), 363 deletions(-)

diff --git a/doc/Makefile b/doc/Makefile
index 10a076c..2aeb0b9 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -1,7 +1,4 @@
-DOCS = gfs2.txt \
-	journaling.txt \
-	min-gfs.txt \
-	usage.txt \
+DOCS = usage.txt \
 	COPYING.applications \
 	COPYING.libraries \
 	COPYRIGHT \
diff --git a/doc/gfs2.txt b/doc/gfs2.txt
deleted file mode 100644
index 88f0143..0000000
--- a/doc/gfs2.txt
+++ /dev/null
@@ -1,45 +0,0 @@
-Global File System
-------------------
-
-http://sources.redhat.com/cluster/
-
-GFS is a cluster file system.  It allows a cluster of computers to
-simultaneously use a block device that is shared between them (with FC,
-iSCSI, NBD, etc).  GFS reads and writes to the block device like a local
-file system, but also uses a lock module to allow the computers to coordinate
-their I/O so file system consistency is maintained.  One of the nifty
-features of GFS is perfect consistency -- changes made to the file system
-on one machine show up immediately on all other machines in the cluster.
-
-GFS uses interchangeable inter-node locking mechanisms.  Different lock
-modules can plug into GFS and each file system selects the appropriate
-lock module at mount time.  Lock modules include:
-
-  lock_nolock -- does no real locking and allows gfs to be used as a
-  local file system
-
-  lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking.
-  The dlm is found at linux/fs/dlm/
-
-In addition to interfacing with an external locking manager, a gfs lock
-module is responsible for interacting with external cluster management
-systems.  Lock_dlm depends on user space cluster management systems found
-at the URL above.
-
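As a hedged illustration, not taken from the removed file: the clustered
counterpart of the local lock_nolock example below would look roughly like
the following, where the cluster name "mycluster", the filesystem name
"myfs", and the journal count of 2 (one journal per mounting node) are
placeholder values.

  $ gfs2_mkfs -p lock_dlm -t mycluster:myfs -j 2 /dev/block_device
  $ mount -t gfs2 /dev/block_device /dir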
-To use gfs as a local file system, no external clustering systems are
-needed, simply:
-
-  $ gfs2_mkfs -p lock_nolock -j 1 /dev/block_device
-  $ mount -t gfs2 /dev/block_device /dir
-
-GFS2 is not on-disk compatible with previous versions of GFS.
-
-The following man pages can be found at the URL above:
-  gfs2_mkfs   to make a filesystem
-  gfs2_fsck   to repair a filesystem
-  gfs2_grow   to expand a filesystem online
-  gfs2_jadd   to add journals to a filesystem online
-  gfs2_tool   to manipulate, examine and tune a filesystem
-  gfs2_quota  to examine and change quota values in a filesystem
-  mount.gfs2  to find mount options
-
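The man pages listed above cover online grow and journal addition without
showing invocations.  As a hedged illustration, not part of this commit,
typical usage against an already mounted filesystem would look roughly like
this, with /mnt/gfs2 as a placeholder mount point:

  $ gfs2_jadd -j 1 /mnt/gfs2   # add one more journal to the mounted filesystem
  $ gfs2_grow /mnt/gfs2        # grow the filesystem into newly added device space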
diff --git a/doc/journaling.txt b/doc/journaling.txt
deleted file mode 100644
index e89eefa..0000000
--- a/doc/journaling.txt
+++ /dev/null
@@ -1,155 +0,0 @@
-o  Journaling & Replay
-
-The fundamental problem with a journaled cluster filesystem is
-handling journal replay with multiple journals.  A single block of
-metadata can be modified sequentially by many different nodes in the
-cluster.  As the block is modified by each node, it gets logged in the
-journal for each node.  If care is not taken, it's possible to get
-into a situation where a journal replay can actually corrupt a
-filesystem.  The error scenario is:
-
-1) Node A modifies a metadata block by putting an updated copy into its
-   incore log.
-2) Node B wants to read and modify the block so it requests the lock
-   and a blocking callback is sent to Node A.
-3) Node A flushes its incore log to disk, and then syncs out the
-   metadata block to its inplace location.
-4) Node A then releases the lock.
-5) Node B reads in the block and puts a modified copy into its ondisk
-   log and then the inplace block location.
-6) Node A crashes.
-
-At this point, Node A's journal needs to be replayed.  Since there is
-a newer version of the block inplace, if that block is replayed, the
-filesystem will be corrupted.  There are a few different ways of
-avoiding this problem.
-
-1) Generation Numbers (GFS1)
-
-   Each metadata block has a header in it that contains a 64-bit
-   generation number.  As each block is logged into a journal, the
-   generation number is incremented.  This provides a strict ordering
-   of the different versions of the block as they are logged in the FS'
-   different journals.  When journal replay happens, a block in the
-   journal is not replayed if the generation number in the journal is
-   less than the generation number in place.  This ensures that a newer
-   version of a block is never replaced with an older version.  So,
-   this solution basically allows multiple copies of the same block in
-   different journals, but it allows you to always know which is the
-   correct one.
-
-   Pros:
-
-   A) This method allows the fastest callbacks.  To release a lock,
-      the incore log for the lock must be flushed and then the inplace
-      data and metadata must be synced.  That's it.  The sync
-      operations involved are: start the log body and wait for it to
-      become stable on the disk, synchronously write the commit block,
-      start the inplace metadata and wait for it to become stable on
-      the disk.
-
-   Cons:
-
-   A) Maintaining the generation numbers is expensive.  All newly
-      allocated metadata blocks must be read off the disk in order to
-      figure out what the previous value of the generation number was.
-      When deallocating metadata, extra work and care must be taken to
-      make sure dirty data isn't thrown away in such a way that the
-      generation numbers stop doing their thing.
-   B) You can't continue to modify the filesystem during journal
-      replay.  Basically, replay of a block is a read-modify-write
-      operation: the block is read from disk, the generation number is
-      compared, and (maybe) the new version is written out.  Replay
-      requires that the R-M-W operation is atomic with respect to
-      other R-M-W operations that might be happening (say by a normal
-      I/O process).  Since journal replay doesn't (and can't) play by
-      the normal metadata locking rules, you can't count on them to
-      protect replay.  Hence GFS1 quiesces all writes on a filesystem
-      before starting replay.  This provides the mutual exclusion
-      required, but it's slow and unnecessarily interrupts service on
-      the whole cluster.
-
-2) Total Metadata Sync (OCFS2)
-
-   This method is really simple in that it uses exactly the same
-   infrastructure that a local journaled filesystem uses.  Every time
-   a node receives a callback, it stops all metadata modification,
-   syncs out the whole incore journal, syncs out any dirty data, marks
-   the journal as being clean (unmounted), and then releases the lock.
-   Because the journal is marked as clean and recovery won't look at any
-   of the journaled blocks in it, a valid copy of any particular block
-   only exists in one journal at a time, and that journal is always the
-   one that last modified it.
-
-   Pros:
-
-   A) Very simple to implement.
-   B) You can reuse journaling code from other places (such as JBD).
-   C) No quiesce necessary for replay.
-   D) No need for generation numbers sprinkled throughout the metadata.
-
-   Cons:
-
-   A) This method has the slowest possible callbacks.  The sync
-      operations are: stop all metadata operations, start and wait for
-      the log body, write the log commit block, start and wait for all
-      the FS' dirty metadata, write an unmount block.  Writing the
-      metadata for the whole filesystem can be particularly expensive
-      because it can be scattered all over the disk and there can be a
-      whole journal's worth of it.
-
-3) Revocation of a lock's buffers (GFS2)
-
-   This method prevents a block from appearing in more than one
-   journal by canceling out the metadata blocks in the journal that
-   belong to the lock being released.  Journaling works very similarly
-   to a local filesystem or to #2 above.
-
-   The biggest difference is you have to keep track of buffers in the
-   active region of the ondisk journal, even after the inplace blocks
-   have been written back.  This is done in GFS2 by adding a second
-   part to the Active Items List.  The first part (in GFS2 called
-   AIL1) contains a list of all the blocks which have been logged to
-   the journal, but not written back to their inplace location.  Once
-   an item in AIL1 has been written back to its inplace location, it
-   is moved to AIL2.  Once the tail of the log moves past the block's
-   transaction in the log, it can be removed from AIL2.
-
-   When a callback occurs, the log is flushed to the disk and the
-   metadata for the lock is synced to disk.  At this point, any
-   metadata blocks for the lock that are in the current active region
-   of the log will be in the AIL2 list.  We then build a transaction
-   that contains revoke tags for each buffer in the AIL2 list that
-   belongs to that lock.
-
-   Pros:
-
-   A) No quiesce necessary for replay.
-   B) No need for generation numbers sprinkled throughout the
-      metadata.
-   C) The sync operations are: stop all metadata operations, start and
-      wait for the log body, write the log commit block, start and
-      wait for all the FS' dirty metadata, start and wait for the log
-      body of a transaction that revokes any of the lock's metadata
-      buffers in the journal's active region, and write the commit
-      block for that transaction.
-
-   Cons:
-
-   A) Recovery takes two passes, one to find all the revoke tags in
-      the log and one to replay the metadata blocks using the revoke
-      tags as a filter.  This is necessary for a local filesystem and
-      the total sync method, too.  It's just that there will probably
-      be more tags.
-
-Comparing #2 and #3, both do extra I/O during a lock callback to make
-sure that any metadata blocks in the log for that lock will be
-removed.  I believe #2 will be slower because syncing out all the
-dirty metadata for the entire filesystem requires lots of little,
-scattered I/O across the whole disk.  The extra I/O done by #3 is a
-log write to the disk.  So, not only should it be less I/O, but it
-should also be better suited to get good performance out of the disk
-subsystem.
-
-KWP 07/06/05
-
diff --git a/doc/min-gfs.txt b/doc/min-gfs.txt
deleted file mode 100644
index af1399c..0000000
--- a/doc/min-gfs.txt
+++ /dev/null
@@ -1,159 +0,0 @@
-
-Minimum GFS HowTo
------------------
-
-The following gfs configuration requires a minimum amount of hardware and
-no expensive storage system.  It's the cheapest and quickest way to "play"
-with gfs.
-
-
-   ----------        ----------
-   |  GNBD  |        |  GNBD  |
-   | client |        | client |   <-- these nodes use gfs
-   | node2  |        | node3  |
-   ----------        ----------
-       |                 |
-       -------------------  IP network
-                |
-           ----------
-           |  GNBD  |
-           | server |             <-- this node doesn't use gfs
-           | node1  |
-           ----------
-
-- There are three machines to use with hostnames: node1, node2, node3
-
-- node1 has an extra disk /dev/sda1 to use for gfs
-  (this could be hda1 or an lvm LV or an md device)
-
-- node1 will use gnbd to export this disk to node2 and node3
-
-- Node1 cannot use gfs; it only acts as a gnbd server.
-  (Node1 will /not/ actually be part of the cluster since it is only
-  running the gnbd server.)
-
-- Only node2 and node3 will be in the cluster and use gfs.
-  (A two-node cluster is a special case for cman, noted in the config below.)
-
-- There's not much point to using clvm in this setup so it's left out.
-
-- Download the "cluster" source tree.
-
-- Build and install from the cluster source tree.  (The kernel components
-  are not required on node1 which will only need the gnbd_serv program.)
-
-    cd cluster
-    ./configure --kernel_src=/path/to/kernel
-    make; make install
-
-- Create /etc/cluster/cluster.conf on node2 with the following contents:
-
  [The cluster.conf XML at this point was stripped by the list archive;
   a hedged sketch of what it would have contained follows directly below.]
-
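Since the archive stripped the XML from the step above, the following is a
hedged sketch of what a minimal two-node cluster.conf of that era could have
looked like, not a verbatim copy of the removed file: the cluster name
"gamma" matches the gfs_mkfs command below, the two_node/expected_votes
settings reflect the two-node special case noted above, and the fence_gnbd
agent and its attribute names (servers, nodename) are assumptions.

  <?xml version="1.0"?>
  <cluster name="gamma" config_version="1">

  <cman two_node="1" expected_votes="1">
  </cman>

  <clusternodes>
  <clusternode name="node2" nodeid="1">
          <fence>
                  <method name="single">
                          <device name="gnbd" nodename="node2"/>
                  </method>
          </fence>
  </clusternode>
  <clusternode name="node3" nodeid="2">
          <fence>
                  <method name="single">
                          <device name="gnbd" nodename="node3"/>
                  </method>
          </fence>
  </clusternode>
  </clusternodes>

  <fencedevices>
          <fencedevice name="gnbd" agent="fence_gnbd" servers="node1"/>
  </fencedevices>

  </cluster>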
-- load kernel modules on nodes
-
-node2 and node3> modprobe gnbd
-node2 and node3> modprobe gfs
-node2 and node3> modprobe lock_dlm
-
-- run the following commands
-
-node1> gnbd_serv -n
-node1> gnbd_export -c -d /dev/sda1 -e global_disk
-
-node2 and node3> gnbd_import -n -i node1
-node2 and node3> ccsd
-node2 and node3> cman_tool join
-node2 and node3> fence_tool join
-
-node2> gfs_mkfs -p lock_dlm -t gamma:gfs1 -j 2 /dev/gnbd/global_disk
-
-node2 and node3> mount -t gfs /dev/gnbd/global_disk /mnt
-
-- the end, you now have a gfs file system mounted on node2 and node3
-
-
-Appendix A
-----------
-
-To use manual fencing instead of gnbd fencing, the cluster.conf file
-would look like this:
-
  [The manual-fencing cluster.conf XML was also stripped by the list archive;
   a hedged sketch of the parts that differ from the gnbd-fencing version
   appears at the end of this message.]
-
-
-FAQ
----
-
-- Why can't node1 use gfs, too?
-
-You might be able to make it work, but we recommend that you not try.
-This software was not intended or designed to allow that kind of usage.
-
-- Isn't node1 a single point of failure?  how do I avoid that?
-
-Yes it is.  For the time being, there's no way to avoid that, apart from
-not using gnbd, of course.  Eventually, there will be a way to avoid this
-using cluster mirroring.
-
-- More info from
-  http://sources.redhat.com/cluster/gnbd/gnbd_usage.txt
-  http://sources.redhat.com/cluster/doc/usage.txt
-
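For Appendix A above, whose XML was stripped by the archive, here is a
hedged sketch of how the manual-fencing variant would typically differ from
the gnbd-fencing sketch shown earlier; fence_manual was the usual agent for
that era, but the exact attribute names are assumptions rather than a copy
of the removed file.  Only the parts that change are shown: the fence device
definition and each node's device entry.

  <fencedevices>
          <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>

  (and, inside each <clusternode> block:)

          <fence>
                  <method name="single">
                          <device name="manual" nodename="node2"/>
                  </method>
          </fence>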