From mboxrd@z Thu Jan 1 00:00:00 1970
From: Duane Ellis
To: insight@sourceware.cygnus.com
Subject: /tmp_mnt, the automounter & realpath
Date: Fri, 29 Oct 1999 08:05:00 -0000
Message-id: <199910291459.KAA20866@ss5mth35.franklin.com.franklin.com>
References: <959C31D5DB12D311903C0090277B0C7E0D0B6F@mercury.broadjump.com>
X-SW-Source: 1999-q4/msg00034.html

We have 50+ unix machines, all using automount to mount file systems.
I have a problem installing Insight.

-- Duane.

In the file itclConfig.sh, I'm finding statements like this:

------------------------------------------------------------
ITCL_BUILD_LIB_SPEC='-L/files1/proj/gdb_sneak/insite/insight-19990928/itcl/itcl/
ITCL_LIB_SPEC ='-L/proj/gdb_sneak/insite/install/lib -litcl3.0'
ITCL_SRC_DIR ='/files1/proj/gdb_sneak/insite/insight-19990928/itcl/itcl'
ITCL_SH ='/files1/proj/gdb_sneak/insite/insight-19990928/itcl/itcl/unix/itclsh'
ITCL_LIB_FULL_PATH ='/files1/proj/gdb_sneak/insite/insight-19990928/itcl/itcl/uni
------------------------------------------------------------

There are other examples. I have a workable solution - one that helps
solve problems like this in GDB and GCC - at the end of this email.

When I ran "./configure" for Insight, I specified
"/proj/gdb_sneak/insite/install" for both the --prefix and
--exec-prefix. Note that several of the paths above start with /files1
instead.

While this example is from RedHat 5.2, kernel 2.2.9 - it is also a
persistent problem on SunOS 4.3.

I have been told the problem is centered around the automounter.

Now, I'm no expert on the automounter - all I know is what it is, not
how it works or how to configure it. To me, the automounter is
indistinguishable from black magic.

I've also seen this same problem buried inside the debug records that
GDB reads (output by gcc), and in various other tools too - it's not
just Insight with this problem.

--[So, what's the problem?]------------------------------

Yes, it does still run. But only on the machine that I configured it
on... there are other combinations of these cases too, see below.

Not all of you may have a 50+ machine network where all the machines
share file systems via 'nfs' mount points, with a few large file
servers.

Our network here at Franklin is set up like this:

On each machine, /files[0-99]/ is a *REAL* hard mount point where the
various hard disks on that machine are mounted. Having these things
under /files[0-99] lets the sys-admin backup scripts know just what
needs to be backed up. We have other names like that.

--[ How is our automounter configured? ]--------------------------

We then have a number of 'automount' points that the automounter makes
work like magic.

For instance: 3 developers, Tom, Dick and Harry, each have their own
machines A, B and C. Their home directories are known as:

    /home/tom    is physically on machine A, /files2/home/tom
    /home/dick   is physically on machine B, /files4/home/dick
    /home/harry  is physically on machine C, /files1/home/harry

When I "cd /home/tom", the automounter (or something) knows how to
automatically mount the remote nfs file system. After some time of no
activity (I think our timeout is 5 or 10 minutes), the automounter
quietly dismounts the file system.

In the case of /home/tom - depending upon what machine you are logged
into, /home/tom may really be known as any one of these:

    /home/tom
    /files2/home/tom
    /tmp_mnt/home/tom

Here at Franklin, we have a number of top level automount points:

    /proj  for projects
    /data  for data
    /nbu   for NOT BACKED UP reconstructable files

------------------------------------------------------------

Ok, so here's the problem:

A) TOM logged into his machine
    TOM builds the tool on a disk local to TOM's machine.
    Tool works IF you are running the tool on Tom's machine.
    I.e.: if you are Dick or Harry, on your own machine, the tool
    does not work.

B) HARRY logged into TOM's machine
    HARRY builds the tool on a disk local to TOM's machine.
    Tool works for a while for Dick & Harry - but sometimes
    mysteriously dies with 'stale nfs' handle error messages.
    Tool does not work for TOM.

C) DICK builds a tool that has parts living on 3 different
   machines, A, B and C.

------------------------------------------------------------

What's happening, What Goes Wrong.

Case (A)
    For Dick & Harry, when they execute the programs, some of the
    files refer to /files2/home/tom, or something like that, and
    *NOT*, for instance, /home/tom.

    If you are on Tom's machine - that's great, /files2/home/tom
    exists. But if you are on Dick's or Harry's machine, /files2 may
    not exist, or may contain something totally different.

Case (B - part 1)
    If the files reference /tmp_mnt/home/tom, and for some reason you
    have not recently accessed /home/tom in such a way as to cause the
    automounter to mount /home/tom, the files and/or directories don't
    exist [much like the /files2 example above].

Case (B - part 2)
    Say you are deeply debugging something, examining things, and it's
    now lunch - or nature calls. You walk away, and when you come
    back the automounter has dismounted everything.

    Before you left, /tmp_mnt/home/tom worked. Now it's timed out.
    The code tries to read from the file again - BANG: STALE NFS
    handle. You are hosed.

Case (C) - It gets worse. Combine the above problems.

-[Solution]--------------------------------------------------

What do we do here at Franklin to solve this problem?

This is also a suggestion for a new utility, in some form, to go into
the GNU tool chain. I'd like to donate what we have - but cannot, so
I'll describe it instead.

The solution is centered around the Unix library function "realpath()".

In our shell scripts, makefiles, etc. - even in our internal versions
of GDB, GCC, and other tools - whenever and wherever a directory or
path name is output, we have it call a function we have named
"FIX_realpath()" to fix up the name.

We also have a simple command line interface to the 'FIX_realpath()'
function call; it is in a program we call "realpath".

For instance, in a Makefile or configure script you might see:

    echo EXEC_PREFIX=`realpath ${exec_prefix}`

The realpath program calls the function realpath() and strips all
funky names from the beginning of the result.

In practice, we have found that it is actually best to have a two
column table of strings, that works like this:

    If you find this:    Replace it with:
    ----------------     ----------------
    /tmp_mnt             "" (blank, strip it)
    /nbu[0..99]          /nbu
    /files[0..99]        "" (blank, strip it)
    /data[0..99]         /data

Column 1 could be a regular expression, but the problem does not
always warrant that level of complexity.

In some cases, I've seen this implemented as a simple sed statement or
awk statement that 'fixes' the problem. Practically speaking, it sucks
to have to hunt down all of these little scripts buried in makefiles,
configure scripts, etc., to add your own site specific names and back
quote them enough to make it work.

In the example above, you could think of 'realpath' as just plain
echo. However, the nice thing is being able to - within a script - do
this:

    FOO=`realpath $(FOO)`

and not need to know what "$(FOO)" is: a directory or a file... or
whatever.

One CON we have come across is this: sometimes you make a sym-link
tree to make a directory structure work, and the realpath program -
well - gets in the way, because it resolves the sym-links.

In most (99.99%) of the cases, we actually execute realpath via a
MAKEFILE macro or shell variable called $(REALPATH). In the case where
the CON is a problem, you could do this:

    REALPATH=echo

But that does not always work. For instance, this fails:

    $(REALPATH) .

because echo just prints "." back instead of a full path name.

Of course the 'realpath' program would have to be site specific, but
it is a very simple program.

(BTW - I like the fact that it is now included in the CYGWIN-B20
package; I don't think it was available in CYGWIN-B18.)
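
To make the two column table concrete, here is a minimal sketch of the
idea in Python. This is NOT the Franklin code (which I can't post); the
names strip_site_prefix, fix_realpath and SITE_PREFIXES are made up for
illustration, and I am assuming "[0..99]" means a one or two digit
number. It resolves the path with realpath first, then rewrites the
first matching site prefix at the start of the result.

```python
import os
import re
import sys

# Hypothetical site table, modeled on the two column table above.
# Each entry is (pattern to find at the start of the path, replacement).
# The lookahead makes sure we match a whole path component, so that
# "/tmp_mnt2" or "/filesystem" are left alone.
SITE_PREFIXES = [
    (re.compile(r"^/tmp_mnt(?=/|$)"), ""),
    (re.compile(r"^/nbu[0-9]{1,2}(?=/|$)"), "/nbu"),
    (re.compile(r"^/files[0-9]{1,2}(?=/|$)"), ""),
    (re.compile(r"^/data[0-9]{1,2}(?=/|$)"), "/data"),
]


def strip_site_prefix(path):
    """Rewrite the first matching site specific prefix; else return path."""
    for pattern, replacement in SITE_PREFIXES:
        fixed, count = pattern.subn(replacement, path, count=1)
        if count:
            # If the whole path was the prefix, fall back to the root.
            return fixed or "/"
    return path


def fix_realpath(path):
    """Resolve sym-links and "." / "..", then apply the site table."""
    return strip_site_prefix(os.path.realpath(path))


def main(argv):
    # Command line use: print one fixed-up path per argument.
    for arg in argv:
        print(fix_realpath(arg))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With this table, "/files2/home/tom" and "/tmp_mnt/home/tom" both come
out as "/home/tom", which is the name that works on every machine.
Only the first matching row is applied; whether a real site needs to
apply several rows in sequence is something I have left open.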