From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 52581 invoked by alias); 22 Aug 2016 16:02:21 -0000 Mailing-List: contact cygwin-developers-help@cygwin.com; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-developers-owner@cygwin.com Mail-Followup-To: cygwin-developers@cygwin.com Received: (qmail 52569 invoked by uid 89); 22 Aug 2016 16:02:20 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=sk:check_, sk:!check_ X-HELO: smtp.salomon.at Received: from smtp.salomon.at (HELO smtp.salomon.at) (193.186.16.13) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 22 Aug 2016 16:02:10 +0000 Received: from samail03.wamas.com ([172.28.33.235] helo=mailhost.salomon.at) by smtp.salomon.at with esmtps (UNKNOWN:DHE-RSA-AES256-SHA:256) (Exim 4.80.1) (envelope-from ) id 1bbrfp-0000H0-As for cygwin-developers@cygwin.com; Mon, 22 Aug 2016 18:02:07 +0200 Received: from [172.28.41.34] by mailhost.salomon.at with esmtp (Exim 4.77) (envelope-from ) id 1bbrfo-0001JD-PO for cygwin-developers@cygwin.com; Mon, 22 Aug 2016 18:02:04 +0200 Subject: Re: About the dll search algorithm of dlopen (patch-r2) To: cygwin-developers@cygwin.com References: <574E835E.7090109@ssi-schaefer.com> <20160601110947.GE11431@calimero.vinschen.de> <574EF07B.1060806@ssi-schaefer.com> <20160601201748.GI11431@calimero.vinschen.de> <57B735D5.4070401@ssi-schaefer.com> <57B7362D.8060707@ssi-schaefer.com> <20160820193243.vbmpfjc5mjdhrndh@calimero.vinschen.de> <4d4481ce-c163-0ec9-29f5-59bbe13260fa@ssi-schaefer.com> From: Michael Haubenwallner Message-ID: <0963e74e-c963-a7f4-19e7-490b02f9393e@ssi-schaefer.com> Date: Mon, 22 Aug 2016 16:02:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <4d4481ce-c163-0ec9-29f5-59bbe13260fa@ssi-schaefer.com> Content-Type: multipart/mixed; boundary="------------865AFF5ED1BE16D76F7A6642" X-SW-Source: 2016-08/txt/msg00006.txt.bz2 This is a multi-part message in MIME format. --------------865AFF5ED1BE16D76F7A6642 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Content-length: 6598 On 08/22/2016 02:15 PM, Michael Haubenwallner wrote: > On 08/20/2016 09:32 PM, Corinna Vinschen wrote: >>>>> >>>>> One way around YA code duplication could be some kind of path iterator >>>>> class which could be used from find_exec as well as from >>>>> get_full_path_of_dll. > >>>> 0001.patch is a draft for some new cygwin::pathfinder class, with >>>> 0002.patch adding the executable's directory as searchpath, and >>>> 0003.patch to search the PATH environment as well. >>>> >>>> Thoughts? >> >> Ok, that might be disappointing now because you already put so much work >> into it, but I actually expected some more discussion first. I have two >> problem with this. >> >> I'm not a big fan of templates. > > Never mind, it's been some template exercise to me anyway. > >> What I had in mind was a *simple* class which gets told if it searches >> for libs or executables and then checks the different paths accordingly, >> kind of a copy of find_exec as a class, just additionally handling the >> prefix issue for DLLs. What about this one: 've dropped any templates now, while keeping the basenames logic inside dlopen. /haubi/ > > What I'm more interested in for such a class is the actual API for use > by dlopen() and exec(), and the final list of files searched for - with > these use cases coming to my mind: > > Libraries/dlls with final search path "/lib:/morelibs": > L1) dlopen("libN.so") > L2) dlopen("libN.dll") > L3) dlopen("cygN.dll") > L4) dlopen("N.so") > L5) dlopen("N.dll") > Executables with final search path "/bin:/moreexes" > X1) exec("X") > X2) exec("X.exe") > X3) exec("X.com") > > Instead of API calls similar to: > L1) find(dll, "N", ["/lib", "/morelibs"]) > L2) find(dll, "N", ["/lib", "/morelibs"]) > L3) find(dll, "N", ["/lib", "/morelibs"]) > L4) find(dll, "N", ["/lib", "/morelibs"]) > L5) find(dll, "N", ["/lib", "/morelibs"]) > X1) find(exe, "X", ["/bin", "/moreexes"]) > X2) find(exe, "X", ["/bin", "/moreexes"]) > X3) find(exe, "X", ["/bin", "/moreexes"]) > it feels necessary to support more explicit naming, as in: > L1) find(["libN.so", "cygN.dll", "libN.dll"], ["/lib/../bin","/lib","/morelibs"]) > L2) find([ "cygN.dll", "libN.dll"], ["/lib/../bin","/lib","/morelibs"]) > L3) find([ "cygN.dll", "libN.dll"], ["/lib/../bin","/lib","/morelibs"]) > L4) find(["N.so", "N.dll" ], ["/lib/../bin","/lib","/morelibs"]) > L5) find([ "N.dll" ], ["/lib/../bin","/lib","/morelibs"]) > X1) find(["X", "X.exe","X.com"], ["/bin","/moreexes"]) > X2) find(["X", "X.exe" ], ["/bin","/moreexes"]) > X3) find(["X", "X.com" ], ["/bin","/moreexes"]) > > Where the find method does not need to actually know whether it searches > for a dll or an exe, but dlopen() and exec() instead define the file > names to search for. This is what the patch draft does in dlopen. > >>>>>>>> *) The directory of the current main executable should be searched >>>>>>>> after LD_LIBRARY_PATH and before /usr/bin:/usr/lib. >>>>>>>> And PATH should be searched before /usr/bin:/usr/lib as well. >>>>>>> >>>>>>> Checking the executable path and $PATH are Windows concepts. dlopen >>>>>>> doesn't do that on POSIX systems and we're not doing that either. >>>>>> >>>>>> Agreed, but POSIX also does have the concept of embedded RUNPATH, >>>>>> which is completely missing in Cygwin as far as I can see. >>>>> >>>>> RPATH and RUNPATH are ELF dynamic loader features, not supported by >>>>> PE/COFF. >>>> >>>> In any case, to me it does feel quite important to have the (almost) same >>>> dll search algorithm with dlopen() as with CreateProcess(). >> >> Last but not least I'm not yet convinced if it's *really* a good idea to >> prepend the executable path to the DLL search path unconditionally. Be >> it as it is in terms of DT_RUNPATH, why is the application dir a good >> choice at all, unless we're talking Windows semantics? Which we don't. >> Also, if loading from the applications dir from dlopen is important for >> you, you can emulate it by adding the application dir to LD_LIBRAYR_PATH. > > As long as there is lack of a Cygwin specific dll loader to find the > dlls to load during process startup, we're bound to Windows semantics. > > For dlopen, it is more important to find the same dll file as would be > found when the exe was linked against that dll file, rather than using > the Linux-known algorithm and environment variables - and differ from > process startup: Both really should result in the same algorithm here, > even if that means some difference compared to Linux. > > As far as I understand, lack of DT_RUNPATH (besides /etc/ld.so.conf) > support during process start was the main reason for the dlls to install > into /lib/../bin instead of /lib at all, to be found at process start > because of residing in the application's bin dir: > Why should that be different for dlopen? > >> I checked for the usage of DT_RUNPATH/DT_RPATH on Fedora 23 and only a >> limited number of packages use it (texlive, samba, python, man-db, >> swipl, and a few more). Some of them, like texlive, even use it wrongly, >> with RPATH pointing to a non-existing build dir. There are also a few >> stray "/usr/lib64" settings, but all in all it's not used to point to >> the dir the application is installed to, but rather to some package specific >> subdir, e.g. /usr/lib64/samba, /usr/lib64/swipl-7.2.3/lib/x86_64-linux, >> etc. > > On Linux, the binaries installed in /usr usually rely on the Linux > loader to be configured via /etc/ld.so.conf to find their runtime > libs in /usr/lib. > > Please remember: This whole thing is not a problem with packages > installed to /usr, but with packages installed to /somewhere/else > that provide runtime libraries that are also available in /usr. > > Using LD_LIBRARY_PATH pointing to /somewhere/else/lib may break the > binaries found in /usr/bin - and agreed, searching PATH doesn't make > it better, as PATH is the "LD_LIBRARY_PATH" for Windows. > >> IMHO this means just adding the applications bin dir is most of the time >> an unused or even wrong workaround. > > Although GetModuleHandle may reduce that pressure for dlopen - as long as > the applications bin dir is searched at process start, it really should > be searched by dlopen too, even if for /usr/bin/* this might indeed become > redundant, as we always add /usr/bin in dlopen - which really mimics > the /etc/ld.so.conf content actually, although that one is unavailable > to process startup. > > Thanks! > /haubi/ > --------------865AFF5ED1BE16D76F7A6642 Content-Type: text/x-patch; name="0001-dlopen-search-each-name-within-one-single-search-dir.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="0001-dlopen-search-each-name-within-one-single-search-dir.pa"; filename*1="tch" Content-length: 13853 >From ff75702593aed447f695efa37fc3312495681f8f Mon Sep 17 00:00:00 2001 From: Michael Haubenwallner Date: Tue, 31 May 2016 13:09:11 +0000 Subject: [PATCH 1/3] dlopen: search each name within one single search dir Search x/bin:x/lib if searching in x/lib. Introduces and uses new pathfinder class, which introduces and uses new vstrlist class. --- winsup/cygwin/dlfcn.cc | 71 ++++++------- winsup/cygwin/pathfinder.h | 112 ++++++++++++++++++++ winsup/cygwin/vstrlist.h | 253 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 396 insertions(+), 40 deletions(-) create mode 100644 winsup/cygwin/pathfinder.h create mode 100644 winsup/cygwin/vstrlist.h diff --git a/winsup/cygwin/dlfcn.cc b/winsup/cygwin/dlfcn.cc index 255a6d5..da20a58 100644 --- a/winsup/cygwin/dlfcn.cc +++ b/winsup/cygwin/dlfcn.cc @@ -20,6 +20,7 @@ details. */ #include "cygtls.h" #include "tls_pbuf.h" #include "ntdll.h" +#include "pathfinder.h" static void set_dl_error (const char *str) @@ -28,29 +29,6 @@ set_dl_error (const char *str) _my_tls.locals.dl_error = 1; } -/* Look for an executable file given the name and the environment - variable to use for searching (eg., PATH); returns the full - pathname (static buffer) if found or NULL if not. */ -inline const char * -check_path_access (const char *mywinenv, const char *name, path_conv& buf) -{ - return find_exec (name, buf, mywinenv, FE_NNF | FE_DLL); -} - -/* Search LD_LIBRARY_PATH for dll, if it exists. Search /usr/bin and /usr/lib - by default. Return valid full path in path_conv real_filename. */ -static inline bool -gfpod_helper (const char *name, path_conv &real_filename) -{ - if (strchr (name, '/')) - real_filename.check (name, PC_SYM_FOLLOW | PC_NULLEMPTY); - else if (!check_path_access ("LD_LIBRARY_PATH", name, real_filename)) - check_path_access ("/usr/bin:/usr/lib", name, real_filename); - if (!real_filename.exists ()) - real_filename.error = ENOENT; - return !real_filename.error; -} - static bool get_full_path_of_dll (const char* str, path_conv &real_filename) { @@ -63,38 +41,50 @@ get_full_path_of_dll (const char* str, path_conv &real_filename) return false; /* Yes. Let caller deal with it. */ } - tmp_pathbuf tp; - char *name = tp.c_get (); + pathfinder::basenamelist basenames; - strcpy (name, str); /* Put it somewhere where we can manipulate it. */ + const char *basename = strrchr (str, '/'); + basename = basename ? basename + 1 : str; - char *basename = strrchr (name, '/'); - basename = basename ? basename + 1 : name; - char *suffix = strrchr (name, '.'); - if (suffix && suffix < basename) - suffix = NULL; + int baselen = str + len - basename; + const char *suffix = strrchr (basename, '.'); + char const * ext = ""; + int extlen = 0; /* Is suffix ".so"? */ if (suffix && !strcmp (suffix, ".so")) { /* Does the file exist? */ - if (gfpod_helper (name, real_filename)) - return true; + basenames.appendv (basename, baselen, NULL); /* No, replace ".so" with ".dll". */ - strcpy (suffix, ".dll"); + baselen -= 3; + ext = ".dll"; + extlen = 4; } /* Does the filename start with "lib"? */ if (!strncmp (basename, "lib", 3)) { /* Yes, replace "lib" with "cyg". */ - strncpy (basename, "cyg", 3); - /* Does the file exist? */ - if (gfpod_helper (name, real_filename)) - return true; + basenames.appendv ("cyg", 3, basename+3, baselen-3, ext, extlen, NULL); /* No, revert back to "lib". */ - strncpy (basename, "lib", 3); } - if (gfpod_helper (name, real_filename)) + basenames.appendv (basename, baselen, ext, extlen, NULL); + + pathfinder finder (basenames); + + if (basename > str) + finder.add_searchdir (str, basename - 1 - str); + else + { + /* NOTE: The Windows loader (for linked dlls) does + not use the LD_LIBRARY_PATH environment variable. */ + finder.add_envsearchpath ("LD_LIBRARY_PATH"); + + /* Finally we better have some fallback. */ + finder.add_searchdir ("/usr/lib", -1); + } + + if (finder.check_path_access (real_filename)) return true; /* If nothing worked, create a relative path from the original incoming @@ -113,6 +103,7 @@ dlopen (const char *name, int flags) { void *ret = NULL; + debug_printf ("flags %d for %s", flags, name); if (name == NULL) { ret = (void *) GetModuleHandle (NULL); /* handle for the current module */ diff --git a/winsup/cygwin/pathfinder.h b/winsup/cygwin/pathfinder.h new file mode 100644 index 0000000..3453692 --- /dev/null +++ b/winsup/cygwin/pathfinder.h @@ -0,0 +1,112 @@ +/* pathfinder.h: find one of multiple file names in path + + Copyright 2016 Red Hat, Inc. + +This file is part of Cygwin. + +This software is a copyrighted work licensed under the terms of the +Cygwin license. Please consult the file "CYGWIN_LICENSE" for +details. */ + +#include "vstrlist.h" + +#ifdef __cplusplus + +class pathfinder +{ +public: + typedef vstrlist basenamelist; + +private: + pathfinder (); + pathfinder (pathfinder const &); + pathfinder & operator = (pathfinder const &); + + basenamelist basenames_; + size_t basenames_maxlen_; + + typedef vstrlist searchbufferlist; + searchbufferlist searchbuffers_; + +public: + ~pathfinder () {} + + /* We need the basenames to search for first, to allow for optimized + memory allocation of each searchpath + basename combination. + The incoming list of basenames is emptied (ownership take over). */ + pathfinder (basenamelist & basenames) + : basenames_ () + , basenames_maxlen_ () + , searchbuffers_() + { + basenames.swap(basenames_); + + for (basenamelist::iterator basename = basenames_.begin (); + basename != basenames_.end (); + ++ basename) + { + if (basenames_maxlen_ < basename->stringlength ()) + basenames_maxlen_ = basename->stringlength (); + } + } + + void add_searchdir (const char *dir, int dirlen) + { + if (dirlen < 0) + dirlen = strlen (dir); + + if (!dirlen) + return; + + /* Search "x/bin:x/lib" for "x/lib" */ + if (dirlen >=4 && !strncmp (dir + dirlen - 4, "/lib", 4)) + /* prealloc buffer in searchdir for any basename we will search for */ + searchbuffers_.appendv (dir, dirlen - 4, "/bin", 4, "/", 1 + basenames_maxlen_, NULL); + + /* prealloc buffer in searchdir for any basename we will search for */ + searchbuffers_.appendv (dir, dirlen, "/", 1 + basenames_maxlen_, NULL); + } + + void add_searchpath (const char *path) + { + while (path && *path) + { + const char *next = strchr (path, ':'); + add_searchdir (path, next ? next - path : -1); + path = next ? next + 1 : next; + } + } + + void add_envsearchpath (const char *envpath) + { + add_searchpath (getenv (envpath)); + } + + /* Within each searchdir registered, try each registered basename to + find as executable. Returns found dir/basename in real_filename. + Returns true when found. */ + bool check_path_access (path_conv& real_filename) + { + for (searchbufferlist::iterator dir = searchbuffers_.begin (); + dir != searchbuffers_.end (); + ++dir) + for (basenamelist::iterator name = basenames_.begin (); + name != basenames_.end (); + ++name) + { + /* complete the filename path to search for */ + memcpy (dir->buffer () + dir->stringlength (), name->string (), name->stringlength () + 1); + debug_printf ("trying %s", dir->buffer ()); + real_filename.check (dir->string (), PC_SYM_FOLLOW | PC_POSIX); + if (real_filename.exists () && !real_filename.isdir ()) + { + debug_printf (" found %s", dir->buffer ()); + return true; + } + } + real_filename.error = ENOENT; + return !real_filename.error; + } +}; + +#endif /* __cplusplus */ diff --git a/winsup/cygwin/vstrlist.h b/winsup/cygwin/vstrlist.h new file mode 100644 index 0000000..ecbdc64 --- /dev/null +++ b/winsup/cygwin/vstrlist.h @@ -0,0 +1,253 @@ +/* vstrlist.h: vstrlist + + Copyright 2016 Red Hat, Inc. + +This file is part of Cygwin. + +This software is a copyrighted work licensed under the terms of the +Cygwin license. Please consult the file "CYGWIN_LICENSE" for +details. */ + +#ifdef __cplusplus + +class vstrlist +{ +public: + class member + { + friend class vstrlist; + friend class iterator; + + member * prev_; + member * next_; + size_t bufferlength_; + size_t stringlength_; + char buffer_[1]; /* we always have space for a trailing zero */ + + /* no copy */ + member (member const &); + member & operator = (member const &); + + /* anchor */ + void * operator new (size_t class_size) + { + return cmalloc_abort (HEAP_STR, class_size); + } + + /* anchor */ + member () + : prev_ (this) + , next_ (this) + , bufferlength_ (0) + , stringlength_ (0) + , buffer_ () + {} + + /* entry: determine memory size from args */ + void * operator new (size_t class_size, char const * part0, va_list parts) + { + char const * part = part0; + va_list partsdup; + va_copy (partsdup, parts); + size_t bufferlength = 0; + while (part) + { + int partlen = va_arg (partsdup, int); + if (partlen < 0) + partlen = strlen (part); + bufferlength += partlen; + part = va_arg (partsdup, char const *); + } + va_end (partsdup); + + return cmalloc_abort (HEAP_STR, class_size + bufferlength); + } + + /* entry: instantly insert into list */ + member (member * before, char const * part0, va_list parts) + : prev_ (before->prev_) + , next_ (before) + , bufferlength_ (0) + , stringlength_ () + , buffer_ () + { + prev_->next_ = this; + next_->prev_ = this; + + char * dest = buffer_; + char const * part = part0; + va_list partsdup; + va_copy (partsdup, parts); + while (part) + { + int partlen = va_arg (partsdup, int); + if (partlen < 0) + { + char * old = dest; + dest = stpcpy (old, part); + partlen = dest - old; + } + else + dest = stpncpy (dest, part, partlen); + bufferlength_ += partlen; + part = va_arg (partsdup, const char *); + } + va_end (partsdup); + *dest = (char)0; + stringlength_ = dest - buffer_; + } + + void operator delete (void * p) + { + cfree (p); + } + + /* remove entry from list */ + ~member () + { + member * next = next_; + member * prev = prev_; + next->prev_ = prev; + prev->next_ = next; + prev_ = NULL; + next_ = NULL; + } + + public: + member const * next () const { return next_; } + member * next () { return next_; } + member const * prev () const { return next_; } + member * prev () { return next_; } + + /* always is a readonly string */ + char const * string () const { return buffer_; } + size_t stringlength () const { return stringlength_; } + + /* give write access to the buffer when writeable */ + char * buffer () { return buffer_; } + size_t bufferlength () { return bufferlength_; } + }; + + class iterator + { + friend class vstrlist; + + member * current_; + + iterator (); + + iterator (member * current) + : current_ (current) + {} + + public: + iterator (iterator const & rhs) + : current_ (rhs.current_) + {} + + iterator & operator = (iterator const & rhs) + { + current_ = rhs.current_; + return *this; + } + + iterator & operator ++ () + { + current_ = current_->next (); + return *this; + } + + iterator operator ++ (int) + { + iterator ret (*this); + current_ = current_->next (); + return ret; + } + + iterator & operator -- () + { + current_ = current_->prev (); + return *this; + } + + iterator operator -- (int) + { + iterator ret (*this); + current_ = current_->prev (); + return ret; + } + + bool operator == (iterator const & rhs) const + { + return current_ == rhs.current_; + } + + bool operator != (iterator const & rhs) const + { + return current_ != rhs.current_; + } + + member const & operator * () const { return *current_; } + member & operator * () { return *current_; } + member const * operator -> () const { return current_; } + member * operator -> () { return current_; } + + void remove () + { + member * old = current_; + ++ *this; + delete old; + } + }; + +private: + member * anchor_; + + /* no copy */ + vstrlist (vstrlist const &); + vstrlist & operator = (vstrlist const &); + +public: + iterator begin () { return iterator (anchor_->next ()); } + iterator end () { return iterator (anchor_ ); } + iterator rbegin () { return iterator (anchor_->prev ()); } + iterator rend () { return iterator (anchor_ ); } + + vstrlist () + : anchor_ (new member ()) + {} + + ~vstrlist () + { + for (iterator it = begin (); it != end (); it.remove ()); + delete anchor_; + } + + void swap (vstrlist & that) + { + member * old = anchor_; + anchor_ = that.anchor_; + that.anchor_ = old; + } + + member * appendv (char const * part0, va_list parts) + { + return new (part0, parts) member (anchor_, part0, parts); + } + + member * appendv (char const * part0, ...) + { + va_list parts; + va_start (parts, part0); + member * ret = appendv (part0, parts); + va_end (parts); + return ret; + } + + member * append (char const * part) + { + return appendv (part, strlen(part)); + } +}; + +#endif /* __cplusplus */ -- 2.8.3 --------------865AFF5ED1BE16D76F7A6642--