From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp.polymtl.ca (smtp.polymtl.ca [132.207.4.11]) by sourceware.org (Postfix) with ESMTPS id 698083858D1E for ; Mon, 18 Apr 2022 18:44:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 698083858D1E Received: from simark.ca (simark.ca [158.69.221.121]) (authenticated bits=0) by smtp.polymtl.ca (8.14.7/8.14.7) with ESMTP id 23IIh9eN027561 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 18 Apr 2022 14:43:14 -0400 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp.polymtl.ca 23IIh9eN027561 Received: from [10.0.0.11] (192-222-157-6.qc.cable.ebox.net [192.222.157.6]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPSA id DD2CC1EDF0; Mon, 18 Apr 2022 14:43:08 -0400 (EDT) Message-ID: Date: Mon, 18 Apr 2022 14:43:08 -0400 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Subject: Re: [PATCH 5/5] gdbsupport: add path_join function Content-Language: en-US To: Lancelot SIX Cc: gdb-patches@sourceware.org, Simon Marchi References: <20220414200137.3479373-1-simon.marchi@polymtl.ca> <20220414200137.3479373-5-simon.marchi@polymtl.ca> <20220415143827.t2nlcfhmh2pondev@ubuntu.lan> From: Simon Marchi In-Reply-To: <20220415143827.t2nlcfhmh2pondev@ubuntu.lan> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Poly-FromMTA: (simark.ca [158.69.221.121]) at Mon, 18 Apr 2022 18:43:09 +0000 X-Spam-Status: No, score=-3034.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Apr 2022 18:44:22 -0000 On 2022-04-15 10:38, Lancelot SIX wrote: > Hi, > >> +/* Join COMPONENT and all subsequent null-terminated string arguments into a >> + single path. >> + >> + The last argument must be nullptr to denote the end of the argument list >> + >> + Empty arguments or arguments consisting only of directory separators are >> + ignored. */ >> + >> +extern std::string path_join (const char *component, ...) >> + ATTRIBUTE_SENTINEL; > > The last part of the interface seems odd to me. Mostly because it is > quite different to the os.path.join API from python. > > Most notable, in python `os.path.join (a, b)` will yield b if b is an > absolute path: > > >>> os.path.join("/home/foo", "/") > '/' > >>> os.path.join("/foo", "/bar") > '/bar' > > One rational for this behavior would be to try to mimic using cd on each > argument in order: > > def strange_join(*args): > cwd = os.getcwd() > try: > for comp in args: > os.chdir(comp) > return os.getcwd() > finally: > os.chdir(cwd) > > Of course this implementation would be wrong for multiple reasons: > - it must work for dirs which do not exist > - the ".." remain ".." > - many other... > > This behaviour regarding abs path is also consistent what I have with > std::filesystem in c++17 (which we be nice if we could use, but > unfortunately for the moment we have to stick to c++11) > > auto p = std::filesystem::path {"/foo"} / "/bar"; > assert (p == "/bar"); > > If you compare to one of the testcase you add, the behavior is > different: > > s = ::path_join ("/foo", "/bar", nullptr); > SELF_CHECK (s == "/foo/bar"); > > Is this different semantics made on purpose? I indeed noticed how os.path.join handled ("/foo", "/bar"). I went with the behavior in my patch mostly because there are cases in GDB where we really need to append absolute paths to another path. For example, looking up files in sysroots. If the sysroot is set to "/the/sysroot/" (with or without the trailing slash) and we want to load the target library /usr/lib/libfoo.so, it would be nice if path_join (sysroot_path, lib_path) gave /the/sysroot/usr/lib/libfoo.so without a double-slash. Same when we are looking up files in the debug-file-directory (/usr/lib/debug). When changing those implementations that currently use concat(...) or string_printf("%s/%s"), we would need to be careful: if the right hand side is an absolute path, then we would have to skip the leading directory separator before passing it to path_join, to get the desired result. And then we're back at having some special path handling details everywhere. So, IMO, the behavior I implemented makes more sense for us. I just don't know what it would mean on Windows. What would this give? path_join ("C:/hello", "D:/hi", nullptr); > Note that there is another part of the behavior which is different in > your proposal v.s. python and std::filesystem. Appending an empty > component at the end ensures that the path terminates with a path > separator: > > >>> os.path.join("foo", "") > "foo/" > > If I use your patch, I see: > > s = ::path_join ("foo", "", nullptr); > SELF_CHECK (s == "foo"); I also noticed that. In my opinion, adding that last slash is not really useful. For consistency, we should always present directories the same way, either with or without a trailing slash. Assuming we have this code: s = path_join ("/base/dir", either_a_subdirectory_or_empty, nullptr); Then if either_a_subdirectory_or_empty is empty, I'd like the result to be /base/dir, exactly as if we had done: if (!either_a_subdirectory_or_empty.empty ()) s = path_join ("/base/dir", either_a_subdirectory_or_empty, nullptr); else s = "/base/dir"; So again, the reason for my choice is just that it likely leads to simpler code at call sites, in our probable use cases. > Using similar semantics to std::filesystem might allow us to replace our > implementation with the c++ standard library when we switch to c++17. I understand the idea, and considered it. The trailing slash one isn't very important, but I think the std::filesystem behavior with the absolute path would be annoying. > On a different note, I am not sure I am a huge fan of the last nullptr > argument which is required. > > You can achieve a similar interface except for this nullptr using light > variadic templates (not too much, should be easy enough to maintain). > Is this something you considered? > > One implementation could be: > > /* The real worker can be compiled in a .cc file if you wish. */ > extern std::string path_join_1 (std::string a, std::string b); > > template std::string > path_join (std::string a, std::string b, Args... comps) > { > return path_join (path_join (a, b), comps...); > } > > template <> > std::string > path_join (std::string a, std::string b) > { > return path_join_1 (a, b); > } > > I do not pretend this is optimal (it is not). In this POC I pass all > arguments as std::string and by value, this is probably not what you > want to do in a production grade implementation, but this just gives the > overall idea. > > What do you think? If you can make it work using gdb::string_view or const char *, I think it would be ok. I'll give it a try. But otherwise, I am personally fine with the sentinel nullptr, given that the compiler gives you a warning if you forget it. Simon