Re: [PATCH 1/2] gdb: avoid '//' in filenames when searching for debuginfo

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

From: Pedro Alves <pedro@palves.net>
To: Andrew Burgess <aburgess@redhat.com>, Eli Zaretskii <eliz@gnu.org>
Cc: tom@tromey.com, gdb-patches@sourceware.org
Subject: Re: [PATCH 1/2] gdb: avoid '//' in filenames when searching for debuginfo
Date: Fri, 14 Jun 2024 18:48:40 +0100	[thread overview]
Message-ID: <d28a7ed9-444f-4685-9567-15810395e598@palves.net> (raw)
In-Reply-To: <87ed8z8xfc.fsf@redhat.com>

On 2024-06-14 14:29, Andrew Burgess wrote:
> Eli Zaretskii <eliz@gnu.org> writes:
> 
>>> From: Andrew Burgess <aburgess@redhat.com>
>>> Cc: gdb-patches@sourceware.org
>>> Date: Fri, 14 Jun 2024 09:58:09 +0100
>>>
>>> Tom Tromey <tom@tromey.com> writes:
>>>
>>>>>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:
>>>>
>>>> Andrew> The new combine_paths function is similar to the existing path_join
>>>> Andrew> function (see gdbsupport/pathstuff.h), but unlike path_join the new
>>>> Andrew> function doesn't require that the first path be absolute.
>>>>
>>>> From what I can see, path_join doesn't have this requirement.
>>>> The docs say "The first element can be absolute or relative" and the
>>>> code seems to agree.
>>>
>>> You're absolutely right.  And I see what's happened.  Somewhere along
>>> the line the logic has gotten reversed in my head.  The problem isn't
>>> that the first _must_ be absolute with path_join, it's that everything
>>> else _must_not_ be absolute, see these lines in path_join:
>>>
>>>       if (i > 0)
>>> 	gdb_assert (strlen (path) == 0 || !IS_ABSOLUTE_PATH (path));
>>>
>>> And for what we're doing here we, for example, join the path to the file
>>> (which is absolute) onto the debug directory, which might, or might not,
>>> be absolute, e.g. these lines in find_separate_debug_file:
>>>
>>>   /* If the file is in the sysroot, try using its base path in
>>>      the global debugfile directory.  */
>>>   debugfile = path_join (target_prefix_str, debugdir.get (),
>>> 			 base_path, debuglink);
>>>
>>> I've initially understood the problem, written the code, then come back
>>> and added comments and a commit message, and gotten mixed up somewhere.
>>> Sorry about that.
>>>
>>> Anyway, here's an updated patch.  I've change one comment in symfile.c,
>>> and I've also updated the commit message, other than that the patch is
>>> unchanged.
>>>
>>> Oh, and just because I was doubting myself, I did try using path_join,
>>> and as predicted the test (sysroot-debug-lookup.exp) causes GDB to
>>> trigger an assertion as we are appending an absolute path.
>>
>> I suggest to test this with the first argument empty and the second
>> starting with two slashes: those should be left intact, at least on
>> MS-Windows (due to UNCs).
> 
> Sorry, I'm feel foggy today... what is 'this' in this paragraph.
> 
> So, I went and read up what UNCs means, I believe it's filenames like:
> 
>   \\SERVER\foo\bar\woof
> 
> And I guess, if they can start with forward slashes then maybe this is
> valid too?
> 
>   //SERVER/foo/bar/woof
> 
> I'm pretty sure that's not currently going to work, at least in some
> places we'll try to do:
> 
>  /debug-directory//SERVER/foo/bar/woof
> 
> which probably isn't going to help.  Right now the new combine_path is
> going to mush this into '/debug-directory/SERVER/foo/bar/woof' dropping
> one of the '/' characters.  I can change this if the double slash is in
> fact useful in the middle of the filename, but I couldn't find anything
> immediately that suggests it would be.
> 
>> Also, how do we make sure none of the arguments except the first one
>> is an absolute file name?  Or if we cannot ensure that, how do we
>> handle the case where some non-first argument is an absolute file
>> name?
> 
> In path_join (gdbsupport/pathstuff.cc)?  We do this:
> 
>       if (i > 0)
> 	gdb_assert (strlen (path) == 0 || !IS_ABSOLUTE_PATH (path));
> 
> Where 'i' is the index into the array of filenames to combine, for the
> first item we don't do any checks.  For the rest the filename needs to
> either be empty, or not absolute.
> 
> For the new combine_paths (gdb/symfile.c)?  We don't do any checks,
> that's the point.  Paths after the first one might be absolute, and we
> still want to join them together.
> 
>>
>> Btw, the GNU Coding Standards frown on using "path" for anything but
>> PATH-style directory lists.  The ones here should be called "file
>> name" of "file_name" or "fname" or "file" or "directory", but not
>> "path".
> 
> I've updated some of the uses of path to filename in the patch below.
> Mostly in the commit message.  Within GDB's code base the use of path to
> mean filename or directory name is pretty much ubiquitous
> unfortunately.  Still, I've tried to switch where I think it makes
> sense.
> 
> OOI, what should we use when we don't know if something is a filename or
> directory name?  e.g. a function that joins such things (whatever they
> are called) together?  i.e. path_join might be used to join together two
> directory names.  Or a directory name and a filename.  If we call the
> arguments 'filename' then that seems confusing, a directory name is just
> as valid.  And similarly, using 'directory name' seems wrong too.  This,
> I think, is where 'path' starts to creep in.  Plus you have libc
> functions like 'realpath' which use 'path' in the same way....
> 
> Anyway.  An updated patch is below.
> 
> Thanks,
> Andrew
> 
> ---
> 
> commit 74235cbf9db8f364ff8ed46898c34ccc6f38c5b9
> Author: Andrew Burgess <aburgess@redhat.com>
> Date:   Wed Jun 12 15:24:46 2024 +0100
> 
>     gdb: avoid '//' in filenames when searching for debuginfo
>     
>     I spotted that the gdb.base/sysroot-debug-lookup.exp test that I added
>     recently actually had a KPASS when run with the
>     native-extended-gdbserver board.  This was an oversight when adding
>     the test.
>     
>     The failures in this test, when using the 'unix' board, are logged as
>     bug PR gdb/31804.  The problem appears to be caused by the use of the
>     child_path function in find_separate_debug_file.
>     
>     What happens on the 'unix' board is that the file is specified to GDB
>     with a target: prefix, however GDB spots that the target filesystem is
>     local to GDB and so opens the file without a target: prefix.  When we
>     call into find_separate_debug_file the DIR and CANON_DIR arguments,
>     which are computed from the objfile_name() no longer have a target:
>     prefix.
>     
>     However, in this test if the file was opened with a target: prefix,
>     then the sysroot also has a target: prefix.  When child_path is called
>     it looks for a common prefix between CANON_DIR (from the objfile_name)
>     and the sysroot.  However, the sysroot still has the target: prefix,
>     which means the child_path() call fails and returns nullptr.
>     
>     What happens in the native-extended-gdbserver case is that GDB doesn't
>     see the target filesystem as local.  Now the filename retains the
>     target: prefix, which means that in the child_path() call both the
>     sysroot and the CANON_DIR have a target: prefix, and so the
>     child_path() call succeeds.  This allows GDB to progress, try some
>     additional paths, and then find the debug information.
>     
>     So, this commit changes gdb.base/sysroot-debug-lookup.exp to expect
>     the test to succeed when using the native-extended-gdbserver protocol.
>     
>     This leaves one KFAIL when using the native-extended-gdbserver board,
>     we find the debug information but (apparently) find it in the wrong
>     file.  What's happening is that when GDB builds the filename for the
>     debug information we end up with a '//' string as a directory
>     separator, the test regexp only expects a single separator.
>     
>     Instead of just fixing the test regexp, I've added a new function
>     combine_paths which I then use for building the debug info filenames.
>     This function takes care of ensuring only a single '/' is added when
>     joining filenames together.  With this done I now have no KFAIL when
>     using the native-extended-gdbserver board.
>     
>     The new combine_paths function is similar to the existing path_join
>     function (see gdbsupport/pathstuff.h), but unlike path_join the new
>     function doesn't require every filename after the first be a relative
>     filename.  For now I've left the new function in symfile.c, it can
>     always be moved if this proves to be more widely useful.
>     
>     After this commit we still have 2 KFAIL when not using the
>     native-extended-gdbserver board.
>     
>     Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31804
> 
> diff --git a/gdb/symfile.c b/gdb/symfile.c
> index 5a03def91c6..d8120aa00e0 100644
> --- a/gdb/symfile.c
> +++ b/gdb/symfile.c
> @@ -1344,6 +1344,48 @@ show_debug_file_directory (struct ui_file *file, int from_tty,
>  #define DEBUG_SUBDIRECTORY ".debug"
>  #endif
>  
> +/* This is like path_join, but does not have the requirement that every
> +   filename after the first be a relative filename.  All PATHS are joined
> +   together and a directory separator is added if needed between joined
> +   elements.
> +
> +   This allows us to do things like, e.g. append the absolute filename of a
> +   file onto the debug file directory name.  */
> +
> +template<typename ...Args>
> +static std::string
> +combine_paths (Args... paths)
> +{
> +  /* It doesn't make sense to join less than two filenames.  */
> +  static_assert (sizeof... (Args) >= 2);
> +
> +  std::array<const char *, sizeof... (Args)> path_array
> +    { paths... };
> +
> +  std::string ret;
> +
> +  for (int i = 0; i < path_array.size (); ++i)
> +    {
> +      const char *path = path_array[i];
> +
> +      if (!ret.empty ())
> +	{
> +	  /* If RET doesn't end in a separator then add one now.  */
> +	  if (!IS_DIR_SEPARATOR (ret.back ()))
> +	    ret += '/';
> +
> +	  /* Now that RET ends in a separator ignore any at the start of
> +	     PATH.  */
> +	  while (IS_DIR_SEPARATOR (path[0]))
> +	    ++path;
> +	}
> +
> +      ret.append (path);
> +    }
> +
> +  return ret;
> +}

path_join does the bulk of the work in an array_view overload, which avoids
code bloat from having multiple instances of the same code for different
arguments.  I think this would benefit from doing the same.

next prev parent reply	other threads:[~2024-06-14 17:48 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-13  9:56 [PATCH 0/2] find debug by debuglink minor fixes Andrew Burgess
2024-06-13  9:56 ` [PATCH 1/2] gdb: avoid '//' in filenames when searching for debuginfo Andrew Burgess
2024-06-13 16:46   ` Tom Tromey
2024-06-14  8:58     ` Andrew Burgess
2024-06-14 11:06       ` Eli Zaretskii
2024-06-14 13:29         ` Andrew Burgess
2024-06-14 14:18           ` Eli Zaretskii
2024-06-15 10:24             ` Andrew Burgess
2024-06-15 12:45               ` Eli Zaretskii
2024-06-15 15:28                 ` Andrew Burgess
2024-06-15 15:33                   ` Andrew Burgess
2024-06-15 16:11                   ` Eli Zaretskii
2024-06-14 17:48           ` Pedro Alves [this message]
2024-06-15  9:54             ` Andrew Burgess
2024-06-14 14:23       ` Tom Tromey
2024-06-15  9:53         ` Andrew Burgess
2024-06-13  9:56 ` [PATCH 2/2] gdb: fix a target: prefix issue in find_separate_debug_file Andrew Burgess

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d28a7ed9-444f-4685-9567-15810395e598@palves.net \
    --to=pedro@palves.net \
    --cc=aburgess@redhat.com \
    --cc=eliz@gnu.org \
    --cc=gdb-patches@sourceware.org \
    --cc=tom@tromey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).