public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Andrew Burgess <aburgess@redhat.com>
To: Pedro Alves <pedro@palves.net>, Eli Zaretskii <eliz@gnu.org>
Cc: tom@tromey.com, gdb-patches@sourceware.org
Subject: Re: [PATCH 1/2] gdb: avoid '//' in filenames when searching for debuginfo
Date: Sat, 15 Jun 2024 10:54:49 +0100	[thread overview]
Message-ID: <87v82a7cpi.fsf@redhat.com> (raw)
In-Reply-To: <d28a7ed9-444f-4685-9567-15810395e598@palves.net>

Pedro Alves <pedro@palves.net> writes:

> On 2024-06-14 14:29, Andrew Burgess wrote:
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>>>> From: Andrew Burgess <aburgess@redhat.com>
>>>> Cc: gdb-patches@sourceware.org
>>>> Date: Fri, 14 Jun 2024 09:58:09 +0100
>>>>
>>>> Tom Tromey <tom@tromey.com> writes:
>>>>
>>>>>>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:
>>>>>
>>>>> Andrew> The new combine_paths function is similar to the existing path_join
>>>>> Andrew> function (see gdbsupport/pathstuff.h), but unlike path_join the new
>>>>> Andrew> function doesn't require that the first path be absolute.
>>>>>
>>>>> From what I can see, path_join doesn't have this requirement.
>>>>> The docs say "The first element can be absolute or relative" and the
>>>>> code seems to agree.
>>>>
>>>> You're absolutely right.  And I see what's happened.  Somewhere along
>>>> the line the logic has gotten reversed in my head.  The problem isn't
>>>> that the first _must_ be absolute with path_join, it's that everything
>>>> else _must_not_ be absolute, see these lines in path_join:
>>>>
>>>>       if (i > 0)
>>>> 	gdb_assert (strlen (path) == 0 || !IS_ABSOLUTE_PATH (path));
>>>>
>>>> And for what we're doing here we, for example, join the path to the file
>>>> (which is absolute) onto the debug directory, which might, or might not,
>>>> be absolute, e.g. these lines in find_separate_debug_file:
>>>>
>>>>   /* If the file is in the sysroot, try using its base path in
>>>>      the global debugfile directory.  */
>>>>   debugfile = path_join (target_prefix_str, debugdir.get (),
>>>> 			 base_path, debuglink);
>>>>
>>>> I've initially understood the problem, written the code, then come back
>>>> and added comments and a commit message, and gotten mixed up somewhere.
>>>> Sorry about that.
>>>>
>>>> Anyway, here's an updated patch.  I've change one comment in symfile.c,
>>>> and I've also updated the commit message, other than that the patch is
>>>> unchanged.
>>>>
>>>> Oh, and just because I was doubting myself, I did try using path_join,
>>>> and as predicted the test (sysroot-debug-lookup.exp) causes GDB to
>>>> trigger an assertion as we are appending an absolute path.
>>>
>>> I suggest to test this with the first argument empty and the second
>>> starting with two slashes: those should be left intact, at least on
>>> MS-Windows (due to UNCs).
>> 
>> Sorry, I'm feel foggy today... what is 'this' in this paragraph.
>> 
>> So, I went and read up what UNCs means, I believe it's filenames like:
>> 
>>   \\SERVER\foo\bar\woof
>> 
>> And I guess, if they can start with forward slashes then maybe this is
>> valid too?
>> 
>>   //SERVER/foo/bar/woof
>> 
>> I'm pretty sure that's not currently going to work, at least in some
>> places we'll try to do:
>> 
>>  /debug-directory//SERVER/foo/bar/woof
>> 
>> which probably isn't going to help.  Right now the new combine_path is
>> going to mush this into '/debug-directory/SERVER/foo/bar/woof' dropping
>> one of the '/' characters.  I can change this if the double slash is in
>> fact useful in the middle of the filename, but I couldn't find anything
>> immediately that suggests it would be.
>> 
>>> Also, how do we make sure none of the arguments except the first one
>>> is an absolute file name?  Or if we cannot ensure that, how do we
>>> handle the case where some non-first argument is an absolute file
>>> name?
>> 
>> In path_join (gdbsupport/pathstuff.cc)?  We do this:
>> 
>>       if (i > 0)
>> 	gdb_assert (strlen (path) == 0 || !IS_ABSOLUTE_PATH (path));
>> 
>> Where 'i' is the index into the array of filenames to combine, for the
>> first item we don't do any checks.  For the rest the filename needs to
>> either be empty, or not absolute.
>> 
>> For the new combine_paths (gdb/symfile.c)?  We don't do any checks,
>> that's the point.  Paths after the first one might be absolute, and we
>> still want to join them together.
>> 
>>>
>>> Btw, the GNU Coding Standards frown on using "path" for anything but
>>> PATH-style directory lists.  The ones here should be called "file
>>> name" of "file_name" or "fname" or "file" or "directory", but not
>>> "path".
>> 
>> I've updated some of the uses of path to filename in the patch below.
>> Mostly in the commit message.  Within GDB's code base the use of path to
>> mean filename or directory name is pretty much ubiquitous
>> unfortunately.  Still, I've tried to switch where I think it makes
>> sense.
>> 
>> OOI, what should we use when we don't know if something is a filename or
>> directory name?  e.g. a function that joins such things (whatever they
>> are called) together?  i.e. path_join might be used to join together two
>> directory names.  Or a directory name and a filename.  If we call the
>> arguments 'filename' then that seems confusing, a directory name is just
>> as valid.  And similarly, using 'directory name' seems wrong too.  This,
>> I think, is where 'path' starts to creep in.  Plus you have libc
>> functions like 'realpath' which use 'path' in the same way....
>> 
>> Anyway.  An updated patch is below.
>> 
>> Thanks,
>> Andrew
>> 
>> ---
>> 
>> commit 74235cbf9db8f364ff8ed46898c34ccc6f38c5b9
>> Author: Andrew Burgess <aburgess@redhat.com>
>> Date:   Wed Jun 12 15:24:46 2024 +0100
>> 
>>     gdb: avoid '//' in filenames when searching for debuginfo
>>     
>>     I spotted that the gdb.base/sysroot-debug-lookup.exp test that I added
>>     recently actually had a KPASS when run with the
>>     native-extended-gdbserver board.  This was an oversight when adding
>>     the test.
>>     
>>     The failures in this test, when using the 'unix' board, are logged as
>>     bug PR gdb/31804.  The problem appears to be caused by the use of the
>>     child_path function in find_separate_debug_file.
>>     
>>     What happens on the 'unix' board is that the file is specified to GDB
>>     with a target: prefix, however GDB spots that the target filesystem is
>>     local to GDB and so opens the file without a target: prefix.  When we
>>     call into find_separate_debug_file the DIR and CANON_DIR arguments,
>>     which are computed from the objfile_name() no longer have a target:
>>     prefix.
>>     
>>     However, in this test if the file was opened with a target: prefix,
>>     then the sysroot also has a target: prefix.  When child_path is called
>>     it looks for a common prefix between CANON_DIR (from the objfile_name)
>>     and the sysroot.  However, the sysroot still has the target: prefix,
>>     which means the child_path() call fails and returns nullptr.
>>     
>>     What happens in the native-extended-gdbserver case is that GDB doesn't
>>     see the target filesystem as local.  Now the filename retains the
>>     target: prefix, which means that in the child_path() call both the
>>     sysroot and the CANON_DIR have a target: prefix, and so the
>>     child_path() call succeeds.  This allows GDB to progress, try some
>>     additional paths, and then find the debug information.
>>     
>>     So, this commit changes gdb.base/sysroot-debug-lookup.exp to expect
>>     the test to succeed when using the native-extended-gdbserver protocol.
>>     
>>     This leaves one KFAIL when using the native-extended-gdbserver board,
>>     we find the debug information but (apparently) find it in the wrong
>>     file.  What's happening is that when GDB builds the filename for the
>>     debug information we end up with a '//' string as a directory
>>     separator, the test regexp only expects a single separator.
>>     
>>     Instead of just fixing the test regexp, I've added a new function
>>     combine_paths which I then use for building the debug info filenames.
>>     This function takes care of ensuring only a single '/' is added when
>>     joining filenames together.  With this done I now have no KFAIL when
>>     using the native-extended-gdbserver board.
>>     
>>     The new combine_paths function is similar to the existing path_join
>>     function (see gdbsupport/pathstuff.h), but unlike path_join the new
>>     function doesn't require every filename after the first be a relative
>>     filename.  For now I've left the new function in symfile.c, it can
>>     always be moved if this proves to be more widely useful.
>>     
>>     After this commit we still have 2 KFAIL when not using the
>>     native-extended-gdbserver board.
>>     
>>     Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31804
>> 
>> diff --git a/gdb/symfile.c b/gdb/symfile.c
>> index 5a03def91c6..d8120aa00e0 100644
>> --- a/gdb/symfile.c
>> +++ b/gdb/symfile.c
>> @@ -1344,6 +1344,48 @@ show_debug_file_directory (struct ui_file *file, int from_tty,
>>  #define DEBUG_SUBDIRECTORY ".debug"
>>  #endif
>>  
>> +/* This is like path_join, but does not have the requirement that every
>> +   filename after the first be a relative filename.  All PATHS are joined
>> +   together and a directory separator is added if needed between joined
>> +   elements.
>> +
>> +   This allows us to do things like, e.g. append the absolute filename of a
>> +   file onto the debug file directory name.  */
>> +
>> +template<typename ...Args>
>> +static std::string
>> +combine_paths (Args... paths)
>> +{
>> +  /* It doesn't make sense to join less than two filenames.  */
>> +  static_assert (sizeof... (Args) >= 2);
>> +
>> +  std::array<const char *, sizeof... (Args)> path_array
>> +    { paths... };
>> +
>> +  std::string ret;
>> +
>> +  for (int i = 0; i < path_array.size (); ++i)
>> +    {
>> +      const char *path = path_array[i];
>> +
>> +      if (!ret.empty ())
>> +	{
>> +	  /* If RET doesn't end in a separator then add one now.  */
>> +	  if (!IS_DIR_SEPARATOR (ret.back ()))
>> +	    ret += '/';
>> +
>> +	  /* Now that RET ends in a separator ignore any at the start of
>> +	     PATH.  */
>> +	  while (IS_DIR_SEPARATOR (path[0]))
>> +	    ++path;
>> +	}
>> +
>> +      ret.append (path);
>> +    }
>> +
>> +  return ret;
>> +}
>
> path_join does the bulk of the work in an array_view overload, which avoids
> code bloat from having multiple instances of the same code for different
> arguments.  I think this would benefit from doing the same.

OK, I'll see how the discussion with Tom goes, but if this function
doesn't end up being deleted then I'll rewrite this with an array_view
worker core.

Thanks,
Andrew


  reply	other threads:[~2024-06-15  9:54 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-13  9:56 [PATCH 0/2] find debug by debuglink minor fixes Andrew Burgess
2024-06-13  9:56 ` [PATCH 1/2] gdb: avoid '//' in filenames when searching for debuginfo Andrew Burgess
2024-06-13 16:46   ` Tom Tromey
2024-06-14  8:58     ` Andrew Burgess
2024-06-14 11:06       ` Eli Zaretskii
2024-06-14 13:29         ` Andrew Burgess
2024-06-14 14:18           ` Eli Zaretskii
2024-06-15 10:24             ` Andrew Burgess
2024-06-15 12:45               ` Eli Zaretskii
2024-06-15 15:28                 ` Andrew Burgess
2024-06-15 15:33                   ` Andrew Burgess
2024-06-15 16:11                   ` Eli Zaretskii
2024-06-14 17:48           ` Pedro Alves
2024-06-15  9:54             ` Andrew Burgess [this message]
2024-06-14 14:23       ` Tom Tromey
2024-06-15  9:53         ` Andrew Burgess
2024-06-13  9:56 ` [PATCH 2/2] gdb: fix a target: prefix issue in find_separate_debug_file Andrew Burgess

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v82a7cpi.fsf@redhat.com \
    --to=aburgess@redhat.com \
    --cc=eliz@gnu.org \
    --cc=gdb-patches@sourceware.org \
    --cc=pedro@palves.net \
    --cc=tom@tromey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).