public inbox for gdb-patches@sourceware.org
From: Max Yvon Zimmermann <max.yvon.zimmermann@campus.tu-berlin.de>
To: Guinevere Larsen <blarsen@redhat.com>, Eli Zaretskii <eliz@gnu.org>
Cc: <gdb-patches@sourceware.org>
Subject: Re: [PATCH] Add wildcard matching to substitute-path rules
Date: Fri, 5 Apr 2024 00:52:41 +0200
Message-ID: <58bcd451-6cec-4f11-9428-79d9b3037e88@campus.tu-berlin.de>
In-Reply-To: <845a6a0f-6a7d-4b75-84b8-6dd2d33ea0e3@redhat.com>

Thank you once again for your feedback. You made me take a deep dive into how source redirection works in detail. It was quite fun!
I will try to answer the concerns from both of you in this reply.

>> Actually, this is already mentioned in the documentation. But maybe it can be made a bit clearer. On the other hand, I really do not want to encourage the extra \-escaping. I am thinking of writing something like this instead:
>>
>> "On DOS-based file system, you are encouraged to use '/' as the path separator within your patterns. This avoids escaping the '\' characters and would still match against file names that use '\' as the path separator. For instance, the pattern 'C:/foo/*' would still match 'C:\foo\bar'."
> 
> I'm not sure this should be in the manual, since the way file names
> are written in the debug info is not under user's control.  If the
> file names in the debug info use backslashes, will GDB recognize
> rewrite rules that use forward slashes?  Even if GDB does treat both
> characters as equal on Windows, I'm not sure users will trust that up
> front, so they might use backslashes in the rewrite rules anyway.

Yes, GDB will recognize rewrite rules that use forward slashes even if file names in the debug info use backslashes.
I can of course still add an example for double \-escaped DOS paths if you want me to.
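
To illustrate the claim, here is a minimal Python sketch of separator-insensitive matching. It is not GDB's code: the helper name matches_dos is hypothetical, and Python's fnmatch only stands in for whatever matcher the patch uses. The assumption is simply that '\' in the file name is treated like '/' before the pattern is applied.

```python
import fnmatch

def matches_dos(pattern, filename):
    """Return True if filename matches pattern, treating '\\' like '/'.

    Assumption for illustration only: the file name from the debug info
    has its backslashes normalized to forward slashes, and the pattern
    is written with forward slashes, as recommended above.
    """
    normalized = filename.replace("\\", "/")
    return fnmatch.fnmatchcase(normalized, pattern)

print(matches_dos("C:/foo/*", "C:\\foo\\bar"))
print(matches_dos("C:/foo/*", "C:\\baz\\bar"))
```

The pattern never needs a backslash here, so the question of escaping does not arise.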

>> That's not what I meant.  I meant that a rule like
>>
>>   set substitute-path /*/include /path/to/somewhere
>>
>> can catch any parent directory of 'include', and which one it will
>> catch is not known in advance.
> 
> This was something I was also wondering about. taking from the original example:
> 
> Max> In the case of conan (https://conan.io/) the path will consist of some constant part and some directory named with some hash value (i.e., /path/to/<unique hash>/builddir/)
> 
> Suppose there are 2 builds for this package, so we have the following valid directories: /path/to/deadbeef/builddir /path/to/00000000/builddir
> 
> which of those will be chosen? Is it consistent across gdb runs? OSs? How can a user know in advance which will be picked?

I think I need to clear up some confusion about the direction of this mapping. We map from the file names provided in the debug info to our local source files. NOT the other way around!

In my original example we would set the following rule:

set substitute-path /path/to/*/builddir /src

Let's say we are debugging an executable which contains the file name '/path/to/deadbeef/builddir/main.c' in its debug info.
When GDB wants to open the file it will look up any substitution rules that match this file name.
In this case the file would match against our specified rule and would therefore be rewritten as '/src/main.c'. This is unambiguous.
Now '/src/main.c' will be opened by GDB.
If we were to debug another executable which contains the file name '/path/to/00000000/builddir/main.c' in its debug info, the same thing would happen.

I want to emphasize that we are NOT looking for any paths that would "match" '/src/main.c'.
So your question about which path will be chosen does not really apply.
There is no path to be chosen.
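
The direction of the mapping can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code from the patch: the function name rewrite is hypothetical, matching is done per path component with Python's fnmatch, and '*' is assumed not to cross a '/'. Note that the function only looks at strings and never touches the file system.

```python
import fnmatch

def rewrite(filename, pattern, replacement):
    """Rewrite a debug-info file name if its leading directories match pattern.

    Returns the file name unchanged when the rule does not apply.
    Assumption: each component of the pattern is matched against the
    corresponding component of the file name.
    """
    parts = filename.split("/")
    pat_parts = pattern.rstrip("/").split("/")
    if len(parts) < len(pat_parts):
        return filename
    for part, pat in zip(parts, pat_parts):
        if not fnmatch.fnmatchcase(part, pat):
            return filename
    # Keep everything after the matched prefix and prepend the replacement.
    rest = parts[len(pat_parts):]
    return "/".join([replacement.rstrip("/")] + rest)

rule = ("/path/to/*/builddir", "/src")
for name in ("/path/to/deadbeef/builddir/main.c",
             "/path/to/00000000/builddir/main.c",
             "/somewhere/else/main.c"):
    print(name, "->", rewrite(name, *rule))
```

Both hashed build directories end up at '/src/main.c', and the unrelated file name is left alone. The input is always a file name from the debug info, so the result is determined by the rule and that file name alone.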

But I think I know what you mean by your question.
In a nutshell, the "chosen" path depends on which build of the package is currently being debugged.

Is this consistent across GDB runs?
Yes.

Is this consistent across OSs?
Yes.

How can a user know in advance which will be picked?
There is nothing to be picked.

>>> Also, in the example I stated in my last reply, no file name that matches '/foo/include' would ever match '/bar/include'. The point I was trying to get across was that two unrelated files can already be mapped onto the same file (accidentally).
> 
> The difference I see is that in the previous examples, the problem is apparent within what the user would expect (some GDB script, debug information, or other things related to their project), whereas with glob matching, the problem may be unrelated (directory structure of unrelated folders).
> 
> If I wasn't aware of this patch, I would never think to check if the folder structure of my system could be the problem for getting debug information.

Once again, this is not really an issue due to the direction of the mapping.
The folder structure of a system does not affect which rule will be matched. I know this might sound like a bold statement but it's true.
Only the file names in the debug info are relevant.

>> Also, in the example I stated in my last reply, no file name that matches '/foo/include' would ever match '/bar/include'. The point I was trying to get across was that two unrelated files can already be mapped onto the same file (accidentally).
> 
> Well, two wrongs don't make a right.

This is a very fair point. Having different files map to the same file in the local source directory could be annoying.
Because this isn't a new issue per se, I thought about implementing a separate patch that would warn the user if two different files get mapped onto the same source file.
But after some thought I realized that there are actually valid use-cases for this multi-mapping.

Let's say we have the following directory structure:

/inc.h
/foo.c
/dir/bar.c

'/foo.c' includes '/inc.h' via #include "inc.h".
'/dir/bar.c' includes '/inc.h' via #include "../inc.h"

With GCC this will result in two entries in the debug info: '/inc.h' and '/dir/../inc.h'.
For GDB these entries are two separate files. (Maybe this could be changed with another patch though.)
But with the help of the following two rules we can tell GDB to treat them the same:

set substitute-path /from /to
set substitute-path /from/dir/../ /to

I therefore don't know if multi-mapping should be considered wrong here.
Also I don't think that glob matching would make this more common due to the aforementioned direction of the mapping.
For me, this is a completely separate issue.
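
As a side note on the '/inc.h' versus '/dir/../inc.h' entries above, the following Python sketch shows why a purely textual comparison sees two files, and how a lexical normalization would collapse them. It uses posixpath.normpath from the Python standard library; whether GDB should do something similar is exactly the open question, and lexical normalization can give wrong answers when 'dir' is a symlink.

```python
import posixpath

entries = ["/inc.h", "/dir/../inc.h"]

# As plain strings the two debug-info entries differ.
print(entries[0] == entries[1])

# After lexical normalization they name the same file.
normalized = {posixpath.normpath(e) for e in entries}
print(normalized)
```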
