public inbox for elfutils@sourceware.org
 help / color / mirror / Atom feed
* [Bug debuginfod/25548] New: also support canonicalized source-file name lookups in webapi
@ 2020-02-13 15:02 fche at redhat dot com
  2020-02-13 15:19 ` [Bug debuginfod/25548] " jan.kratochvil at redhat dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: fche at redhat dot com @ 2020-02-13 15:02 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=25548

            Bug ID: 25548
           Summary: also support canonicalized source-file name lookups in
                    webapi
           Product: elfutils
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: debuginfod
          Assignee: unassigned at sourceware dot org
          Reporter: fche at redhat dot com
                CC: elfutils-devel at sourceware dot org,
                    jan.kratochvil at redhat dot com, kkleine at redhat dot com
  Target Milestone: ---

It turns out that some debuginfo consumers canonicalize source-code file names
by the time they get the convenient chance to fall back to debuginfod.  This
means that path substrings that POSIX defines as no-ops are lost.  debuginfod
should extend a gracious embrace to these tools by also permitting such paths
to be used during the /buildid/hexcode/source/PATH webapi.

Implementing this should not be too hard.  The dwarf_extract_source_paths could
add both waldo and a new canonicalize_posix_path(waldo) to the returned
debug_sourcefiles[] set.  That's it.

The canonicalize_posix_path() function would perform these textual rewrites,
repeating until convergence:

   /./       ->  /
   /FOO/../  ->  /    # NB: FOO must not be . or .. because   /../../ != /
   //        ->  /

(realpath(3) is not helpful because it looks at the host filesystem to do
resolution, which we cannot when dealing with archive-resident file names.)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug debuginfod/25548] also support canonicalized source-file name lookups in webapi
  2020-02-13 15:02 [Bug debuginfod/25548] New: also support canonicalized source-file name lookups in webapi fche at redhat dot com
@ 2020-02-13 15:19 ` jan.kratochvil at redhat dot com
  2020-02-13 17:20 ` kkleine at redhat dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: jan.kratochvil at redhat dot com @ 2020-02-13 15:19 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=25548

--- Comment #1 from Jan Kratochvil <jan.kratochvil at redhat dot com> ---
// g++ -o x x.cpp -Wall $(llvm-config --cxxflags --libs)
#include <llvm/ADT/SmallString.h>
#include <llvm/Support/Path.h>
#include <stdio.h>
int main(int argc, char **argv) {
  while (*++argv) {
    llvm::SmallString<128> path(*argv);
    if (llvm::sys::path::remove_dots(path, true /*remove_dot_dot*/,
                                     llvm::sys::path::Style::posix))
      puts(path.c_str());
  }
}

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug debuginfod/25548] also support canonicalized source-file name lookups in webapi
  2020-02-13 15:02 [Bug debuginfod/25548] New: also support canonicalized source-file name lookups in webapi fche at redhat dot com
  2020-02-13 15:19 ` [Bug debuginfod/25548] " jan.kratochvil at redhat dot com
@ 2020-02-13 17:20 ` kkleine at redhat dot com
  2020-02-14 12:21 ` labath at google dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: kkleine at redhat dot com @ 2020-02-13 17:20 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=25548

--- Comment #2 from Konrad Kleine <kkleine at redhat dot com> ---
Please hold back with implementing this because from what I heard from LLDB
folks, the normalization of paths is not always sound. I wouldn't want to
spread this if we're not 100% clear.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug debuginfod/25548] also support canonicalized source-file name lookups in webapi
  2020-02-13 15:02 [Bug debuginfod/25548] New: also support canonicalized source-file name lookups in webapi fche at redhat dot com
  2020-02-13 15:19 ` [Bug debuginfod/25548] " jan.kratochvil at redhat dot com
  2020-02-13 17:20 ` kkleine at redhat dot com
@ 2020-02-14 12:21 ` labath at google dot com
  2020-03-01 19:47 ` fche at redhat dot com
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: labath at google dot com @ 2020-02-14 12:21 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=25548

Pavel Labath <labath at google dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |labath at google dot com

--- Comment #3 from Pavel Labath <labath at google dot com> ---
The "soundness" that Konrad mentions in #c2 refers to the fact that "/FOO/../ 
->  /" transformation is correct only in case "FOO" is not a symlink. If it is,
then "FOO/.." can point to just about anywhere. The only way to know where it
will _really_  point to is to fetch the original value of the symlink, but this
may not be possible if one is dealing with paths which do not necessarily come
from the host system.

So, I think that doing these kinds of transformations in these cases (e.g. when
setting a breakpoint) is reasonable -- we don't need to a file to actually
exist in order to set a breakpoint, and we don't want to force the user to type
the exact path that happened to find its way into the debug info.

The part where this (IMO) becomes unsound is when lldb does these
transformations on _all_ paths, even those that definitely do refer to the host
filesystem (e.g. the file which we are about to execute).

Anyway, going back to this bug, I don't think it's unreasonable for debuginfod
to support queries with "canonicalized" paths, but I can also understand a
position which says that this canonicalization is bad, and that one should
perform queries with the original path. Indeed, one can envision a world where
lldb allows the user to set a breakpoint using the "canonicalized" path, but
then performs the file lookups using the "original" path instead. These lookups
would not even need to be limited to debuginfod -- using the original path for
host filesystem lookups would also improve correctness in certain obscure
scenarios. It's just that reaching this world in lldb will be tricky (but not
impossible), because this canonicalization happens at a very low layer (but
this is something I would want to change anyway).

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug debuginfod/25548] also support canonicalized source-file name lookups in webapi
  2020-02-13 15:02 [Bug debuginfod/25548] New: also support canonicalized source-file name lookups in webapi fche at redhat dot com
                   ` (2 preceding siblings ...)
  2020-02-14 12:21 ` labath at google dot com
@ 2020-03-01 19:47 ` fche at redhat dot com
  2020-03-25  2:13 ` fche at redhat dot com
  2020-03-26 14:44 ` fche at redhat dot com
  5 siblings, 0 replies; 7+ messages in thread
From: fche at redhat dot com @ 2020-03-01 19:47 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=25548

--- Comment #4 from Frank Ch. Eigler <fche at redhat dot com> ---
Related to this, Eli Schwartz helped identify two separate outright bugs in the
code that touch on this problem.

The debuginfod-client.c curl api code neglects to pass CURLOPT_PATH_AS_IS, so
source file references with /../ bits in them get collapsed.  This is the
opposite of what the webapi docs say.

The above bug hid a separate bug in the server.  There, an inconsistent
canonicalization rule is in effect with archives.  srefs are uncanonicalized,
but sdefs are canonicalized.  That makes those source queries unsatisfiable. 
Whoops.

The cure should consist of:

- adding CURLOPT_PATH_AS_IS to the client, where I swear it was at one point
- teaching the server to store both canonicalized AND uncanonicalized
artifactsrc names in the _f_s and _r_sref tables (so up to two rows instead of
just one)

and then bob will in fact be our uncle.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug debuginfod/25548] also support canonicalized source-file name lookups in webapi
  2020-02-13 15:02 [Bug debuginfod/25548] New: also support canonicalized source-file name lookups in webapi fche at redhat dot com
                   ` (3 preceding siblings ...)
  2020-03-01 19:47 ` fche at redhat dot com
@ 2020-03-25  2:13 ` fche at redhat dot com
  2020-03-26 14:44 ` fche at redhat dot com
  5 siblings, 0 replies; 7+ messages in thread
From: fche at redhat dot com @ 2020-03-25  2:13 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=25548

--- Comment #5 from Frank Ch. Eigler <fche at redhat dot com> ---
https://sourceware.org/pipermail/elfutils-devel/2020q1/002522.html

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug debuginfod/25548] also support canonicalized source-file name lookups in webapi
  2020-02-13 15:02 [Bug debuginfod/25548] New: also support canonicalized source-file name lookups in webapi fche at redhat dot com
                   ` (4 preceding siblings ...)
  2020-03-25  2:13 ` fche at redhat dot com
@ 2020-03-26 14:44 ` fche at redhat dot com
  5 siblings, 0 replies; 7+ messages in thread
From: fche at redhat dot com @ 2020-03-26 14:44 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=25548

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Frank Ch. Eigler <fche at redhat dot com> ---
commit d63a809da467e646480c273b8eb276401679d2bb

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-03-26 14:44 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-13 15:02 [Bug debuginfod/25548] New: also support canonicalized source-file name lookups in webapi fche at redhat dot com
2020-02-13 15:19 ` [Bug debuginfod/25548] " jan.kratochvil at redhat dot com
2020-02-13 17:20 ` kkleine at redhat dot com
2020-02-14 12:21 ` labath at google dot com
2020-03-01 19:47 ` fche at redhat dot com
2020-03-25  2:13 ` fche at redhat dot com
2020-03-26 14:44 ` fche at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).