public inbox for elfutils@sourceware.org
 help / color / mirror / Atom feed
From: Noah Sanci <nsanci@redhat.com>
To: Mark Wielaard <mark@klomp.org>
Cc: elfutils-devel@sourceware.org, Frank Eigler <fche@redhat.com>
Subject: Re: [Bug debuginfod/28034] %-escape url characters
Date: Wed, 21 Jul 2021 14:18:15 -0400	[thread overview]
Message-ID: <CAJXA7qiHxBxryT67msvo2e256LYj9syWKx=2LxFErZL81KDkQw@mail.gmail.com> (raw)
In-Reply-To: <YPcM03PSfYzbaq6Z@wildebeest.org>

[-- Attachment #1: Type: text/plain, Size: 5851 bytes --]

Hello,

> On Mon, Jul 19, 2021 at 10:53:17AM -0400, Noah Sanci via Elfutils-devel wrote:
> > When requesting some source files, some URL-inconvenient chars
> > sometimes pop up.  Example from f33 libstdc++:
> > /buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug/
> > gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/
> > libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/
> > condition_variable.cc
> > As this URL is passed into debuginfod's handler_cb, it appears that the
> > + signs are helpfully unescaped to spaces by libmicrohttpd, which
> > 'course breaks everything.  We need to suppress this HTTP URL
> > processing step.  Also worth checking that %HEX decoding is also
> > suppressed.
>
> I find this description slightly confusing. It ends with "Also worth
> checking..." but that is actually what is done in this patch. The part
> before that about what debuginfod/microhttpd does on the server side
> is interesting, but not really what this patch is about (just why this
> patch is necessary, but it seems necessary on the client side always
> whatever server is used).

Thank you for bringing this to my attention. Fixed

> curl_easy_escape () can return NULL on failure.
>

Fixed.

> > +Note: the client should %-escape characters in /SOURCE/FILE that are not shown as "unreserved" in section 2.3 of RFC3986.

Please read the new note just to ensure that it's reasonable, as my
updated note specifies some characters that will be percent escaped.
Otherwise fixed.

> > @@ -149,18 +150,18 @@ ps -q $PID1 -e -L -o '%p %c %a' | grep traverse
> >  # Compile a simple program, strip its debuginfo and save the build-id.
> >  # Also move the debuginfo into another directory so that elfutils
> >  # cannot find it without debuginfod.
> > -echo "int main() { return 0; }" > ${PWD}/prog.c
> > -tempfiles prog.c
> > +echo "int main() { return 0; }" > ${PWD}/p+r%o\$g.c
> > +tempfiles p+r%o\$g.c
> >  # Create a subdirectory to confound source path names
> >  mkdir foobar
> > -gcc -Wl,--build-id -g -o prog ${PWD}/foobar///./../prog.c
> > -testrun ${abs_top_builddir}/src/strip -g -f prog.debug ${PWD}/prog
> > +gcc -Wl,--build-id -g -o p+r%o\$g ${PWD}/foobar///./../p+r%o\$g.c
> > +testrun ${abs_top_builddir}/src/strip -g -f p+r%o\$g.debug ${PWD}/p+r%o\$g
> >  BUILDID=`env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../src/readelf \
> > -          -a prog | grep 'Build ID' | cut -d ' ' -f 7`
> > +          -a p+r%o\\$g | grep 'Build ID' | cut -d ' ' -f 7`
> >
> >  wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
> > -mv prog F
> > -mv prog.debug F
> > +mv p+r%o\$g F
> > +mv p+r%o\$g.debug F
> >  kill -USR1 $PID1
> >  # Wait till both files are in the index.
> >  wait_ready $PORT1 'thread_work_total{role="traverse"}' 2
> > @@ -171,7 +172,7 @@ wait_ready $PORT1 'thread_busy{role="scan"}' 0
> >
> >  # Test whether elfutils, via the debuginfod client library dlopen hooks,
> >  # is able to fetch debuginfo from the local debuginfod.
> > -testrun ${abs_builddir}/debuginfod_build_id_find -e F/prog 1
> > +testrun ${abs_builddir}/debuginfod_build_id_find -e F/p+r%o\$g 1
> >
> >  ########################################################################
> >
> > @@ -213,22 +214,22 @@ fi
> >  # Test whether debuginfod-find is able to fetch those files.
> >  rm -rf $DEBUGINFOD_CACHE_PATH # clean it from previous tests
> >  filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find debuginfo $BUILDID`
> > -cmp $filename F/prog.debug
> > +cmp $filename F/p+r%o\$g.debug
> >  if [ -w $filename ]; then
> >      echo "cache file writable, boo"
> >      err
> >  fi
> >
> > -filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find executable F/prog`
> > -cmp $filename F/prog
> > +filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find executable F/p+r%o\\$g`
> > +cmp $filename F/p+r%o\$g
> >
> >  # raw source filename
> > -filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find source $BUILDID ${PWD}/foobar///./../prog.c`
> > -cmp $filename  ${PWD}/prog.c
> > +filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find source $BUILDID ${PWD}/foobar///./../p+r%o\\$g.c`
> > +cmp $filename  ${PWD}/p+r%o\$g.c
> >
> >  # and also the canonicalized one
> > -filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find source $BUILDID ${PWD}/prog.c`
> > -cmp $filename  ${PWD}/prog.c
> > +filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find source $BUILDID ${PWD}/p+r%o\\$g.c`
> > +cmp $filename  ${PWD}/p+r%o\$g.c
> >
> >
> >  ########################################################################
> > @@ -672,7 +673,7 @@ touch -d '1970-01-01' $DEBUGINFOD_CACHE_PATH/00000000 # old enough to guarantee
> >  echo 0 > $DEBUGINFOD_CACHE_PATH/cache_clean_interval_s
> >  echo 0 > $DEBUGINFOD_CACHE_PATH/max_unused_age_s
> >
> > -testrun ${abs_builddir}/debuginfod_build_id_find -e F/prog 1
> > +testrun ${abs_builddir}/debuginfod_build_id_find -e F/p+r%o\$g 1
> >
> >  testrun ${abs_top_builddir}/debuginfod/debuginfod-find debuginfo $BUILDID2 && false || true
> >
> > @@ -725,6 +726,7 @@ PID4=$!
> >  wait_ready $PORT3 'ready' 1
> >  tempfiles vlog$PORT3
> >  errfiles vlog$PORT3
> > +tempfiles $DB.backup*
>
> ???
>

Hopefully the updated changelog message clarifies these changes.

> >  kill -USR2 $PID4
> >  wait_ready $PORT3 'thread_work_total{role="groom"}' 1
> > @@ -738,6 +740,7 @@ wait_ready $PORT3 'groom{statistic="files scanned (#)"}' 0
> >  wait_ready $PORT3 'groom{statistic="files scanned (mb)"}' 0
> >
> >  kill $PID4
> > +PID4=0
>
> And this?
> Might need to be
>
>   wait $PID
>   PID4=0

The small test suite fixes have been removed from the script, however
I would like to update the test suite ASAP. Can I just commit a new
branch titled something like 'elfutils-test-fixes' without a specific
pr?

Noah

[-- Attachment #2: 0001-debuginfod-PR28034-client-side-escape-url-characters.patch --]
[-- Type: text/x-patch, Size: 7532 bytes --]

From 26d39eee2ecc030ac7e488977092f62afd72e903 Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Fri, 16 Jul 2021 15:16:20 -0400
Subject: [PATCH] debuginfod: PR28034 - client-side %-escape url characters

When requesting some source files, some URL-inconvenient chars
sometimes pop up.  Example from f33 libstdc++:
/buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug/
gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/
libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/
condition_variable.cc
As this URL is passed into debuginfod's handler_cb, it appears that the
+ signs are helpfully unescaped to spaces by libmicrohttpd, which
'course breaks everything.
In order to ensure the server properly parses urls such as this one,
%-escape characters on the client side so that the correct url
is preserved and properly processed on the server side.

https://sourceware.org/bugzilla/show_bug.cgi?id=28034

Signed-off-by: Noah Sanci <nsanci@redhat.com>

a
---
 debuginfod/ChangeLog           |  6 ++++++
 debuginfod/debuginfod-client.c | 17 ++++++++++++++---
 doc/debuginfod.8               |  4 ++++
 tests/ChangeLog                |  9 +++++++++
 tests/run-debuginfod-find.sh   | 32 ++++++++++++++++----------------
 5 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index e71436ca..aad35a4e 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -1,3 +1,9 @@
+2021-07-16  Noah Sanci  <nsanci@redhat.com>
+
+	PR28034
+	* debuginfod-client.c (debuginfod_query_server): % escape filename
+	so the completed url is processed properly.
+
 2021-07-06  Alice Zhang  <alizhang@redhat.com>
 
         PR27531
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index f12f609c..64936acd 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -831,9 +831,20 @@ debuginfod_query_server (debuginfod_client *c,
       else
         slashbuildid = "/buildid";
 
-      if (filename) /* must start with / */
-        snprintf(data[i].url, PATH_MAX, "%s%s/%s/%s%s", server_url,
-                 slashbuildid, build_id_bytes, type, filename);
+      if (filename)/* must start with / */
+        {
+          /* PR28034 escape characters in completed url to %hh format. */
+          char *escaped_string;
+          escaped_string = curl_easy_escape(data[i].handle, filename, 0);
+          if (!escaped_string)
+            {
+              rc = ENOMEM;
+              goto out1;
+            }
+          snprintf(data[i].url, PATH_MAX, "%s%s/%s/%s%s", server_url,
+                   slashbuildid, build_id_bytes, type, escaped_string);
+          curl_free(escaped_string);
+        }
       else
         snprintf(data[i].url, PATH_MAX, "%s%s/%s/%s", server_url,
                  slashbuildid, build_id_bytes, type);
diff --git a/doc/debuginfod.8 b/doc/debuginfod.8
index 1adf703a..ee8f76ae 100644
--- a/doc/debuginfod.8
+++ b/doc/debuginfod.8
@@ -299,6 +299,10 @@ l l.
 \../bar/foo.c AT_comp_dir=/zoo/	/buildid/BUILDID/source/zoo//../bar/foo.c
 .TE
 
+Note: the client should %-escape characters in /SOURCE/FILE that are
+not shown as "unreserved" in section 2.3 of RFC3986. Some characters
+that will be escaped include "+", "\\", "$", "!", the 'space' character,
+and ";". RFC3986 includes a more comprehensive list of these characters.
 .SS /metrics
 
 This endpoint returns a Prometheus formatted text/plain dump of a
diff --git a/tests/ChangeLog b/tests/ChangeLog
index 1460b422..0840e7af 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,12 @@
+2021-07-16  Noah Sanci  <nsanci@redhat.com>
+
+	PR28034
+	* run-debuginfod-find.sh: Added a test ensuring files with %
+	escapable characters in their paths are accessible. The test
+	itself is changing the name of a binary known previously as prog to
+	p+r%o$g. General operations such as accessing p+r%o$g acts as the
+	test for %-escape checking.
+
 2021-07-06  Alice Zhang  <alizhang@redhat.com>
 
         PR27531
diff --git a/tests/run-debuginfod-find.sh b/tests/run-debuginfod-find.sh
index 73bbe65b..ecf3195e 100755
--- a/tests/run-debuginfod-find.sh
+++ b/tests/run-debuginfod-find.sh
@@ -149,18 +149,18 @@ ps -q $PID1 -e -L -o '%p %c %a' | grep traverse
 # Compile a simple program, strip its debuginfo and save the build-id.
 # Also move the debuginfo into another directory so that elfutils
 # cannot find it without debuginfod.
-echo "int main() { return 0; }" > ${PWD}/prog.c
-tempfiles prog.c
+echo "int main() { return 0; }" > ${PWD}/p+r%o\$g.c
+tempfiles p+r%o\$g.c
 # Create a subdirectory to confound source path names
 mkdir foobar
-gcc -Wl,--build-id -g -o prog ${PWD}/foobar///./../prog.c
-testrun ${abs_top_builddir}/src/strip -g -f prog.debug ${PWD}/prog
+gcc -Wl,--build-id -g -o p+r%o\$g ${PWD}/foobar///./../p+r%o\$g.c
+testrun ${abs_top_builddir}/src/strip -g -f p+r%o\$g.debug ${PWD}/p+r%o\$g
 BUILDID=`env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../src/readelf \
-          -a prog | grep 'Build ID' | cut -d ' ' -f 7`
+          -a p+r%o\\$g | grep 'Build ID' | cut -d ' ' -f 7`
 
 wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
-mv prog F
-mv prog.debug F
+mv p+r%o\$g F
+mv p+r%o\$g.debug F
 kill -USR1 $PID1
 # Wait till both files are in the index.
 wait_ready $PORT1 'thread_work_total{role="traverse"}' 2
@@ -171,7 +171,7 @@ wait_ready $PORT1 'thread_busy{role="scan"}' 0
 
 # Test whether elfutils, via the debuginfod client library dlopen hooks,
 # is able to fetch debuginfo from the local debuginfod.
-testrun ${abs_builddir}/debuginfod_build_id_find -e F/prog 1
+testrun ${abs_builddir}/debuginfod_build_id_find -e F/p+r%o\$g 1
 
 ########################################################################
 
@@ -213,22 +213,22 @@ fi
 # Test whether debuginfod-find is able to fetch those files.
 rm -rf $DEBUGINFOD_CACHE_PATH # clean it from previous tests
 filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find debuginfo $BUILDID`
-cmp $filename F/prog.debug
+cmp $filename F/p+r%o\$g.debug
 if [ -w $filename ]; then
     echo "cache file writable, boo"
     err
 fi
 
-filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find executable F/prog`
-cmp $filename F/prog
+filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find executable F/p+r%o\\$g`
+cmp $filename F/p+r%o\$g
 
 # raw source filename
-filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find source $BUILDID ${PWD}/foobar///./../prog.c`
-cmp $filename  ${PWD}/prog.c
+filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find source $BUILDID ${PWD}/foobar///./../p+r%o\\$g.c`
+cmp $filename  ${PWD}/p+r%o\$g.c
 
 # and also the canonicalized one
-filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find source $BUILDID ${PWD}/prog.c`
-cmp $filename  ${PWD}/prog.c
+filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find source $BUILDID ${PWD}/p+r%o\\$g.c`
+cmp $filename  ${PWD}/p+r%o\$g.c
 
 
 ########################################################################
@@ -672,7 +672,7 @@ touch -d '1970-01-01' $DEBUGINFOD_CACHE_PATH/00000000 # old enough to guarantee
 echo 0 > $DEBUGINFOD_CACHE_PATH/cache_clean_interval_s
 echo 0 > $DEBUGINFOD_CACHE_PATH/max_unused_age_s
 
-testrun ${abs_builddir}/debuginfod_build_id_find -e F/prog 1
+testrun ${abs_builddir}/debuginfod_build_id_find -e F/p+r%o\$g 1
 
 testrun ${abs_top_builddir}/debuginfod/debuginfod-find debuginfo $BUILDID2 && false || true
 
-- 
2.31.1


  parent reply	other threads:[~2021-07-21 18:18 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-19 14:53 Noah Sanci
2021-07-20 17:50 ` Mark Wielaard
2021-07-20 18:13   ` Frank Ch. Eigler
2021-07-20 19:08   ` Mark Wielaard
2021-07-21 18:18   ` Noah Sanci [this message]
2021-07-21 19:44     ` Noah Sanci
2021-07-22 12:29       ` Mark Wielaard
2021-07-22 12:28     ` Mark Wielaard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJXA7qiHxBxryT67msvo2e256LYj9syWKx=2LxFErZL81KDkQw@mail.gmail.com' \
    --to=nsanci@redhat.com \
    --cc=elfutils-devel@sourceware.org \
    --cc=fche@redhat.com \
    --cc=mark@klomp.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).