public inbox for elfutils@sourceware.org
 help / color / mirror / Atom feed
* [Bug debuginfod/28034] client-side %-escape url characters
@ 2021-08-26 19:27 Noah Sanci
  2021-08-26 21:02 ` Frank Ch. Eigler
  0 siblings, 1 reply; 11+ messages in thread
From: Noah Sanci @ 2021-08-26 19:27 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 781 bytes --]

Hello,

When requesting some source files, some URL-inconvenient chars
sometimes pop up.  Example from f33 libstdc++:
/buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug/
gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/
libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/
condition_variable.cc
As this URL is passed into debuginfod's handler_cb, it appears that the
+ signs are helpfully unescaped to spaces by libmicrohttpd, which
'course breaks everything.
In order to ensure the server properly parses urls such as this one,
%-escape characters on the client side so that the correct url
is preserved and properly processed on the server side.

This patch preserves '/'s in the url. Please find the patch attached.

-Noah Sanci

[-- Attachment #2: 0001-debuginfod-PR28034-client-side-escape-url-characters.patch --]
[-- Type: text/x-patch, Size: 3869 bytes --]

From 0bc0f3f04b1c058fe8495bd76c962ec9d3c901dd Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Fri, 16 Jul 2021 15:16:20 -0400
Subject: [PATCH] debuginfod: PR28034 - client-side %-escape url characters

When requesting some source files, some URL-inconvenient chars
sometimes pop up.  Example from f33 libstdc++:
/buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug/
gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/
libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/
condition_variable.cc
As this URL is passed into debuginfod's handler_cb, it appears that the
+ signs are helpfully unescaped to spaces by libmicrohttpd, which
'course breaks everything.
In order to ensure the server properly parses urls such as this one,
%-escape characters on the client side so that the correct url
is preserved and properly processed on the server side.

https://sourceware.org/bugzilla/show_bug.cgi?id=28034

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/debuginfod-client.c | 52 ++++++++++++++++++++++++++++------
 1 file changed, 44 insertions(+), 8 deletions(-)

diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index 7d4b220f..ed5943f3 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -904,16 +904,52 @@ debuginfod_query_server (debuginfod_client *c,
       if (filename) /* must start with / */
         {
           /* PR28034 escape characters in completed url to %hh format. */
-          char *escaped_string;
-          escaped_string = curl_easy_escape(data[i].handle, filename, 0);
-          if (!escaped_string)
+          char escaped_string[PATH_MAX] = {'\0'};
+          char *loc = (char *) filename;
+          char *loc2;
+          char *tmp;
+          for(size_t j = 0; j < strlen(filename); ++j)
             {
-              rc = -ENOMEM;
-              goto out2;
+              loc2 = strstr(loc, "/");
+              // If the first character is a '/'
+              if ( (unsigned long) loc2 - (unsigned long) loc <= 0)
+                {
+                  strcat(escaped_string, "/");
+                  loc ++;
+                }
+              // If we have reached the end of the path, there won't be a /
+              // but we still need to process one more string (the file's actual name)
+              else if (loc2 == NULL)
+                {
+                  tmp = curl_easy_escape(data[i].handle, loc, 0);
+                  if (!tmp)
+                    {
+                      rc = -ENOMEM;
+                      goto out1;
+                    }
+                  strcat(escaped_string, tmp);
+                  curl_free(tmp);
+                  break;
+                }
+              // The default case, when there is a string surrounded by '/'s
+              else
+                {
+                  // The third argument to curl_easy_escape isolated the string between the
+                  // '/'s
+                  tmp = curl_easy_escape(data[i].handle, loc, (unsigned long) loc2 - (unsigned long)loc);
+                  if (!tmp)
+                    {
+                      rc = -ENOMEM;
+                      goto out1;
+                    }
+                  strcat(escaped_string, tmp);
+                  strcat(escaped_string, "/");
+                  curl_free(tmp);
+                  loc = loc2+1;
+                }
             }
-          snprintf(data[i].url, PATH_MAX, "%s/%s/%s/%s", server_url,
-                   build_id_bytes, type, escaped_string);
-          curl_free(escaped_string);
+          snprintf(data[i].url, PATH_MAX, "%s/%s/%s%s", server_url,
+                    build_id_bytes, type, escaped_string);
         }
       else
         snprintf(data[i].url, PATH_MAX, "%s/%s/%s", server_url, build_id_bytes, type);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Bug debuginfod/28034] client-side %-escape url characters
  2021-08-26 19:27 [Bug debuginfod/28034] client-side %-escape url characters Noah Sanci
@ 2021-08-26 21:02 ` Frank Ch. Eigler
  2021-08-27 14:44   ` Noah Sanci
  0 siblings, 1 reply; 11+ messages in thread
From: Frank Ch. Eigler @ 2021-08-26 21:02 UTC (permalink / raw)
  To: Noah Sanci; +Cc: elfutils-devel

Hi -

>            /* PR28034 escape characters in completed url to %hh format. */
> -          char *escaped_string;
> -          escaped_string = curl_easy_escape(data[i].handle, filename, 0);
> -          if (!escaped_string)
> +          char escaped_string[PATH_MAX] = {'\0'};
> +          char *loc = (char *) filename;
> +          char *loc2;
> +          char *tmp;
> +          for(size_t j = 0; j < strlen(filename); ++j)
>              {
> -              rc = -ENOMEM;
> -              goto out2;
> +              loc2 = strstr(loc, "/");
> +              // If the first character is a '/'
> [...]

Holy cow that's a lot of work to do it this way.
A couple of alternatives:

- ditch curl_easy_escape :-( and use a
  malloc(strlen(x)*3)
  byte-by-byte copy from source string into destination
    if not [a-zA-Z0-9/.~] then %-escape

or:
- keep curl_easy_escape and postprocess
  byte-by-byte examine the result of curl_easy_escape
  - if seeing a "%2F", replace the % with a / and memmove the
    rest of the string 2 bytes ahead

It shouldn't need strtok or strstr or a lot of logic or stuff like
that really.

- FChE


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Bug debuginfod/28034] client-side %-escape url characters
  2021-08-26 21:02 ` Frank Ch. Eigler
@ 2021-08-27 14:44   ` Noah Sanci
  2021-08-27 15:07     ` Noah Sanci
  0 siblings, 1 reply; 11+ messages in thread
From: Noah Sanci @ 2021-08-27 14:44 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1382 bytes --]

Hello,

Here is an updated patch, using memmove. Much smaller.

Thanks for the suggestions,

Noah Sanci


On Thu, Aug 26, 2021 at 5:02 PM Frank Ch. Eigler <fche@redhat.com> wrote:
>
> Hi -
>
> >            /* PR28034 escape characters in completed url to %hh format. */
> > -          char *escaped_string;
> > -          escaped_string = curl_easy_escape(data[i].handle, filename, 0);
> > -          if (!escaped_string)
> > +          char escaped_string[PATH_MAX] = {'\0'};
> > +          char *loc = (char *) filename;
> > +          char *loc2;
> > +          char *tmp;
> > +          for(size_t j = 0; j < strlen(filename); ++j)
> >              {
> > -              rc = -ENOMEM;
> > -              goto out2;
> > +              loc2 = strstr(loc, "/");
> > +              // If the first character is a '/'
> > [...]
>
> Holy cow that's a lot of work to do it this way.
> A couple of alternatives:
>
> - ditch curl_easy_escape :-( and use a
>   malloc(strlen(x)*3)
>   byte-by-byte copy from source string into destination
>     if not [a-zA-Z0-9/.~] then %-escape
>
> or:
> - keep curl_easy_escape and postprocess
>   byte-by-byte examine the result of curl_easy_escape
>   - if seeing a "%2F", replace the % with a / and memmove the
>     rest of the string 2 bytes ahead
>
> It shouldn't need strtok or strstr or a lot of logic or stuff like
> that really.
>
> - FChE
>

[-- Attachment #2: 0001-debuginfod-PR28034-client-side-escape-url-characters.patch --]
[-- Type: text/x-patch, Size: 2407 bytes --]

From de7e50955dba711aeee33196bf2bfea3c47696f7 Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Fri, 16 Jul 2021 15:16:20 -0400
Subject: [PATCH] debuginfod: PR28034 - client-side %-escape url characters

When requesting some source files, some URL-inconvenient chars
sometimes pop up.  Example from f33 libstdc++:
/buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug/
gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/
libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/
condition_variable.cc
As this URL is passed into debuginfod's handler_cb, it appears that the
+ signs are helpfully unescaped to spaces by libmicrohttpd, which
'course breaks everything.
In order to ensure the server properly parses urls such as this one,
%-escape characters on the client side so that the correct url
is preserved and properly processed on the server side.

https://sourceware.org/bugzilla/show_bug.cgi?id=28034

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/debuginfod-client.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index 7d4b220f..eb49b583 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -905,13 +905,25 @@ debuginfod_query_server (debuginfod_client *c,
         {
           /* PR28034 escape characters in completed url to %hh format. */
           char *escaped_string;
+          char *loc;
           escaped_string = curl_easy_escape(data[i].handle, filename, 0);
           if (!escaped_string)
             {
               rc = -ENOMEM;
               goto out2;
             }
-          snprintf(data[i].url, PATH_MAX, "%s/%s/%s/%s", server_url,
+
+          loc = strstr(escaped_string, "%2F");
+          if (loc != NULL)
+            do
+              {
+                loc[0] = '/';
+                // pull the string back after replacement
+                memmove(loc+1,loc+3,strlen(loc+3));
+                escaped_string[strlen(escaped_string) - 1] = '\0';
+                escaped_string[strlen(escaped_string) - 1] = '\0';
+              } while( (loc = strstr(loc, "%2F")) );
+          snprintf(data[i].url, PATH_MAX, "%s/%s/%s%s", server_url,
                    build_id_bytes, type, escaped_string);
           curl_free(escaped_string);
         }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Bug debuginfod/28034] client-side %-escape url characters
  2021-08-27 14:44   ` Noah Sanci
@ 2021-08-27 15:07     ` Noah Sanci
  2021-08-27 15:30       ` Noah Sanci
  0 siblings, 1 reply; 11+ messages in thread
From: Noah Sanci @ 2021-08-27 15:07 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1615 bytes --]

Hello

Update 2, no longer append nulls unnecessarily.

-Noah Sanci

On Fri, Aug 27, 2021 at 10:44 AM Noah Sanci <nsanci@redhat.com> wrote:
>
> Hello,
>
> Here is an updated patch, using memmove. Much smaller.
>
> Thanks for the suggestions,
>
> Noah Sanci
>
>
> On Thu, Aug 26, 2021 at 5:02 PM Frank Ch. Eigler <fche@redhat.com> wrote:
> >
> > Hi -
> >
> > >            /* PR28034 escape characters in completed url to %hh format. */
> > > -          char *escaped_string;
> > > -          escaped_string = curl_easy_escape(data[i].handle, filename, 0);
> > > -          if (!escaped_string)
> > > +          char escaped_string[PATH_MAX] = {'\0'};
> > > +          char *loc = (char *) filename;
> > > +          char *loc2;
> > > +          char *tmp;
> > > +          for(size_t j = 0; j < strlen(filename); ++j)
> > >              {
> > > -              rc = -ENOMEM;
> > > -              goto out2;
> > > +              loc2 = strstr(loc, "/");
> > > +              // If the first character is a '/'
> > > [...]
> >
> > Holy cow that's a lot of work to do it this way.
> > A couple of alternatives:
> >
> > - ditch curl_easy_escape :-( and use a
> >   malloc(strlen(x)*3)
> >   byte-by-byte copy from source string into destination
> >     if not [a-zA-Z0-9/.~] then %-escape
> >
> > or:
> > - keep curl_easy_escape and postprocess
> >   byte-by-byte examine the result of curl_easy_escape
> >   - if seeing a "%2F", replace the % with a / and memmove the
> >     rest of the string 2 bytes ahead
> >
> > It shouldn't need strtok or strstr or a lot of logic or stuff like
> > that really.
> >
> > - FChE
> >

[-- Attachment #2: 0001-debuginfod-PR28034-client-side-escape-url-characters.patch --]
[-- Type: text/x-patch, Size: 2268 bytes --]

From f5c7c00c76b200675556a0ecc6bd8a5fdc7a30ea Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Fri, 16 Jul 2021 15:16:20 -0400
Subject: [PATCH] debuginfod: PR28034 - client-side %-escape url characters

When requesting some source files, some URL-inconvenient chars
sometimes pop up.  Example from f33 libstdc++:
/buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug/
gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/
libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/
condition_variable.cc
As this URL is passed into debuginfod's handler_cb, it appears that the
+ signs are helpfully unescaped to spaces by libmicrohttpd, which
'course breaks everything.
In order to ensure the server properly parses urls such as this one,
%-escape characters on the client side so that the correct url
is preserved and properly processed on the server side.

https://sourceware.org/bugzilla/show_bug.cgi?id=28034

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/debuginfod-client.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index 7d4b220f..6db82f79 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -905,13 +905,23 @@ debuginfod_query_server (debuginfod_client *c,
         {
           /* PR28034 escape characters in completed url to %hh format. */
           char *escaped_string;
+          char *loc;
           escaped_string = curl_easy_escape(data[i].handle, filename, 0);
           if (!escaped_string)
             {
               rc = -ENOMEM;
               goto out2;
             }
-          snprintf(data[i].url, PATH_MAX, "%s/%s/%s/%s", server_url,
+
+          loc = strstr(escaped_string, "%2F");
+          if (loc != NULL)
+            while( (loc = strstr(loc, "%2F")) )
+              {
+                loc[0] = '/';
+                // pull the string back after replacement
+                memmove(loc+1,loc+3,strlen(loc+3)+1);
+              } 
+          snprintf(data[i].url, PATH_MAX, "%s/%s/%s%s", server_url,
                    build_id_bytes, type, escaped_string);
           curl_free(escaped_string);
         }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Bug debuginfod/28034] client-side %-escape url characters
  2021-08-27 15:07     ` Noah Sanci
@ 2021-08-27 15:30       ` Noah Sanci
  2021-09-08 13:38         ` Mark Wielaard
  0 siblings, 1 reply; 11+ messages in thread
From: Noah Sanci @ 2021-08-27 15:30 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1864 bytes --]

Hello,

Next update, removed redundant if statement.

-Noah Sanci

On Fri, Aug 27, 2021 at 11:07 AM Noah Sanci <nsanci@redhat.com> wrote:
>
> Hello
>
> Update 2, no longer append nulls unnecessarily.
>
> -Noah Sanci
>
> On Fri, Aug 27, 2021 at 10:44 AM Noah Sanci <nsanci@redhat.com> wrote:
> >
> > Hello,
> >
> > Here is an updated patch, using memmove. Much smaller.
> >
> > Thanks for the suggestions,
> >
> > Noah Sanci
> >
> >
> > On Thu, Aug 26, 2021 at 5:02 PM Frank Ch. Eigler <fche@redhat.com> wrote:
> > >
> > > Hi -
> > >
> > > >            /* PR28034 escape characters in completed url to %hh format. */
> > > > -          char *escaped_string;
> > > > -          escaped_string = curl_easy_escape(data[i].handle, filename, 0);
> > > > -          if (!escaped_string)
> > > > +          char escaped_string[PATH_MAX] = {'\0'};
> > > > +          char *loc = (char *) filename;
> > > > +          char *loc2;
> > > > +          char *tmp;
> > > > +          for(size_t j = 0; j < strlen(filename); ++j)
> > > >              {
> > > > -              rc = -ENOMEM;
> > > > -              goto out2;
> > > > +              loc2 = strstr(loc, "/");
> > > > +              // If the first character is a '/'
> > > > [...]
> > >
> > > Holy cow that's a lot of work to do it this way.
> > > A couple of alternatives:
> > >
> > > - ditch curl_easy_escape :-( and use a
> > >   malloc(strlen(x)*3)
> > >   byte-by-byte copy from source string into destination
> > >     if not [a-zA-Z0-9/.~] then %-escape
> > >
> > > or:
> > > - keep curl_easy_escape and postprocess
> > >   byte-by-byte examine the result of curl_easy_escape
> > >   - if seeing a "%2F", replace the % with a / and memmove the
> > >     rest of the string 2 bytes ahead
> > >
> > > It shouldn't need strtok or strstr or a lot of logic or stuff like
> > > that really.
> > >
> > > - FChE
> > >

[-- Attachment #2: 0001-debuginfod-PR28034-client-side-escape-url-characters.patch --]
[-- Type: text/x-patch, Size: 2188 bytes --]

From a27c9a82546b040fbb8e5263003fd8fb33f1f1e0 Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Fri, 16 Jul 2021 15:16:20 -0400
Subject: [PATCH] debuginfod: PR28034 - client-side %-escape url characters

When requesting some source files, some URL-inconvenient chars
sometimes pop up.  Example from f33 libstdc++:
/buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug/
gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/
libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/
condition_variable.cc
As this URL is passed into debuginfod's handler_cb, it appears that the
+ signs are helpfully unescaped to spaces by libmicrohttpd, which
'course breaks everything.
In order to ensure the server properly parses urls such as this one,
%-escape characters on the client side so that the correct url
is preserved and properly processed on the server side.

https://sourceware.org/bugzilla/show_bug.cgi?id=28034

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/debuginfod-client.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index 7d4b220f..73960424 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -906,12 +906,20 @@ debuginfod_query_server (debuginfod_client *c,
           /* PR28034 escape characters in completed url to %hh format. */
           char *escaped_string;
           escaped_string = curl_easy_escape(data[i].handle, filename, 0);
+          char *loc = escaped_string;
           if (!escaped_string)
             {
               rc = -ENOMEM;
               goto out2;
             }
-          snprintf(data[i].url, PATH_MAX, "%s/%s/%s/%s", server_url,
+
+          while( (loc = strstr(loc, "%2F")) )
+            {
+                loc[0] = '/';
+                // pull the string back after replacement
+                memmove(loc+1,loc+3,strlen(loc+3)+1);
+            }
+          snprintf(data[i].url, PATH_MAX, "%s/%s/%s%s", server_url,
                    build_id_bytes, type, escaped_string);
           curl_free(escaped_string);
         }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Bug debuginfod/28034] client-side %-escape url characters
  2021-08-27 15:30       ` Noah Sanci
@ 2021-09-08 13:38         ` Mark Wielaard
  2021-09-09 17:28           ` Noah Sanci
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Wielaard @ 2021-09-08 13:38 UTC (permalink / raw)
  To: Noah Sanci, Frank Ch. Eigler; +Cc: elfutils-devel

Hi Noah,

This looks good, but the description is not really what the patch does.
It is part of the PR28034 fix, but repeating that problem statement
doesn't really make clear what this patch is about. Could explain in
the commit message that curl_easy_escape () also escapes '/' to '%2F'
and that we want to reverse that back to '/'?

Also some (tiny) optimization hints.

This can be done outside/before the loop

  /* Initialize each handle.  */
  for (int i = 0; i < num_urls; i++)

So you only need to escape once. You of course then need to make sure
the escaped_string is freed after the loop.

We already check that the first char is a '/'. It seems silly to curl
escape that one and then unescape it again. So maybe curl_easy_escape
(data[i].handle, filename + 1, 0) and then change the snprintf pattern
to "%s/%s/%s/%s"?
            ^ the slash got readded here.

The strlen inside the while loop can also be done outside and then
calculated instead of running strlen on the tail every time.

size_t escaped_strlen = strlen (escaped_string);
while( (loc = strstr(loc, "%2F")) )
  {
     loc[0] = '/';
     // pull the string back after replacement
     escaped_strlen -= (loc + 3) - escaped_string);
     memmove(loc+1, loc+3, escaped_strlen + 1);
  }

(Note, not even compiled, so I might be wrong)

Lastly I assume there are already testcases that cover this
functionality? Just wanting to know how you tested it.

Thanks,

Mark

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Bug debuginfod/28034] client-side %-escape url characters
  2021-09-08 13:38         ` Mark Wielaard
@ 2021-09-09 17:28           ` Noah Sanci
  2021-09-12 17:24             ` Mark Wielaard
  0 siblings, 1 reply; 11+ messages in thread
From: Noah Sanci @ 2021-09-09 17:28 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Frank Ch. Eigler, elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1251 bytes --]

Hello,

The attached patch %-escapes debuginfod url characters, then unescapes only
'/' characters. Previously characters such as '+' were not escaped and caused
improper escaping further on in handler_cb.
https://sourceware.org/bugzilla/show_bug.cgi?id=28034.

On Wed, Sep 8, 2021 at 9:38 AM Mark Wielaard <mark@klomp.org> wrote:
>   /* Initialize each handle.  */
>   for (int i = 0; i < num_urls; i++)
>
> So you only need to escape once. You of course then need to make sure
> the escaped_string is freed after the loop.
Added

> We already check that the first char is a '/'. It seems silly to curl
> escape that one and then unescape it again. So maybe curl_easy_escape
> (data[i].handle, filename + 1, 0) and then change the snprintf pattern
> to "%s/%s/%s/%s"?
>             ^ the slash got readded here.
Added

> The strlen inside the while loop can also be done outside and then
> calculated instead of running strlen on the tail every time.
Added

> Lastly I assume there are already testcases that cover this
> functionality? Just wanting to know how you tested it.
Previously, with run-debuginfod-find.sh the test was embedded within other
tests. Now the test is independent and has been added to the list of TESTS.

Regards,

Noah Sanci

[-- Attachment #2: 0001-debuginfod-PR28034-Percent-escape-debuginfod-urls.patch --]
[-- Type: text/x-patch, Size: 7527 bytes --]

From a080fd3feb79d601d86c54cf479562d6d5bed395 Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Thu, 9 Sep 2021 13:10:33 -0400
Subject: [PATCH] debuginfod: PR28034 - Percent escape debuginfod urls

When requesting some source files, some URL-inconvenient chars
sometimes pop up, including but not limited to '+', '^', and '&'.
Example from f33 libstdc++:
 /buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug
 /gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/
 libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/
 condition_variable.cc
As this URL is passed into debuginfod's handler_cb, it appears that the
+ signs are helpfully unescaped to spaces by libmicrohttpd, which
'course breaks everything.
To combat this, before querying the debuginfod daemon, clients now %
escape the source filename. This converts many alphanumeric characters
into their %-code format, including '/' to %2F. We want to preserve the
'/' in the url, so after conversion replace %2Fs with a '/'.

https://sourceware.org/bugzilla/show_bug.cgi?id=28034

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/debuginfod-client.c         | 18 ++++++--
 tests/ChangeLog                        |  7 +--
 tests/Makefile.am                      |  2 +
 tests/run-debuginfod-percent-escape.sh | 60 ++++++++++++++++++++++++++
 4 files changed, 79 insertions(+), 8 deletions(-)
 create mode 100755 tests/run-debuginfod-percent-escape.sh

diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index d41723ce..77ca844b 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -883,6 +883,10 @@ debuginfod_query_server (debuginfod_client *c,
       data[i].errbuf[0] = '\0';
     }
 
+  char *escaped_string;
+  if (filename)
+    escaped_string = curl_easy_escape(&target_handle, filename+1, 0);
+
   /* Initialize each handle.  */
   for (int i = 0; i < num_urls; i++)
     {
@@ -904,16 +908,23 @@ debuginfod_query_server (debuginfod_client *c,
       if (filename) /* must start with / */
         {
           /* PR28034 escape characters in completed url to %hh format. */
-          char *escaped_string;
-          escaped_string = curl_easy_escape(data[i].handle, filename, 0);
+          char *loc = escaped_string;
           if (!escaped_string)
             {
               rc = -ENOMEM;
               goto out2;
             }
+
+          size_t escaped_strlen = strlen(escaped_string);
+          while ((loc = strstr(loc, "%2F")))
+            {
+                loc[0] = '/';
+                // pull the string back after replacement
+                memmove(loc+1, loc+3,escaped_strlen - (loc - escaped_string + 2) );
+                escaped_strlen -=2;
+            }
           snprintf(data[i].url, PATH_MAX, "%s/%s/%s/%s", server_url,
                    build_id_bytes, type, escaped_string);
-          curl_free(escaped_string);
         }
       else
         snprintf(data[i].url, PATH_MAX, "%s/%s/%s", server_url, build_id_bytes, type);
@@ -953,6 +964,7 @@ debuginfod_query_server (debuginfod_client *c,
       curl_multi_add_handle(curlm, data[i].handle);
     }
 
+  if (filename) curl_free(escaped_string);
   /* Query servers in parallel.  */
   if (vfd >= 0)
     dprintf (vfd, "query %d urls in parallel\n", num_urls);
diff --git a/tests/ChangeLog b/tests/ChangeLog
index 85dca442..b84f420c 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -132,11 +132,8 @@
 2021-07-16  Noah Sanci  <nsanci@redhat.com>
 
 	PR28034
-	* run-debuginfod-find.sh: Added a test ensuring files with %
-	escapable characters in their paths are accessible. The test
-	itself is changing the name of a binary known previously as prog to
-	p+r%o$g. General operations such as accessing p+r%o$g acts as the
-	test for %-escape checking.
+	* run-debuginfod-percent-escape.sh: Added a test ensuring files with %
+	escapable characters in their paths are accessible.
 
 2021-07-21  Noah Sanci <nsanci@redhat.com>
 
diff --git a/tests/Makefile.am b/tests/Makefile.am
index c586422e..ee17d3b1 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -231,6 +231,7 @@ TESTS += run-debuginfod-dlopen.sh \
 	 run-debuginfod-federation-sqlite.sh \
 	 run-debuginfod-federation-link.sh \
 	 run-debuginfod-federation-metrics.sh \
+         run-debuginfod-percent-escape.sh \
 	 run-debuginfod-x-forwarded-for.sh
 endif
 endif
@@ -515,6 +516,7 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \
 	     run-debuginfod-archive-groom.sh \
 	     run-debuginfod-archive-rename.sh \
              run-debuginfod-archive-test.sh \
+             run-debuginfod-percent-escape.sh \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.src.rpm \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.x86_64.rpm \
 	     debuginfod-rpms/fedora30/hello2-debuginfo-1.0-2.x86_64.rpm \
diff --git a/tests/run-debuginfod-percent-escape.sh b/tests/run-debuginfod-percent-escape.sh
new file mode 100755
index 00000000..f7d8dc66
--- /dev/null
+++ b/tests/run-debuginfod-percent-escape.sh
@@ -0,0 +1,60 @@
+#!/usr/bin/env bash
+#
+# Copyright (C) 2019-2021 Red Hat, Inc.
+# This file is part of elfutils.
+#
+# This file is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# elfutils is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. $srcdir/debuginfod-subr.sh  # includes set -e
+# for test case debugging, uncomment:
+set -x
+unset VALGRIND_CMD
+# This variable is essential and ensures no time-race for claiming ports occurs
+# set base to a unique multiple of 100 not used in any other 'run-debuginfod-*' test
+base=10000
+get_ports
+DB=${PWD}/.debuginfod_tmp.sqlite
+tempfiles $DB
+mkdir F
+env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod $VERBOSE \
+    -F -R -d $DB -p $PORT1 -t0 -g0 -v R ${PWD}/F > vlog$PORT1 2>&1 &
+PID1=$!
+tempfiles vlog$PORT1
+errfiles vlog$PORT1
+# Server must become ready
+wait_ready $PORT1 'ready' 1
+# Be patient when run on a busy machine things might take a bit.
+
+# Build a non-stripped binary
+echo "int main() { return 0; }" > ${PWD}/F/p++r\$\#o^^g.c
+gcc -Wl,--build-id -g -o ${PWD}/F/p++r\$\#o^^g ${PWD}/F/p++r\$\#o^^g.c
+BUILDID=`env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../src/readelf \
+          -a ${PWD}/F/p++r\\$\#o^^g | grep 'Build ID' | cut -d ' ' -f 7`
+tempfiles ${PWD}/F/p++r\$\#o^^g.c ${PWD}/F/p++r\$\#o^^g
+kill -USR1 $PID1
+# Now there should be 1 files in the index
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+rm -rf $DEBUGINFOD_CACHE_PATH # clean it from previous tests
+ls F
+env DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache DEBUGINFOD_URLS="http://127.0.0.1:$PORT1" \
+    LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find -vvv source F/p++r\$\#o^^g ${abs_builddir}/F/p++r\$\#o^^g.c > vlog1 2>&1 || true
+tempfiles vlog1
+grep 'F/p%2B%2Br%24%23o%5E%5Eg.c' vlog1
+
+kill $PID1
+wait $PID1
+PID1=0
+exit 0
-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Bug debuginfod/28034] client-side %-escape url characters
  2021-09-09 17:28           ` Noah Sanci
@ 2021-09-12 17:24             ` Mark Wielaard
  2021-09-13 16:20               ` Noah Sanci
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Wielaard @ 2021-09-12 17:24 UTC (permalink / raw)
  To: Noah Sanci; +Cc: elfutils-devel

Hi Noah,

On Thu, Sep 09, 2021 at 01:28:21PM -0400, Noah Sanci via Elfutils-devel wrote:
> The attached patch %-escapes debuginfod url characters, then unescapes only
> '/' characters. Previously characters such as '+' were not escaped and caused
> improper escaping further on in handler_cb.
> https://sourceware.org/bugzilla/show_bug.cgi?id=28034.
> 
> On Wed, Sep 8, 2021 at 9:38 AM Mark Wielaard <mark@klomp.org> wrote:
> >   /* Initialize each handle.  */
> >   for (int i = 0; i < num_urls; i++)
> >
> > So you only need to escape once. You of course then need to make sure
> > the escaped_string is freed after the loop.
> Added
> 
> > We already check that the first char is a '/'. It seems silly to curl
> > escape that one and then unescape it again. So maybe curl_easy_escape
> > (data[i].handle, filename + 1, 0) and then change the snprintf pattern
> > to "%s/%s/%s/%s"?
> >             ^ the slash got readded here.
> Added
> 
> > The strlen inside the while loop can also be done outside and then
> > calculated instead of running strlen on the tail every time.
> Added

Thanks. I do have one more performance nitpick (sorry).

> +  char *escaped_string;
> +  if (filename)
> +    escaped_string = curl_easy_escape(&target_handle, filename+1, 0);

The escaped_string is created outside the loop and reused each time
(good). But...

> +
>    /* Initialize each handle.  */
>    for (int i = 0; i < num_urls; i++)
>      {
> @@ -904,16 +908,23 @@ debuginfod_query_server (debuginfod_client *c,
>        if (filename) /* must start with / */
>          {
>            /* PR28034 escape characters in completed url to %hh format. */
> -          char *escaped_string;
> -          escaped_string = curl_easy_escape(data[i].handle, filename, 0);
> +          char *loc = escaped_string;
>            if (!escaped_string)
>              {
>                rc = -ENOMEM;
>                goto out2;
>              }

This check, and...

> +
> +          size_t escaped_strlen = strlen(escaped_string);
> +          while ((loc = strstr(loc, "%2F")))
> +            {
> +                loc[0] = '/';
> +                // pull the string back after replacement
> +                memmove(loc+1, loc+3,escaped_strlen - (loc - escaped_string + 2) );
> +                escaped_strlen -=2;
> +            }

the manipulation of the escaped_string, could both also be done
outside the loop, since they always do the same thing.

>            snprintf(data[i].url, PATH_MAX, "%s/%s/%s/%s", server_url,
>                     build_id_bytes, type, escaped_string);
> -          curl_free(escaped_string);
>          }
>        else
>          snprintf(data[i].url, PATH_MAX, "%s/%s/%s", server_url, build_id_bytes, type);
> @@ -953,6 +964,7 @@ debuginfod_query_server (debuginfod_client *c,
>        curl_multi_add_handle(curlm, data[i].handle);
>      }
>  
> +  if (filename) curl_free(escaped_string);

And released after the loop. Good.

>    /* Query servers in parallel.  */
>    if (vfd >= 0)
>      dprintf (vfd, "query %d urls in parallel\n", num_urls);
> diff --git a/tests/ChangeLog b/tests/ChangeLog
> index 85dca442..b84f420c 100644
> --- a/tests/ChangeLog
> +++ b/tests/ChangeLog
> @@ -132,11 +132,8 @@
>  2021-07-16  Noah Sanci  <nsanci@redhat.com>
>  
>  	PR28034
> -	* run-debuginfod-find.sh: Added a test ensuring files with %
> -	escapable characters in their paths are accessible. The test
> -	itself is changing the name of a binary known previously as prog to
> -	p+r%o$g. General operations such as accessing p+r%o$g acts as the
> -	test for %-escape checking.
> +	* run-debuginfod-percent-escape.sh: Added a test ensuring files with %
> +	escapable characters in their paths are accessible.

I think you should simply add a new ChangeLog entry at the top instead
of changing an old existing one. And please do mention the Makefile.am
(TESTS) and (EXTRA_DIST) addition.

>  2021-07-21  Noah Sanci <nsanci@redhat.com>
>  
> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index c586422e..ee17d3b1 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -231,6 +231,7 @@ TESTS += run-debuginfod-dlopen.sh \
>  	 run-debuginfod-federation-sqlite.sh \
>  	 run-debuginfod-federation-link.sh \
>  	 run-debuginfod-federation-metrics.sh \
> +         run-debuginfod-percent-escape.sh \
>  	 run-debuginfod-x-forwarded-for.sh
>  endif
>  endif
> @@ -515,6 +516,7 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \
>  	     run-debuginfod-archive-groom.sh \
>  	     run-debuginfod-archive-rename.sh \
>               run-debuginfod-archive-test.sh \
> +             run-debuginfod-percent-escape.sh \
>  	     debuginfod-rpms/fedora30/hello2-1.0-2.src.rpm \
>  	     debuginfod-rpms/fedora30/hello2-1.0-2.x86_64.rpm \
>  	     debuginfod-rpms/fedora30/hello2-debuginfo-1.0-2.x86_64.rpm \

> +env DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache DEBUGINFOD_URLS="http://127.0.0.1:$PORT1" \
> +    LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find -vvv source F/p++r\$\#o^^g ${abs_builddir}/F/p++r\$\#o^^g.c > vlog1 2>&1 || true
> +tempfiles vlog1
> +grep 'F/p%2B%2Br%24%23o%5E%5Eg.Ac' vlog1

OK, that is certainly a file name with lots of unexpected characters :)

Thanks,

Mark

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Bug debuginfod/28034] client-side %-escape url characters
  2021-09-12 17:24             ` Mark Wielaard
@ 2021-09-13 16:20               ` Noah Sanci
  2021-09-13 18:11                 ` Noah Sanci
  0 siblings, 1 reply; 11+ messages in thread
From: Noah Sanci @ 2021-09-13 16:20 UTC (permalink / raw)
  To: Mark Wielaard, elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1747 bytes --]

Hello,

On Sun, Sep 12, 2021 at 1:24 PM Mark Wielaard <mark@klomp.org> wrote:
> The escaped_string is created outside the loop and reused each time
> (good). But...
> > +
> >    /* Initialize each handle.  */
> >    for (int i = 0; i < num_urls; i++)
> >      {
> > @@ -904,16 +908,23 @@ debuginfod_query_server (debuginfod_client *c,
> >        if (filename) /* must start with / */
> >          {
> >            /* PR28034 escape characters in completed url to %hh format. */
> > -          char *escaped_string;
> > -          escaped_string = curl_easy_escape(data[i].handle, filename, 0);
> > +          char *loc = escaped_string;
> >            if (!escaped_string)
> >              {
> >                rc = -ENOMEM;
> >                goto out2;
> >              }
>
> This check, and...
>
> > +
> > +          size_t escaped_strlen = strlen(escaped_string);
> > +          while ((loc = strstr(loc, "%2F")))
> > +            {
> > +                loc[0] = '/';
> > +                // pull the string back after replacement
> > +                memmove(loc+1, loc+3,escaped_strlen - (loc - escaped_string + 2) );
> > +                escaped_strlen -=2;
> > +            }
>
> the manipulation of the escaped_string, could both also be done
> outside the loop, since they always do the same thing.
Fixed.

>
> I think you should simply add a new ChangeLog entry at the top instead
> of changing an old existing one. And please do mention the Makefile.am
> (TESTS) and (EXTRA_DIST) addition.
Should be good.

> OK, that is certainly a file name with lots of unexpected characters :)
Gotta be sure :)

If there are any more efficiency changes, ChangeLog changes, or other
changes, don't be afraid to reach out again.

Thanks,

Noah Sanci

[-- Attachment #2: 0001-debuginfod-PR28034-Percent-escape-debuginfod-urls.patch --]
[-- Type: text/x-patch, Size: 8522 bytes --]

From fa9104c7435503d08308e18d4e1a82792f83fc56 Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Thu, 9 Sep 2021 13:10:33 -0400
Subject: [PATCH] debuginfod: PR28034 - Percent escape debuginfod urls

When requesting some source files, some URL-inconvenient chars
sometimes pop up, including but not limited to '+', '^', and '&'.
Example from f33 libstdc++:
 /buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug
 /gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/
 libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/
 condition_variable.cc
As this URL is passed into debuginfod's handler_cb, it appears that the
+ signs are helpfully unescaped to spaces by libmicrohttpd, which
'course breaks everything.
To combat this, before querying the debuginfod daemon, clients now %
escape the source filename. This converts many alphanumeric characters
into their %-code format, including '/' to %2F. We want to preserve the
'/' in the url, so after conversion replace %2Fs with a '/'.

https://sourceware.org/bugzilla/show_bug.cgi?id=28034

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/ChangeLog                   |  4 ++
 debuginfod/debuginfod-client.c         | 29 +++++++++----
 tests/ChangeLog                        | 12 +++---
 tests/Makefile.am                      |  2 +
 tests/run-debuginfod-percent-escape.sh | 60 ++++++++++++++++++++++++++
 5 files changed, 94 insertions(+), 13 deletions(-)
 create mode 100755 tests/run-debuginfod-percent-escape.sh

diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 1173f9cd..b8eaa88c 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -2,6 +2,10 @@
 
 	* debuginfod.cxx (libarchive_fdcache::lookup): Add endl after
 	obatched(clog) line.
+2021-09-13  Noah Sanci  <nsanci@redhat.com>
+
+	* debuginfod-client.c (debuginfod_query_server): Removed constant
+	operations from a loop. curl_free memory.
 
 2021-09-06  Dmitry V. Levin  <ldv@altlinux.org>
 
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index d41723ce..f55e1489 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -883,6 +883,26 @@ debuginfod_query_server (debuginfod_client *c,
       data[i].errbuf[0] = '\0';
     }
 
+  char *escaped_string = NULL;
+  size_t escaped_strlen = 0;
+  if (filename)
+    {
+      escaped_string = curl_easy_escape(&target_handle, filename+1, 0);
+      if (!escaped_string)
+        {
+          rc = -ENOMEM;
+          goto out2;
+        }
+      char *loc = escaped_string;
+      escaped_strlen = strlen(escaped_string) - 2 + (size_t)escaped_string;
+      while ((loc = strstr(loc, "%2F")))
+        {
+          loc[0] = '/';
+          // pull the string back after replacement
+          memmove(loc+1, loc+3,escaped_strlen - (size_t)loc );
+          escaped_strlen -=2;
+        }
+    }
   /* Initialize each handle.  */
   for (int i = 0; i < num_urls; i++)
     {
@@ -904,16 +924,8 @@ debuginfod_query_server (debuginfod_client *c,
       if (filename) /* must start with / */
         {
           /* PR28034 escape characters in completed url to %hh format. */
-          char *escaped_string;
-          escaped_string = curl_easy_escape(data[i].handle, filename, 0);
-          if (!escaped_string)
-            {
-              rc = -ENOMEM;
-              goto out2;
-            }
           snprintf(data[i].url, PATH_MAX, "%s/%s/%s/%s", server_url,
                    build_id_bytes, type, escaped_string);
-          curl_free(escaped_string);
         }
       else
         snprintf(data[i].url, PATH_MAX, "%s/%s/%s", server_url, build_id_bytes, type);
@@ -953,6 +965,7 @@ debuginfod_query_server (debuginfod_client *c,
       curl_multi_add_handle(curlm, data[i].handle);
     }
 
+  if (filename) curl_free(escaped_string);
   /* Query servers in parallel.  */
   if (vfd >= 0)
     dprintf (vfd, "query %d urls in parallel\n", num_urls);
diff --git a/tests/ChangeLog b/tests/ChangeLog
index 1154686a..3f219320 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -30,6 +30,11 @@
 	* run-debuginfod-federation-metrics.sh: Likewise.
 	* run-debuginfod-federation-sqlite.sh: Likewise.
 
+2021-09-13  Noah Sanci  <nsanci@redhat.com>
+
+	* Makefile.am: added run-debuginfod-percent-escape.sh to TESTS and
+	EXTRA_DIST.
+
 2021-09-06  Dmitry V. Levin  <ldv@altlinux.org>
 
 	* elfcopy.c (copy_elf): Remove cast of malloc return value.
@@ -164,11 +169,8 @@
 2021-07-16  Noah Sanci  <nsanci@redhat.com>
 
 	PR28034
-	* run-debuginfod-find.sh: Added a test ensuring files with %
-	escapable characters in their paths are accessible. The test
-	itself is changing the name of a binary known previously as prog to
-	p+r%o$g. General operations such as accessing p+r%o$g acts as the
-	test for %-escape checking.
+	* run-debuginfod-percent-escape.sh: Added a test ensuring files with %
+	escapable characters in their paths are accessible.
 
 2021-07-21  Noah Sanci <nsanci@redhat.com>
 
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 22942733..43c34ce6 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -231,6 +231,7 @@ TESTS += run-debuginfod-dlopen.sh \
 	 run-debuginfod-federation-sqlite.sh \
 	 run-debuginfod-federation-link.sh \
 	 run-debuginfod-federation-metrics.sh \
+         run-debuginfod-percent-escape.sh \
 	 run-debuginfod-x-forwarded-for.sh
 endif
 endif
@@ -524,6 +525,7 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \
 	     run-debuginfod-archive-groom.sh \
 	     run-debuginfod-archive-rename.sh \
              run-debuginfod-archive-test.sh \
+             run-debuginfod-percent-escape.sh \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.src.rpm \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.x86_64.rpm \
 	     debuginfod-rpms/fedora30/hello2-debuginfo-1.0-2.x86_64.rpm \
diff --git a/tests/run-debuginfod-percent-escape.sh b/tests/run-debuginfod-percent-escape.sh
new file mode 100755
index 00000000..f7d8dc66
--- /dev/null
+++ b/tests/run-debuginfod-percent-escape.sh
@@ -0,0 +1,60 @@
+#!/usr/bin/env bash
+#
+# Copyright (C) 2019-2021 Red Hat, Inc.
+# This file is part of elfutils.
+#
+# This file is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# elfutils is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. $srcdir/debuginfod-subr.sh  # includes set -e
+# for test case debugging, uncomment:
+set -x
+unset VALGRIND_CMD
+# This variable is essential and ensures no time-race for claiming ports occurs
+# set base to a unique multiple of 100 not used in any other 'run-debuginfod-*' test
+base=10000
+get_ports
+DB=${PWD}/.debuginfod_tmp.sqlite
+tempfiles $DB
+mkdir F
+env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod $VERBOSE \
+    -F -R -d $DB -p $PORT1 -t0 -g0 -v R ${PWD}/F > vlog$PORT1 2>&1 &
+PID1=$!
+tempfiles vlog$PORT1
+errfiles vlog$PORT1
+# Server must become ready
+wait_ready $PORT1 'ready' 1
+# Be patient when run on a busy machine things might take a bit.
+
+# Build a non-stripped binary
+echo "int main() { return 0; }" > ${PWD}/F/p++r\$\#o^^g.c
+gcc -Wl,--build-id -g -o ${PWD}/F/p++r\$\#o^^g ${PWD}/F/p++r\$\#o^^g.c
+BUILDID=`env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../src/readelf \
+          -a ${PWD}/F/p++r\\$\#o^^g | grep 'Build ID' | cut -d ' ' -f 7`
+tempfiles ${PWD}/F/p++r\$\#o^^g.c ${PWD}/F/p++r\$\#o^^g
+kill -USR1 $PID1
+# Now there should be 1 files in the index
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+rm -rf $DEBUGINFOD_CACHE_PATH # clean it from previous tests
+ls F
+env DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache DEBUGINFOD_URLS="http://127.0.0.1:$PORT1" \
+    LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find -vvv source F/p++r\$\#o^^g ${abs_builddir}/F/p++r\$\#o^^g.c > vlog1 2>&1 || true
+tempfiles vlog1
+grep 'F/p%2B%2Br%24%23o%5E%5Eg.c' vlog1
+
+kill $PID1
+wait $PID1
+PID1=0
+exit 0
-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Bug debuginfod/28034] client-side %-escape url characters
  2021-09-13 16:20               ` Noah Sanci
@ 2021-09-13 18:11                 ` Noah Sanci
  2021-09-16 10:35                   ` Mark Wielaard
  0 siblings, 1 reply; 11+ messages in thread
From: Noah Sanci @ 2021-09-13 18:11 UTC (permalink / raw)
  To: Mark Wielaard, elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 104 bytes --]

Hello

Quick arithmetic change to the original patch with an updated commit message.

Best,

Noah Sanci

[-- Attachment #2: 0001-debuginfod-PR28034-No-longer-escape-and-loop-efficie.patch --]
[-- Type: text/x-patch, Size: 8628 bytes --]

From cb2a3957d017f40e1edb35ed8f8fd3b9dee1c6be Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Thu, 9 Sep 2021 13:10:33 -0400
Subject: [PATCH] debuginfod: PR28034 - No longer escape '/', and loop
 efficiency

Previously, urls containing '/', so most urls, would escape '/' to %2F,
which is undesirable for use in other libraries which may escape
differently. This patch escapes the '/' and replaces all of them
ensuring there are no %2Fs sent.
Some inefficiencies within the code were fixed, such as changing constant
operations of a while loop within a for loop to a while loop outside of
a for loop. Also strlen is no longer used within the loop, simplifying
the interior operations to mere arithmetic.

https://sourceware.org/bugzilla/show_bug.cgi?id=28034

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/ChangeLog                   |  4 ++
 debuginfod/debuginfod-client.c         | 35 +++++++++++----
 tests/ChangeLog                        | 12 +++---
 tests/Makefile.am                      |  2 +
 tests/run-debuginfod-percent-escape.sh | 60 ++++++++++++++++++++++++++
 5 files changed, 100 insertions(+), 13 deletions(-)
 create mode 100755 tests/run-debuginfod-percent-escape.sh

diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 1173f9cd..b8eaa88c 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -2,6 +2,10 @@
 
 	* debuginfod.cxx (libarchive_fdcache::lookup): Add endl after
 	obatched(clog) line.
+2021-09-13  Noah Sanci  <nsanci@redhat.com>
+
+	* debuginfod-client.c (debuginfod_query_server): Removed constant
+	operations from a loop. curl_free memory.
 
 2021-09-06  Dmitry V. Levin  <ldv@altlinux.org>
 
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index d41723ce..8a1c68d5 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -883,6 +883,32 @@ debuginfod_query_server (debuginfod_client *c,
       data[i].errbuf[0] = '\0';
     }
 
+  char *escaped_string = NULL;
+  size_t escaped_strlen = 0;
+  if (filename)
+    {
+      escaped_string = curl_easy_escape(&target_handle, filename+1, 0);
+      if (!escaped_string)
+        {
+          rc = -ENOMEM;
+          goto out2;
+        }
+      char *loc = escaped_string;
+      escaped_strlen = strlen(escaped_string);
+      while ((loc = strstr(loc, "%2F")))
+        {
+          loc[0] = '/';
+          //pull the string back after replacement
+          // loc-escaped_string finds the distance from the origin to the new location
+          // - 2 accounts for the 2F which remain and don't need to be measured.
+          // The two above subtracted from escaped_strlen yields the remaining characters
+          // in the string which we want to pull back
+          memmove(loc+1, loc+3,escaped_strlen - (loc-escaped_string) - 2);
+          //Because the 2F was overwritten in the memmove (as desired) escaped_strlen is
+          // now two shorter.
+          escaped_strlen -= 2;
+        }
+    }
   /* Initialize each handle.  */
   for (int i = 0; i < num_urls; i++)
     {
@@ -904,16 +930,8 @@ debuginfod_query_server (debuginfod_client *c,
       if (filename) /* must start with / */
         {
           /* PR28034 escape characters in completed url to %hh format. */
-          char *escaped_string;
-          escaped_string = curl_easy_escape(data[i].handle, filename, 0);
-          if (!escaped_string)
-            {
-              rc = -ENOMEM;
-              goto out2;
-            }
           snprintf(data[i].url, PATH_MAX, "%s/%s/%s/%s", server_url,
                    build_id_bytes, type, escaped_string);
-          curl_free(escaped_string);
         }
       else
         snprintf(data[i].url, PATH_MAX, "%s/%s/%s", server_url, build_id_bytes, type);
@@ -953,6 +971,7 @@ debuginfod_query_server (debuginfod_client *c,
       curl_multi_add_handle(curlm, data[i].handle);
     }
 
+  if (filename) curl_free(escaped_string);
   /* Query servers in parallel.  */
   if (vfd >= 0)
     dprintf (vfd, "query %d urls in parallel\n", num_urls);
diff --git a/tests/ChangeLog b/tests/ChangeLog
index 1154686a..3f219320 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -30,6 +30,11 @@
 	* run-debuginfod-federation-metrics.sh: Likewise.
 	* run-debuginfod-federation-sqlite.sh: Likewise.
 
+2021-09-13  Noah Sanci  <nsanci@redhat.com>
+
+	* Makefile.am: added run-debuginfod-percent-escape.sh to TESTS and
+	EXTRA_DIST.
+
 2021-09-06  Dmitry V. Levin  <ldv@altlinux.org>
 
 	* elfcopy.c (copy_elf): Remove cast of malloc return value.
@@ -164,11 +169,8 @@
 2021-07-16  Noah Sanci  <nsanci@redhat.com>
 
 	PR28034
-	* run-debuginfod-find.sh: Added a test ensuring files with %
-	escapable characters in their paths are accessible. The test
-	itself is changing the name of a binary known previously as prog to
-	p+r%o$g. General operations such as accessing p+r%o$g acts as the
-	test for %-escape checking.
+	* run-debuginfod-percent-escape.sh: Added a test ensuring files with %
+	escapable characters in their paths are accessible.
 
 2021-07-21  Noah Sanci <nsanci@redhat.com>
 
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 22942733..43c34ce6 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -231,6 +231,7 @@ TESTS += run-debuginfod-dlopen.sh \
 	 run-debuginfod-federation-sqlite.sh \
 	 run-debuginfod-federation-link.sh \
 	 run-debuginfod-federation-metrics.sh \
+         run-debuginfod-percent-escape.sh \
 	 run-debuginfod-x-forwarded-for.sh
 endif
 endif
@@ -524,6 +525,7 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \
 	     run-debuginfod-archive-groom.sh \
 	     run-debuginfod-archive-rename.sh \
              run-debuginfod-archive-test.sh \
+             run-debuginfod-percent-escape.sh \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.src.rpm \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.x86_64.rpm \
 	     debuginfod-rpms/fedora30/hello2-debuginfo-1.0-2.x86_64.rpm \
diff --git a/tests/run-debuginfod-percent-escape.sh b/tests/run-debuginfod-percent-escape.sh
new file mode 100755
index 00000000..f7d8dc66
--- /dev/null
+++ b/tests/run-debuginfod-percent-escape.sh
@@ -0,0 +1,60 @@
+#!/usr/bin/env bash
+#
+# Copyright (C) 2019-2021 Red Hat, Inc.
+# This file is part of elfutils.
+#
+# This file is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# elfutils is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. $srcdir/debuginfod-subr.sh  # includes set -e
+# for test case debugging, uncomment:
+set -x
+unset VALGRIND_CMD
+# This variable is essential and ensures no time-race for claiming ports occurs
+# set base to a unique multiple of 100 not used in any other 'run-debuginfod-*' test
+base=10000
+get_ports
+DB=${PWD}/.debuginfod_tmp.sqlite
+tempfiles $DB
+mkdir F
+env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod $VERBOSE \
+    -F -R -d $DB -p $PORT1 -t0 -g0 -v R ${PWD}/F > vlog$PORT1 2>&1 &
+PID1=$!
+tempfiles vlog$PORT1
+errfiles vlog$PORT1
+# Server must become ready
+wait_ready $PORT1 'ready' 1
+# Be patient when run on a busy machine things might take a bit.
+
+# Build a non-stripped binary
+echo "int main() { return 0; }" > ${PWD}/F/p++r\$\#o^^g.c
+gcc -Wl,--build-id -g -o ${PWD}/F/p++r\$\#o^^g ${PWD}/F/p++r\$\#o^^g.c
+BUILDID=`env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../src/readelf \
+          -a ${PWD}/F/p++r\\$\#o^^g | grep 'Build ID' | cut -d ' ' -f 7`
+tempfiles ${PWD}/F/p++r\$\#o^^g.c ${PWD}/F/p++r\$\#o^^g
+kill -USR1 $PID1
+# Now there should be 1 files in the index
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+rm -rf $DEBUGINFOD_CACHE_PATH # clean it from previous tests
+ls F
+env DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache DEBUGINFOD_URLS="http://127.0.0.1:$PORT1" \
+    LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find -vvv source F/p++r\$\#o^^g ${abs_builddir}/F/p++r\$\#o^^g.c > vlog1 2>&1 || true
+tempfiles vlog1
+grep 'F/p%2B%2Br%24%23o%5E%5Eg.c' vlog1
+
+kill $PID1
+wait $PID1
+PID1=0
+exit 0
-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Bug debuginfod/28034] client-side %-escape url characters
  2021-09-13 18:11                 ` Noah Sanci
@ 2021-09-16 10:35                   ` Mark Wielaard
  0 siblings, 0 replies; 11+ messages in thread
From: Mark Wielaard @ 2021-09-16 10:35 UTC (permalink / raw)
  To: Noah Sanci, elfutils-devel

Hi Noah,

On Mon, 2021-09-13 at 14:11 -0400, Noah Sanci via Elfutils-devel wrote:
> Quick arithmetic change to the original patch with an updated commit
> message.

Thanks for commenting this, it is easy to make an off-by-one (or two)
error.

> Previously, urls containing '/', so most urls, would escape '/' to %2F,
> which is undesirable for use in other libraries which may escape
> differently. This patch escapes the '/' and replaces all of them
> ensuring there are no %2Fs sent.
> Some inefficiencies within the code were fixed, such as changing constant
> operations of a while loop within a for loop to a while loop outside of
> a for loop. Also strlen is no longer used within the loop, simplifying
> the interior operations to mere arithmetic.
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=28034

Looks good.
Could you push it to the master branch?
(Please do rebase first, so we keep a linear history)

Thanks,

Mark


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-09-16 10:35 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-26 19:27 [Bug debuginfod/28034] client-side %-escape url characters Noah Sanci
2021-08-26 21:02 ` Frank Ch. Eigler
2021-08-27 14:44   ` Noah Sanci
2021-08-27 15:07     ` Noah Sanci
2021-08-27 15:30       ` Noah Sanci
2021-09-08 13:38         ` Mark Wielaard
2021-09-09 17:28           ` Noah Sanci
2021-09-12 17:24             ` Mark Wielaard
2021-09-13 16:20               ` Noah Sanci
2021-09-13 18:11                 ` Noah Sanci
2021-09-16 10:35                   ` Mark Wielaard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).