From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Xiaoming Ni <nixiaoming@huawei.com>, libc-alpha@sourceware.org
Subject: Re: [PATCH 1/3] stdlib: Use fixed buffer size for realpath (BZ #26241)
Date: Wed, 12 Aug 2020 20:04:11 -0300
Message-ID: <eb51b9f2-ad44-69de-e987-3754477af93b@linaro.org>
In-Reply-To: <c32d054d-5ae4-2452-b710-f05d443f9c18@huawei.com>



On 11/08/2020 22:38, Xiaoming Ni wrote:
> On 2020/8/11 22:57, Adhemerval Zanella wrote:
>>
>>
>> On 11/08/2020 00:00, Xiaoming Ni wrote:
>>> On 2020/8/11 4:48, Adhemerval Zanella wrote:
>>>> It uses fixed internal buffers of PATH_MAX size both to read and to
>>>> copy the results of the readlink call.
>>>>
>>>> Also, if PATH_MAX is not defined, it uses a default value of 1024,
>>>> as other stdlib implementations do.
>>>>
>>>> The expected stack usage is about 8k on Linux, where PATH_MAX is
>>>> defined as 4096 (plus some internal function usage for local
>>>> variables).
>>>>
>>>> Checked on x86_64-linux-gnu and i686-linux-gnu.
>>>> ---
>>>>    stdlib/Makefile                               |   3 +-
>>>>    stdlib/canonicalize.c                         |  38 +++---
>>>>    stdlib/tst-canon-bz26341.c                    | 108 ++++++++++++++++++
>>>>    support/support_set_small_thread_stack_size.c |  12 +-
>>>>    support/xthread.h                             |   2 +
>>>>    5 files changed, 138 insertions(+), 25 deletions(-)
>>>>    create mode 100644 stdlib/tst-canon-bz26341.c
>>>>
>>>> diff --git a/stdlib/Makefile b/stdlib/Makefile
>>>> index 4615f6dfe7..7093b8a584 100644
>>>> --- a/stdlib/Makefile
>>>> +++ b/stdlib/Makefile
>>>> @@ -87,7 +87,7 @@ tests        := tst-strtol tst-strtod testmb testrand testsort testdiv   \
>>>>               tst-makecontext-align test-bz22786 tst-strtod-nan-sign \
>>>>               tst-swapcontext1 tst-setcontext4 tst-setcontext5 \
>>>>               tst-setcontext6 tst-setcontext7 tst-setcontext8 \
>>>> -           tst-setcontext9 tst-bz20544
>>>> +           tst-setcontext9 tst-bz20544 tst-canon-bz26341
>>>>
>>>>    tests-internal    := tst-strtod1i tst-strtod3 tst-strtod4 tst-strtod5i \
>>>>               tst-tls-atexit tst-tls-atexit-nodelete
>>>> @@ -102,6 +102,7 @@ LDLIBS-test-atexit-race = $(shared-thread-library)
>>>>    LDLIBS-test-at_quick_exit-race = $(shared-thread-library)
>>>>    LDLIBS-test-cxa_atexit-race = $(shared-thread-library)
>>>>    LDLIBS-test-on_exit-race = $(shared-thread-library)
>>>> +LDLIBS-tst-canon-bz26341 = $(shared-thread-library)
>>>>
>>>>    LDLIBS-test-dlclose-exit-race = $(shared-thread-library) $(libdl)
>>>>    LDFLAGS-test-dlclose-exit-race = $(LDFLAGS-rdynamic)
>>>> diff --git a/stdlib/canonicalize.c b/stdlib/canonicalize.c
>>>> index cbd885a3c5..554ba221e4 100644
>>>> --- a/stdlib/canonicalize.c
>>>> +++ b/stdlib/canonicalize.c
>>>> @@ -28,6 +28,14 @@
>>>>    #include <eloop-threshold.h>
>>>>    #include <shlib-compat.h>
>>>>
>>>> +#ifndef PATH_MAX
>>>> +# ifdef MAXPATHLEN
>>>> +#  define PATH_MAX MAXPATHLEN
>>>> +# else
>>>> +#  define PATH_MAX 1024
>>>> +# endif
>>>> +#endif
>>>> +
>>>>    /* Return the canonical absolute name of file NAME.  A canonical name
>>>>       does not contain any `.', `..' components nor any repeated path
>>>>       separators ('/') or symlinks.  All path components must exist.  If
>>>> @@ -42,9 +50,8 @@
>>>>    char *
>>>>    __realpath (const char *name, char *resolved)
>>>>    {
>>>> -  char *rpath, *dest, *extra_buf = NULL;
>>>> +  char *rpath, *dest, extra_buf[PATH_MAX];
>>> Why does the 4 KB of stack space need to be occupied even if the path contains no symlinks?
>>
>> It does not; it is a simplification to avoid decomposing the function
>> and handling symlinks as a special case.  Avoiding the stack allocation
>> in the common case would require either using dynamic allocation or
>> adjusting the function so that, once it finds a symlink, it calls another
>> function to handle the loop with a stack-allocated buffer.
>> I don't think this extra code complexity really pays off.
> 
> 
> Extract the symlink processing into an independent function and move
> extra_buf and buf into it, to avoid wasting 8 KB of stack space when
> realpath processes paths that contain no symlinks.
> Is this better?

Yes, my only reservation is the complexity and possible code duplication
needed to handle it.  I can't see an easy way to accomplish it without
duplicating the loop code (minus the 'extra_buf' alloca) and making the
default loop call it with the stack-allocated extra_buf (and I would like
to avoid this approach).

Another possibility, which I think would be better, is to use a scratch
buffer and make some compromise between stack usage and heap allocation.
The default 1024 bytes of the scratch buffer should cover most of the
common calls (it is 1/4 of PATH_MAX), so malloc would be called only for
large paths (which should be uncommon).  We can also use a scratch buffer
for the readlink as well, since we might infer the required size from
the previous lstat call.
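
For illustration only, the stack-first, heap-fallback idea combined with
sizing the readlink buffer from lstat can be sketched outside glibc as
follows.  The names small_buf, small_buf_reserve and read_link_sized are
hypothetical stand-ins, not glibc's internal scratch_buffer API, and the
1024-byte inline size is simply the figure mentioned above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a scratch buffer: it starts on a fixed
   inline array and moves to the heap only when that is too small.  */
struct small_buf
{
  char *data;
  size_t length;
  char inline_space[1024];
};

static void
small_buf_init (struct small_buf *b)
{
  b->data = b->inline_space;
  b->length = sizeof b->inline_space;
}

static void
small_buf_free (struct small_buf *b)
{
  if (b->data != b->inline_space)
    free (b->data);
  small_buf_init (b);
}

/* Make room for at least SIZE bytes.  Old contents are discarded.
   Returns 1 on success, 0 if malloc fails.  */
static int
small_buf_reserve (struct small_buf *b, size_t size)
{
  if (size <= b->length)
    return 1;
  char *p = malloc (size);
  if (p == NULL)
    return 0;
  small_buf_free (b);
  b->data = p;
  b->length = size;
  return 1;
}

/* Read the target of symlink PATH into B as a NUL-terminated string,
   sizing B from the st_size reported by lstat.  Returns the target
   length, or -1 with errno set.  If the link changes between the two
   calls the result may be truncated; this sketch does not retry.  */
static ssize_t
read_link_sized (const char *path, struct small_buf *b)
{
  struct stat st;
  if (lstat (path, &st) != 0)
    return -1;
  if (!S_ISLNK (st.st_mode))
    {
      errno = EINVAL;
      return -1;
    }
  if (!small_buf_reserve (b, (size_t) st.st_size + 1))
    return -1;
  ssize_t n = readlink (path, b->data, b->length - 1);
  if (n < 0)
    return -1;
  assert ((size_t) n < b->length);
  b->data[n] = '\0';
  return n;
}
```

A caller would run small_buf_init once, call read_link_sized for each
symlink it meets, and call small_buf_free on every exit path.  Short link
targets never touch malloc in this sketch.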

With something like the patch below, realpath uses about 1024 bytes of
stack, or about 2048 bytes if the path contains a symbolic link:

---

diff --git a/stdlib/canonicalize.c b/stdlib/canonicalize.c
index cbd885a3c5..dca160f523 100644
--- a/stdlib/canonicalize.c
+++ b/stdlib/canonicalize.c
@@ -25,9 +25,56 @@
 #include <errno.h>
 #include <stddef.h>
 
+#include <scratch_buffer.h>
 #include <eloop-threshold.h>
 #include <shlib-compat.h>
 
+#ifndef PATH_MAX
+# ifdef MAXPATHLEN
+#  define PATH_MAX MAXPATHLEN
+# else
+#  define PATH_MAX 1024
+# endif
+#endif
+
+static bool
+realpath_readlink (const char *rpath, const char *end, size_t path_max,
+		   size_t st_size, struct scratch_buffer *extra_buf)
+{
+  bool r = false;
+
+  struct scratch_buffer buf;
+  scratch_buffer_init (&buf);
+  /* Add the terminating null byte.  */
+  if (!scratch_buffer_set_array_size (&buf, st_size + 1,  sizeof (char)))
+    return false;
+
+  ssize_t n = __readlink (rpath, buf.data, buf.length - 1);
+  if (n < 0)
+    goto out;
+  ((char *) buf.data)[n] = '\0';
+
+  size_t len = strlen (end);
+  if (path_max - n <= len)
+    {
+      __set_errno (ENAMETOOLONG);
+      goto out;
+    }
+
+  if (!scratch_buffer_set_array_size (extra_buf, n + len + 1, sizeof (char)))
+    goto out;
+
+  /* Careful here, end may be a pointer into extra_buf... */
+  memmove ((char *) extra_buf->data + n, end, len + 1);
+  memcpy (extra_buf->data, buf.data, n);
+
+  r = true;
+
+out:
+  scratch_buffer_free (&buf);
+  return r;
+}
+
 /* Return the canonical absolute name of file NAME.  A canonical name
    does not contain any `.', `..' components nor any repeated path
    separators ('/') or symlinks.  All path components must exist.  If
@@ -42,10 +89,13 @@
 char *
 __realpath (const char *name, char *resolved)
 {
-  char *rpath, *dest, *extra_buf = NULL;
+  char *rpath, *dest;
   const char *start, *end, *rpath_limit;
-  long int path_max;
+  const size_t path_max = PATH_MAX;
   int num_links = 0;
+  struct scratch_buffer extra_buf;
+
+  scratch_buffer_init (&extra_buf);
 
   if (name == NULL)
     {
@@ -65,14 +115,6 @@ __realpath (const char *name, char *resolved)
       return NULL;
     }
 
-#ifdef PATH_MAX
-  path_max = PATH_MAX;
-#else
-  path_max = __pathconf (name, _PC_PATH_MAX);
-  if (path_max <= 0)
-    path_max = 1024;
-#endif
-
   if (resolved == NULL)
     {
       rpath = malloc (path_max);
@@ -101,7 +143,6 @@ __realpath (const char *name, char *resolved)
   for (start = end = name; *start; start = end)
     {
       struct stat64 st;
-      int n;
 
       /* Skip sequence of multiple path-separators.  */
       while (*start == '/')
@@ -163,35 +204,19 @@ __realpath (const char *name, char *resolved)
 
 	  if (S_ISLNK (st.st_mode))
 	    {
-	      char *buf = __alloca (path_max);
-	      size_t len;
-
 	      if (++num_links > __eloop_threshold ())
 		{
 		  __set_errno (ELOOP);
 		  goto error;
 		}
 
-	      n = __readlink (rpath, buf, path_max - 1);
-	      if (n < 0)
+	      if (! realpath_readlink (rpath, end, path_max, st.st_size,
+				       &extra_buf))
 		goto error;
-	      buf[n] = '\0';
-
-	      if (!extra_buf)
-		extra_buf = __alloca (path_max);
 
-	      len = strlen (end);
-	      if (path_max - n <= len)
-		{
-		  __set_errno (ENAMETOOLONG);
-		  goto error;
-		}
-
-	      /* Careful here, end may be a pointer into extra_buf... */
-	      memmove (&extra_buf[n], end, len + 1);
-	      name = end = memcpy (extra_buf, buf, n);
+	      name = end = extra_buf.data;
 
-	      if (buf[0] == '/')
+	      if (((char *)extra_buf.data)[0] == '/')
 		dest = rpath + 1;	/* It's an absolute symlink */
 	      else
 		/* Back up to previous component, ignore if at root already: */
@@ -209,6 +234,8 @@ __realpath (const char *name, char *resolved)
     --dest;
   *dest = '\0';
 
+  scratch_buffer_free (&extra_buf);
+
   assert (resolved == NULL || resolved == rpath);
   return rpath;
 
@@ -216,6 +243,7 @@ error:
   assert (resolved == NULL || resolved == rpath);
   if (resolved == NULL)
     free (rpath);
+  scratch_buffer_free (&extra_buf);
   return NULL;
 }
 libc_hidden_def (__realpath)
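
The splice step in the loop above (put the link target in front of the
still-unresolved tail, where the tail may already live in the same
buffer) can be sketched in isolation as follows.  This is a minimal
illustration on a fixed-capacity buffer; splice_link_target and its
parameters are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Rewrite BUF (capacity CAP bytes) so that it holds TARGET followed by
   REST.  REST may point into BUF; TARGET must not.  Returns 0 on
   success, or -1 with errno set to ENAMETOOLONG if the result plus its
   terminating null byte does not fit.  */
static int
splice_link_target (char *buf, size_t cap, const char *target,
                    const char *rest)
{
  size_t n = strlen (target);
  size_t len = strlen (rest);

  /* Same shape as the 'path_max - n <= len' check, with a guard so
     that the unsigned subtraction cannot wrap.  */
  if (n >= cap || cap - n <= len)
    {
      errno = ENAMETOOLONG;
      return -1;
    }

  /* Move the tail first.  It may overlap its destination, so memmove
     is required; copying TARGET first could overwrite the tail.  */
  memmove (buf + n, rest, len + 1);
  memcpy (buf, target, n);
  return 0;
}
```

With buf holding "link/tail/file" and rest pointing at "/tail/file"
inside it, splicing in "/real/dir" leaves "/real/dir/tail/file".  The
fixed capacity also sidesteps a separate concern: if the buffer were
resized between computing rest and doing the copy, and the resize did not
preserve its contents, rest would no longer be valid.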
