public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Paul Eggert <eggert@cs.ucla.edu>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: bug-gnulib@gnu.org, libc-alpha@sourceware.org
Subject: Re: [PATCH v3 4/6] stdlib: Sync canonicalize with gnulib [BZ #10635] [BZ #26592] [BZ #26341] [BZ #24970]
Date: Fri, 1 Jan 2021 16:04:02 -0800	[thread overview]
Message-ID: <275283e0-70ee-5ea4-e63d-d0f1d1393667@cs.ucla.edu> (raw)
In-Reply-To: <502b6d2d-1139-ca9d-14cf-00082adc915e@linaro.org>

[-- Attachment #1: Type: text/plain, Size: 2898 bytes --]

On 12/30/20 5:10 AM, Adhemerval Zanella wrote:

>> it is just really
>> a small optimization that adds code complexity on a somewhat convoluted
>> code.

The code is indeed simpler without the NARROW_ADDRESSES optimization, so 
I removed that optimization by installing the attached patch into Gnulib.

>> For ENAMETOOLONG, I think this is the right error code: it enforces
>> that we do not support internal objects longer that PTRDIFF_MAX.

This sounds backwards, as the code returns ENOMEM every other place it 
tries to create an internal object longer than PTRDIFF_MAX - these 
ENOMEM checks are in the malloc calls invoked by scratch_buffer_grow and 
scratch_buffer_grow_preserve. It would be odd for canonicalize_file_name 
to return ENAMETOOLONG for this one particular way of creating a 
too-large object, while at the same time it returns ENOMEM for all the 
other ways.

Besides, ENAMETOOLONG is the POSIX error code for exceeding NAME_MAX or 
PATH_MAX, which is not what is happening here.

In Gnulib and other GNU apps we've long used the tradition that ENOMEM 
means you've run out of memory, regardless of whether it's because your 
heap or your address space is too small. This is a good tradition and 
it'd be good to use it here too.

>> I think it should be a fair assumption to make it on internal code, such
>> as realpath

Yes, staying less than PTRDIFF_MAX is a vital assumption on internal 
objects. I'd go even further and say it's important for user-supplied 
objects, too, as so much code relies on pointer subtraction and we can't 
realistically prohibit that within glibc.

> (this is another reason why I think NARROW_ADDRESSES is not 
> necessary).

Unfortunately, if we merely assume every object has at most PTRDIFF_MAX 
bytes, we still must check for overflow when adding the sizes of two 
objects. The NARROW_ADDRESSES optimization would have let us avoid that 
unnecessary check on 64-bit machines.

> And your fix (from 93e0186d4) does not really solve the issue, since
> now that len is a size_t the overflow check won't catch the potentially
> allocation larger than PTRDIFF_MAX (the realpath will still fail with
> ENOMEM though).

Sure, which means the code is doing the right thing: it's failing with 
ENOMEM because it ran out of memory. There is no need for an extra 
PTRDIFF_MAX check in canonicalize.c if malloc (via scratch_buffer_grow) 
already does the check.
> Wouldn't the below be simpler?
> 
>                size_t len = strlen (end);
>                if (len > IDX_MAX || INT_ADD_OVERFLOW ((idx_t) len, n))
>                  {
>                    __set_errno (ENAMETOOLONG);
>                    goto error_nomem;
>                  }

It's not simpler than the attached Gnulib patch, because it contains an 
unnecessary comparison to IDX_MAX and an unnecessary cast to idx_t.

[-- Attachment #2: 0001-canonicalize-remove-NARROW_ADDRESSES-optimization.patch --]
[-- Type: text/x-patch, Size: 3184 bytes --]

From 8f6b9b66be6672bed1045c27e606dd9fcedcf022 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 1 Jan 2021 15:54:43 -0800
Subject: [PATCH] canonicalize: remove NARROW_ADDRESSES optimization

* lib/canonicalize-lgpl.c, lib/canonicalize.c (NARROW_ADDRESSES):
Remove, and remove all uses, as the optimization is arguably not
worth the extra complexity.  Suggested by Adhemerval Zanella in:
https://sourceware.org/pipermail/libc-alpha/2020-December/121203.html
---
 ChangeLog               | 8 ++++++++
 lib/canonicalize-lgpl.c | 6 +-----
 lib/canonicalize.c      | 6 +-----
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 2d498a5e9..fc45e1176 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2021-01-01  Paul Eggert  <eggert@cs.ucla.edu>
+
+	canonicalize: remove NARROW_ADDRESSES optimization
+	* lib/canonicalize-lgpl.c, lib/canonicalize.c (NARROW_ADDRESSES):
+	Remove, and remove all uses, as the optimization is arguably not
+	worth the extra complexity.  Suggested by Adhemerval Zanella in:
+	https://sourceware.org/pipermail/libc-alpha/2020-December/121203.html
+
 2021-01-01  Bruno Haible  <bruno@clisp.org>
 
 	stddef: Try harder to get max_align_t defined on OpenBSD.
diff --git a/lib/canonicalize-lgpl.c b/lib/canonicalize-lgpl.c
index 560e24288..698f9ede2 100644
--- a/lib/canonicalize-lgpl.c
+++ b/lib/canonicalize-lgpl.c
@@ -85,10 +85,6 @@
 # define IF_LINT(Code) /* empty */
 #endif
 
-/* True if adding two valid object sizes might overflow idx_t.
-   As a practical matter, this cannot happen on 64-bit machines.  */
-enum { NARROW_ADDRESSES = IDX_MAX >> 31 >> 31 == 0 };
-
 #ifndef DOUBLE_SLASH_IS_DISTINCT_ROOT
 # define DOUBLE_SLASH_IS_DISTINCT_ROOT false
 #endif
@@ -343,7 +339,7 @@ realpath_stk (const char *name, char *resolved,
               if (end_in_extra_buffer)
                 end_idx = end - extra_buf;
               size_t len = strlen (end);
-              if (NARROW_ADDRESSES && INT_ADD_OVERFLOW (len, n))
+              if (INT_ADD_OVERFLOW (len, n))
                 {
                   __set_errno (ENOMEM);
                   goto error_nomem;
diff --git a/lib/canonicalize.c b/lib/canonicalize.c
index cc32260a8..3a1c8098b 100644
--- a/lib/canonicalize.c
+++ b/lib/canonicalize.c
@@ -42,10 +42,6 @@
 # define IF_LINT(Code) /* empty */
 #endif
 
-/* True if adding two valid object sizes might overflow idx_t.
-   As a practical matter, this cannot happen on 64-bit machines.  */
-enum { NARROW_ADDRESSES = IDX_MAX >> 31 >> 31 == 0 };
-
 #ifndef DOUBLE_SLASH_IS_DISTINCT_ROOT
 # define DOUBLE_SLASH_IS_DISTINCT_ROOT false
 #endif
@@ -393,7 +389,7 @@ canonicalize_filename_mode_stk (const char *name, canonicalize_mode_t can_mode,
               if (end_in_extra_buffer)
                 end_idx = end - extra_buf;
               size_t len = strlen (end);
-              if (NARROW_ADDRESSES && INT_ADD_OVERFLOW (len, n))
+              if (INT_ADD_OVERFLOW (len, n))
                 xalloc_die ();
               while (extra_buffer.length <= len + n)
                 {
-- 
2.27.0


  reply	other threads:[~2021-01-02  0:04 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-29 19:34 [PATCH v3 0/6] Multiple fixes to realpath Adhemerval Zanella
2020-12-29 19:34 ` [PATCH v3 1/6] Import idx.h from gnulib Adhemerval Zanella
2020-12-29 19:34 ` [PATCH v3 2/6] Import filename.h " Adhemerval Zanella
2020-12-29 19:34 ` [PATCH v3 3/6] malloc: Add scratch_buffer_dupfree Adhemerval Zanella
2020-12-29 19:34 ` [PATCH v3 4/6] stdlib: Sync canonicalize with gnulib [BZ #10635] [BZ #26592] [BZ #26341] [BZ #24970] Adhemerval Zanella
2020-12-30  1:21   ` Paul Eggert
2020-12-30  3:39     ` Paul Eggert
2020-12-30  6:19       ` Paul Eggert
2020-12-30 12:34     ` Adhemerval Zanella
2020-12-30 13:10       ` Adhemerval Zanella
2021-01-02  0:04         ` Paul Eggert [this message]
2021-01-04 12:52           ` Adhemerval Zanella
2021-01-09  1:24             ` Paul Eggert
2020-12-29 19:34 ` [PATCH v3 5/6] support: Add support_small_thread_stack_size Adhemerval Zanella
2020-12-30  8:54   ` Florian Weimer
2020-12-29 19:34 ` [PATCH v3 6/6] stdlib: Add testcase fro BZ #26241 Adhemerval Zanella
2021-01-20  4:33   ` DJ Delorie
2021-01-20 14:13     ` Adhemerval Zanella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=275283e0-70ee-5ea4-e63d-d0f1d1393667@cs.ucla.edu \
    --to=eggert@cs.ucla.edu \
    --cc=adhemerval.zanella@linaro.org \
    --cc=bug-gnulib@gnu.org \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).