[RFC] _FORTIFY_SOURCE strictness

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* [RFC] _FORTIFY_SOURCE strictness
@ 2022-04-07  6:26 Siddhesh Poyarekar
  2022-04-07 10:16 ` Andreas Schwab
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-04-07  6:26 UTC (permalink / raw)
  To: libc-alpha
  Cc: Adhemerval Zanella, Andreas Schwab, Carlos O'Donell,
	Florian Weimer, Jakub Jelinek, Martin Liška

Hi,

_FORTIFY_SOURCE levels up to 2 rely on static object size computation to 
make decisions at compile time to either call a fortified 
bounds-checking variant or a regular variant of a function.  Because the 
size estimates were static and often upper limit estimates, the bounds 
were either optimistic or not present at all.  This made aggressive 
bounds checks such as those in __snprintf_chk[1] or __wcrtomb_chk[2] 
safe enough as to have low false positives (where there's no actual 
buffer overflow but we have defined stricter requirements in the 
fortified functions) and hence pass through uncontested.

This situation has changed with _FORTIFY_SOURCE=3, where it is now 
possible to get non-constant expressions for object sizes, providing not 
just more precise estimates, but also greater coverage, which appears to 
have increased the possibility of such false positives.

This is not limited to the two known examples either; __strncpy_chk for 
example will crash if n is greater than the destination buffer size and 
is similarly prone to such false positives.  One could envision a 
situation where an strncpy call is deeply nested and through compiler 
advances and attribute annotations, the callsite now gets precise size 
expressions for the call and not just an upper limit estimate or a 
(size_t)-1.

Of course, good defensive programming practice means that one always 
pass n that is within limits of both source and destination bounds in 
strncpy but is a runtime check the right place to enforce good practices 
as opposed to only crashing on actual overflows?

There appear to be two alternatives out of this situations and I wanted 
to build consensus around one of these or even a third alternative that 
I hadn't thought of.  I of course volunteer to work on the approach we 
have consensus on:

Option 1: Do nothing

We could take the stand that it is what it is and that fortification 
checks are necessarily stricter than standards allowances to enforce 
better application programming practice.  In this case I propose we 
improve the overflow messages to clearly indicate that we're enforcing a 
stricter limitation than the standard requires and not give the 
incorrect impression that we _detected_ an overflow.

The downside of this approach is the possibility that some applications 
don't fortify beyond level 2, insisting that their usage is safe enough. 
  At the moment it seems like a reasonably small set of applications are 
running into these, so maybe it's OK?

Option 2: Improve checking accuracy at _FORTIFY_SOURCE=3

This would involve making the checking functions smarter, at level 3 and 
abort only on actual overflows.  This could have a small runtime 
overhead in cases like __wcrtomb_chk but a larger one for cases like 
__snprintf_chk.  It is also a more invasive change since it will involve 
changing all implementations.  This essentially is similar to 
sanitizer-like functionality.

The downside of this approach is the possibility that some applications 
don't fortify beyond level 2 because the performance overhead is 
unacceptable.

Thoughts?  Maybe an Option 3 that's less worse than the above two options?

Thanks,
Siddhesh

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=28989
[2] https://lists.gnu.org/archive/html/bug-ncurses/2022-04/msg00002.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] _FORTIFY_SOURCE strictness
  2022-04-07  6:26 [RFC] _FORTIFY_SOURCE strictness Siddhesh Poyarekar
@ 2022-04-07 10:16 ` Andreas Schwab
  2022-04-08  3:24   ` Siddhesh Poyarekar
  2022-04-08  2:26 ` Paul Eggert
  2022-04-08  5:37 ` Florian Weimer
  2 siblings, 1 reply; 38+ messages in thread
From: Andreas Schwab @ 2022-04-07 10:16 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: libc-alpha, Adhemerval Zanella, Carlos O'Donell,
	Florian Weimer, Jakub Jelinek, Martin Liška

On Apr 07 2022, Siddhesh Poyarekar wrote:

> The downside of this approach is the possibility that some applications
> don't fortify beyond level 2, insisting that their usage is safe enough.

The problem with this argument is that what is safe enough now, may be
unsafe later due to an unrelated change elsewhere, or an attacker
injecting some unforeseen data.  It is generally better to be safer in
the first place, because aborting deep inside the call chain is a risk
in itself, even if it prevented an acute undefined behaviour from doing
bad side effects.  By checking bounds early better error recovery is
possible in general.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] _FORTIFY_SOURCE strictness
  2022-04-07  6:26 [RFC] _FORTIFY_SOURCE strictness Siddhesh Poyarekar
  2022-04-07 10:16 ` Andreas Schwab
@ 2022-04-08  2:26 ` Paul Eggert
  2022-04-08  3:32   ` Siddhesh Poyarekar
  2022-04-08  5:37 ` Florian Weimer
  2 siblings, 1 reply; 38+ messages in thread
From: Paul Eggert @ 2022-04-08  2:26 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Florian Weimer, Jakub Jelinek, Andreas Schwab, libc-alpha

On 4/6/22 23:26, Siddhesh Poyarekar wrote:

> Thoughts?  Maybe an Option 3 that's less worse than the above two options?

Perhaps we could use Option 1 (do nothing) for functions like snprintf, 
and Option 2 (abort only on actual overflows) for functions like strncpy.

For snprintf Option 1 is arguably allowed by the C standard, which says 
"If a function argument is described as being an array, the pointer 
actually passed to the function shall have a value such that all address 
computations and accesses to objects (that would be valid if the pointer 
did point to the first element of such an array) are in fact valid." In 
other words, if you call snprintf (a, n, ...) the implementation is 
allowed compute &a[0] through &a[n] even if it doesn't store into every 
byte in 'a' - and this means Option 1 is allowed.

For strncpy the situation is murkier and arguably something like Option 
2 would be needed, due to the funky way that the standard words the 
strncpy spec. However, this is not that important, since strncpy is (or 
at least should be) rarely used nowadays, so it's not that 
performance-relevant.

If we're lucky, most of the affected functions are more like snprintf 
than like strcpy, which means we can use Option 1 (do nothing) for most 
of the affected functions.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] _FORTIFY_SOURCE strictness
  2022-04-07 10:16 ` Andreas Schwab
@ 2022-04-08  3:24   ` Siddhesh Poyarekar
  0 siblings, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-04-08  3:24 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: libc-alpha, Adhemerval Zanella, Carlos O'Donell,
	Florian Weimer, Jakub Jelinek, Martin Liška

On 07/04/2022 15:46, Andreas Schwab wrote:
> On Apr 07 2022, Siddhesh Poyarekar wrote:
> 
>> The downside of this approach is the possibility that some applications
>> don't fortify beyond level 2, insisting that their usage is safe enough.
> 
> The problem with this argument is that what is safe enough now, may be
> unsafe later due to an unrelated change elsewhere, or an attacker
> injecting some unforeseen data.  It is generally better to be safer in
> the first place, because aborting deep inside the call chain is a risk
> in itself, even if it prevented an acute undefined behaviour from doing
> bad side effects.  By checking bounds early better error recovery is
> possible in general.
> 

That's a fair point.

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] _FORTIFY_SOURCE strictness
  2022-04-08  2:26 ` Paul Eggert
@ 2022-04-08  3:32   ` Siddhesh Poyarekar
  0 siblings, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-04-08  3:32 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Florian Weimer, Jakub Jelinek, Andreas Schwab, libc-alpha

On 08/04/2022 07:56, Paul Eggert wrote:
> On 4/6/22 23:26, Siddhesh Poyarekar wrote:
> 
>> Thoughts?  Maybe an Option 3 that's less worse than the above two 
>> options?
> 
> Perhaps we could use Option 1 (do nothing) for functions like snprintf, 
> and Option 2 (abort only on actual overflows) for functions like strncpy.

I suppose for __strncpy_chk it will be about as expensive as 
__strcpy_chk since the latter already does a strlen check anyway.  We 
could, I suppose even limit it to __FORTIFY_LEVEL > 2 to limit the 
performance impact to levels where we already warn about potential 
performance impact.

> For snprintf Option 1 is arguably allowed by the C standard, which says 
> "If a function argument is described as being an array, the pointer 
> actually passed to the function shall have a value such that all address 
> computations and accesses to objects (that would be valid if the pointer 
> did point to the first element of such an array) are in fact valid." In 
> other words, if you call snprintf (a, n, ...) the implementation is 
> allowed compute &a[0] through &a[n] even if it doesn't store into every 
> byte in 'a' - and this means Option 1 is allowed.
> 
> For strncpy the situation is murkier and arguably something like Option 
> 2 would be needed, due to the funky way that the standard words the 
> strncpy spec. However, this is not that important, since strncpy is (or 
> at least should be) rarely used nowadays, so it's not that 
> performance-relevant.

I remember someone telling me about their pet project of finding 
frequencies of libc function use in distributions.  Let me see if I can 
find that and see where strncpy stands.

> If we're lucky, most of the affected functions are more like snprintf 
> than like strcpy, which means we can use Option 1 (do nothing) for most 
> of the affected functions.

How would you read wcrtomb[1]?  ISTM that it falls into the strncpy 
category, i.e. the fortified behaviour seems stricter than what the 
standards require.  I tried building an argument to the contrary[2] but 
even I can't stand behind it with full conviction :)

Thanks,
Siddhesh

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/wcrtomb.html#
[2] https://lists.gnu.org/archive/html/bug-ncurses/2022-04/msg00004.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] _FORTIFY_SOURCE strictness
  2022-04-07  6:26 [RFC] _FORTIFY_SOURCE strictness Siddhesh Poyarekar
  2022-04-07 10:16 ` Andreas Schwab
  2022-04-08  2:26 ` Paul Eggert
@ 2022-04-08  5:37 ` Florian Weimer
  2022-04-08  6:02   ` Siddhesh Poyarekar
  2 siblings, 1 reply; 38+ messages in thread
From: Florian Weimer @ 2022-04-08  5:37 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: libc-alpha, Adhemerval Zanella, Andreas Schwab,
	Carlos O'Donell, Jakub Jelinek, Martin Liška

* Siddhesh Poyarekar:

> This is not limited to the two known examples either; __strncpy_chk
> for example will crash if n is greater than the destination buffer
> size and is similarly prone to such false positives.  One could
> envision a situation where an strncpy call is deeply nested and
> through compiler advances and attribute annotations, the callsite now
> gets precise size expressions for the call and not just an upper limit
> estimate or a (size_t)-1.

Hmm.  __strncpy_chk always fills the destination buffer, so the only
thing we can do here is to alias it to strncpy?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] _FORTIFY_SOURCE strictness
  2022-04-08  5:37 ` Florian Weimer
@ 2022-04-08  6:02   ` Siddhesh Poyarekar
  2022-04-08 21:07     ` Paul Eggert
  0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-04-08  6:02 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Adhemerval Zanella, Andreas Schwab,
	Carlos O'Donell, Jakub Jelinek, Martin Liška

On 08/04/2022 11:07, Florian Weimer wrote:
> * Siddhesh Poyarekar:
> 
>> This is not limited to the two known examples either; __strncpy_chk
>> for example will crash if n is greater than the destination buffer
>> size and is similarly prone to such false positives.  One could
>> envision a situation where an strncpy call is deeply nested and
>> through compiler advances and attribute annotations, the callsite now
>> gets precise size expressions for the call and not just an upper limit
>> estimate or a (size_t)-1.
> 
> Hmm.  __strncpy_chk always fills the destination buffer, so the only
> thing we can do here is to alias it to strncpy?

Hmm, I think I conflated it with something other str* function.  You're 
right, strncpy probably doesn't fall into this category.

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] _FORTIFY_SOURCE strictness
  2022-04-08  6:02   ` Siddhesh Poyarekar
@ 2022-04-08 21:07     ` Paul Eggert
  2022-04-11  8:02       ` Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Paul Eggert @ 2022-04-08 21:07 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Jakub Jelinek, libc-alpha, Andreas Schwab, Florian Weimer

On 4/7/22 23:02, Siddhesh Poyarekar wrote:
> Hmm, I think I conflated it with something other str* function.  You're 
> right, strncpy probably doesn't fall into this category.

Ouch, I made the same mistake.

As for wcrtomb, unfortunately the standard's wording appears to allow 
you to pass an output buffer smaller than MB_CUR_MAX if you know that 
the multibyte character will fit into the smaller buffer. So I guess 
this is an example of a function where __FORTIFY_LEVEL > 2 doesn't 
conform to the standard.

I don't know whether the standard's authors intended this.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] _FORTIFY_SOURCE strictness
  2022-04-08 21:07     ` Paul Eggert
@ 2022-04-11  8:02       ` Siddhesh Poyarekar
  2022-05-05 18:43         ` [PATCH 0/2] More compliant wcrtomb Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-04-11  8:02 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Jakub Jelinek, libc-alpha, Andreas Schwab, Florian Weimer, Thomas Dickey

On 09/04/2022 02:37, Paul Eggert wrote:
> On 4/7/22 23:02, Siddhesh Poyarekar wrote:
>> Hmm, I think I conflated it with something other str* function.  
>> You're right, strncpy probably doesn't fall into this category.
> 
> Ouch, I made the same mistake.
> 
> As for wcrtomb, unfortunately the standard's wording appears to allow 
> you to pass an output buffer smaller than MB_CUR_MAX if you know that 
> the multibyte character will fit into the smaller buffer. So I guess 
> this is an example of a function where __FORTIFY_LEVEL > 2 doesn't 
> conform to the standard.
> 
> I don't know whether the standard's authors intended this.
> 

Thomas (in cc, he maintains ncurses) found that the manual documents[1] 
the fact that the glibc implementation of wcrtomb assumes the 
destination buffer to have at least MB_CUR_MAX bytes, so this looks more 
like a situation where we *deliberately* deviate from the standard.

If we decide to comply with the standard now we would incur an 
additional copy from an internal buffer to the destination with in 
addition to wrappers to pass the object size from the checking variant 
whenever available.  The question then is whether that's the direction 
we want to take in glibc.

Siddhesh

[1] 
https://www.gnu.org/software/libc/manual/html_node/Converting-a-Character.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 0/2] More compliant wcrtomb
  2022-04-11  8:02       ` Siddhesh Poyarekar
@ 2022-05-05 18:43         ` Siddhesh Poyarekar
  2022-05-05 18:43           ` [PATCH 1/2] benchtests: Add wcrtomb microbenchmark Siddhesh Poyarekar
  2022-05-05 18:43           ` [PATCH 2/2] wcrtomb: Make behavior POSIX compliant Siddhesh Poyarekar
  0 siblings, 2 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-05 18:43 UTC (permalink / raw)
  To: libc-alpha; +Cc: fweimer, eggert, dickey, jakub, schwab

This two part series removes the MB_CUR_MAX limitation from wcrtomb to
make it comply with POSIX.  The full rationale for doing so is in 2/2 to
ensure that it remains in the git commit log.

Siddhesh Poyarekar (2):
  benchtests: Add wcrtomb microbenchmark
  wcrtomb: Make behavior POSIX compliant

 benchtests/Makefile        |   1 +
 benchtests/bench-wcrtomb.c | 140 +++++++++++++++++++++++++++++++++++++
 debug/tst-fortify.c        |   7 +-
 debug/wcrtomb_chk.c        |   7 +-
 include/wchar.h            |   4 ++
 manual/charset.texi        |  10 +--
 wcsmbs/wcrtomb.c           |  33 ++++++---
 7 files changed, 182 insertions(+), 20 deletions(-)
 create mode 100644 benchtests/bench-wcrtomb.c

-- 
2.35.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/2] benchtests: Add wcrtomb microbenchmark
  2022-05-05 18:43         ` [PATCH 0/2] More compliant wcrtomb Siddhesh Poyarekar
@ 2022-05-05 18:43           ` Siddhesh Poyarekar
  2022-05-06  9:10             ` Florian Weimer
  2022-05-06 12:50             ` [PATCH 1/2] " Adhemerval Zanella
  2022-05-05 18:43           ` [PATCH 2/2] wcrtomb: Make behavior POSIX compliant Siddhesh Poyarekar
  1 sibling, 2 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-05 18:43 UTC (permalink / raw)
  To: libc-alpha; +Cc: fweimer, eggert, dickey, jakub, schwab

Add a simple benchmark that measures wcrtomb performance with various
locales with 1-4 byte characters.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
---
 benchtests/Makefile        |   1 +
 benchtests/bench-wcrtomb.c | 140 +++++++++++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+)
 create mode 100644 benchtests/bench-wcrtomb.c

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 149d87e22e..de9de5cf58 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -171,6 +171,7 @@ ifeq (no,$(cross-compiling))
 wcsmbs-benchset := \
   wcpcpy \
   wcpncpy \
+  wcrtomb \
   wcscat \
   wcschr \
   wcschrnul \
diff --git a/benchtests/bench-wcrtomb.c b/benchtests/bench-wcrtomb.c
new file mode 100644
index 0000000000..6cef69cdbf
--- /dev/null
+++ b/benchtests/bench-wcrtomb.c
@@ -0,0 +1,140 @@
+/* Measure wcrtomb function.
+   Copyright The GNU Toolchain Authors.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <limits.h>
+#include <locale.h>
+#include <string.h>
+#include <wchar.h>
+
+#include "bench-timing.h"
+#include "json-lib.h"
+
+#define NITERS 100000
+
+struct test_inputs
+{
+  const char *locale;
+  const wchar_t *input_chars;
+};
+
+/* The inputs represent different types of characters, e.g. RTL, 1 byte, 2
+   byte, 3 byte and 4 byte chars.  The exact number of inputs per locale
+   doesn't really matter because we're not looking to compare performance
+   between locales.  */
+struct test_inputs inputs[] =
+{
+  /* RTL.  */
+  {"ar_SA.UTF-8",
+   L",-.،؟ـًُّ٠٢٣٤ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي"},
+
+  /* Various mixes of 1 and 2 byte chars.  */
+  {"cs_CZ.UTF-8",
+   L",.aAábcCčdDďeEéÉěĚfFghHiIíJlLmMnNňŇoóÓpPqQrřsSšŠTťuUúÚůŮvVWxyýz"},
+
+  {"el_GR.UTF-8",
+   L",.αΑβγδΔεΕζηΗθΘιΙκΚλμΜνΝξοΟπΠρΡσΣςτυΥφΦχψω"},
+
+  {"en_GB.UTF-8",
+   L",.aAāĀæÆǽǣǢbcCċdDðÐeEēĒfFgGġhHiIīĪlLmMnNoōpPqQrsSTuUūŪvVwxyȝzþÞƿǷ"},
+
+  {"fr_FR.UTF-8",
+   L",.aAàâbcCçdDeEéèêëfFghHiIîïjlLmMnNoOôœpPqQrRsSTuUùûvVwxyz"},
+
+  {"he_IL.UTF-8",
+   L"',.ִאבגדהוזחטיכךלמםנןסעפףצץקרשת"},
+
+  /* Devanagari, Japanese, 3-byte chars.  */
+  {"hi_IN.UTF-8",
+   L"(।ं०४५७अआइईउऎएओऔकखगघचछजञटडढणतथदधनपफ़बभमयरलवशषसहािीुूृेैोौ्"},
+
+  {"ja_JP.UTF-8",
+   L".ー0123456789あアいイうウえエおオかカがきキぎくクぐけケげこコごさサざ"},
+
+  /* More mixtures of 1 and 2 byte chars.  */
+  {"ru_RU.UTF-8",
+   L",.аАбвВгдДеЕёЁжЖзЗийЙкКлЛмМнНоОпПрстТуУфФхХЦчшШщъыЫьэЭюЮя"},
+
+  {"sr_RS.UTF-8",
+   L",.aAbcCćčdDđĐeEfgGhHiIlLmMnNoOpPqQrsSšŠTuUvVxyzZž"},
+
+  {"sv_SE.UTF-8",
+   L",.aAåÅäÄæÆbBcCdDeEfFghHiIjlLmMnNoOöÖpPqQrsSTuUvVwxyz"},
+
+  /* Chinese, 3-byte chars  */
+  {"zh_CN.UTF-8",
+   L"一七三下不与世両並中串主乱予事二五亡京人今仕付以任企伎会伸住佐体作使"},
+
+  /* 4-byte chars, because smileys are the universal language and we want to
+     ensure optimal performance with them 😊.  */
+  {"en_US.UTF-8",
+   L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"}
+};
+
+char buf[MB_LEN_MAX];
+size_t ret;
+
+int
+main (int argc, char **argv)
+{
+  const size_t inputs_len = sizeof (inputs) / sizeof (struct test_inputs);
+
+  json_ctx_t json_ctx;
+  json_init (&json_ctx, 0, stdout);
+  json_document_begin (&json_ctx);
+
+  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
+  json_attr_object_begin (&json_ctx, "functions");
+  json_attr_object_begin (&json_ctx, "wcrtomb");
+
+  for (size_t i = 0; i < inputs_len; i++)
+    {
+      json_attr_object_begin (&json_ctx, inputs[i].locale);
+      setlocale (LC_ALL, inputs[i].locale);
+
+      timing_t min = 0x7fffffffffffffff, max = 0, total = 0;
+      const wchar_t *inp = inputs[i].input_chars;
+      const size_t len = wcslen (inp);
+      mbstate_t s;                  
+
+      memset (&s, '\0', sizeof (s));
+
+      for (size_t n = 0; n < NITERS; n++)
+	{
+	  timing_t start, end, elapsed;
+
+	  TIMING_NOW (start);
+	  for (size_t j = 0; j < len; j++)
+	    ret = wcrtomb (buf, inp[j], &s);
+	  TIMING_NOW (end);
+	  TIMING_DIFF (elapsed, start, end);
+	  if (min > elapsed)
+	    min = elapsed;
+	  if (max < elapsed)
+	    max = elapsed;
+	  TIMING_ACCUM (total, elapsed);
+	}
+      json_attr_double (&json_ctx, "max", max);
+      json_attr_double (&json_ctx, "min", min);
+      json_attr_double (&json_ctx, "mean", total / NITERS);
+      json_attr_object_end (&json_ctx);
+    }
+
+  json_attr_object_end (&json_ctx);
+  json_attr_object_end (&json_ctx);
+  json_document_end (&json_ctx);
+}
-- 
2.35.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 2/2] wcrtomb: Make behavior POSIX compliant
  2022-05-05 18:43         ` [PATCH 0/2] More compliant wcrtomb Siddhesh Poyarekar
  2022-05-05 18:43           ` [PATCH 1/2] benchtests: Add wcrtomb microbenchmark Siddhesh Poyarekar
@ 2022-05-05 18:43           ` Siddhesh Poyarekar
  2022-05-06  9:25             ` Paul Eggert
                               ` (2 more replies)
  1 sibling, 3 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-05 18:43 UTC (permalink / raw)
  To: libc-alpha; +Cc: fweimer, eggert, dickey, jakub, schwab

The GNU implementation of wcrtomb assumes that there are at least
MB_CUR_MAX bytes available in the destination buffer passed to wcrtomb
as the first argument.  This is not compatible with the POSIX
definition, which only requires enough space for the input wide
character.

This does not break much in practice because when users supply buffers
smaller than MB_CUR_MAX (e.g. in ncurses), they compute and dynamically
allocate the buffer, which results in enough spare space (thanks to
usable_size in malloc and padding in alloca) that no actual buffer
overflow occurs.  However when the code is built with _FORTIFY_SOURCE,
it runs into the hard check against MB_CUR_MAX in __wcrtomb_chk and
hence fails.  It wasn't evident until now since dynamic allocations
would result in wcrtomb not being fortified but since _FORTIFY_SOURCE=3,
that limitation is gone, resulting in such code failing.

To fix this problem, introduce an internal buffer that is MB_LEN_MAX
long and use that to perform the conversion and then copy the resultant
bytes into the destination buffer.  Also move the fortification check
into the main implementation, which checks the result after conversion
and aborts if the resultant byte count is greater than the destination
buffer size.

One complication is that applications that assume the MB_CUR_MAX
limitation to be gone may not be able to run safely on older glibcs if
they use static destination buffers smaller than MB_CUR_MAX; dynamic
allocations will always have enough spare space that no actual overruns
will occur.  One alternative to fixing this is to bump symbol version to
prevent them from running on older glibcs but that seems too strict a
constraint.  Instead, since these users will only have made this
decision on reading the manual, I have put a note in the manual warning
them about the pitfalls of having static buffers smaller than
MB_CUR_MAX and running them on older glibc.

Benchmarking:

The wcrtomb microbenchmark shows significant increases in maximum
execution time for all locales, ranging from 10x for ar_SA.UTF-8 to
1.5x-2x for nearly everything else.  The mean execution time however saw
practically no impact, with some results even being quicker, indicating
that cache locality has a much bigger role in the overhead.

Given that the additional copy uses a temporary buffer inside wcrtomb,
it's likely that a hot path will end up putting that buffer (which is
responsible for the additional overhead) in a similar place on stack,
giving the necessary cache locality to negate the overhead.  However in
situations where wcrtomb ends up getting called at wildly different
spots on the call stack (or is on different call stacks, e.g. with
threads or different execution contexts) and is still a hotspot, the
performance lag will be visible.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
---
 debug/tst-fortify.c |  7 ++++++-
 debug/wcrtomb_chk.c |  8 ++------
 include/wchar.h     |  4 ++++
 manual/charset.texi | 10 +++++-----
 wcsmbs/wcrtomb.c    | 34 ++++++++++++++++++++++++++--------
 5 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/debug/tst-fortify.c b/debug/tst-fortify.c
index 03c9867714..8e94643bf2 100644
--- a/debug/tst-fortify.c
+++ b/debug/tst-fortify.c
@@ -1478,10 +1478,15 @@ do_test (void)
 	 character which has a multibyte representation which does not
 	 fit.  */
       CHK_FAIL_START
-      char smallbuf[2];
+      char smallbuf[1];
       if (wcrtomb (smallbuf, L'\x100', &s) != 2)
 	FAIL ();
       CHK_FAIL_END
+
+      /* Same input with a large enough buffer and we're good.  */
+      char bigenoughbuf[2];
+      if (wcrtomb (bigenoughbuf, L'\x100', &s) != 2)
+	FAIL ();
 #endif

       wchar_t wenough[10];
diff --git a/debug/wcrtomb_chk.c b/debug/wcrtomb_chk.c
index 8b6d026560..28c3ea0d2d 100644
--- a/debug/wcrtomb_chk.c
+++ b/debug/wcrtomb_chk.c
@@ -1,4 +1,5 @@
 /* Copyright (C) 2005-2022 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.

    The GNU C Library is free software; you can redistribute it and/or
@@ -25,10 +26,5 @@
 size_t
 __wcrtomb_chk (char *s, wchar_t wchar, mbstate_t *ps, size_t buflen)
 {
-  /* We do not have to implement the full wctomb semantics since we
-     know that S cannot be NULL when we come here.  */
-  if (buflen < MB_CUR_MAX)
-    __chk_fail ();
-
-  return __wcrtomb (s, wchar, ps);
+  return __wcrtomb_internal (s, wchar, ps, buflen);
 }
diff --git a/include/wchar.h b/include/wchar.h
index 4267985625..db83297bca 100644
--- a/include/wchar.h
+++ b/include/wchar.h
@@ -172,6 +172,10 @@ libc_hidden_proto (__mbrtowc)
 libc_hidden_proto (__mbrlen)
 extern size_t __wcrtomb (char *__restrict __s, wchar_t __wc,
 			 __mbstate_t *__restrict __ps) attribute_hidden;
+extern size_t __wcrtomb_internal (char *__restrict __s, wchar_t __wc,
+				  __mbstate_t *__restrict __ps,
+				  size_t __s_size)
+     attribute_hidden;
 extern size_t __mbsrtowcs (wchar_t *__restrict __dst,
 			   const char **__restrict __src,
 			   size_t __len, __mbstate_t *__restrict __ps)
diff --git a/manual/charset.texi b/manual/charset.texi
index a9b5cb4a37..67fe5bf3c7 100644
--- a/manual/charset.texi
+++ b/manual/charset.texi
@@ -883,11 +883,11 @@ the string @var{s}.  This includes all bytes representing shift
 sequences.

 One word about the interface of the function: there is no parameter
-specifying the length of the array @var{s}.  Instead the function
-assumes that there are at least @code{MB_CUR_MAX} bytes available since
-this is the maximum length of any byte sequence representing a single
-character.  So the caller has to make sure that there is enough space
-available, otherwise buffer overruns can occur.
+specifying the length of the array @var{s}, so the caller has to make sure
+that there is enough space available, otherwise buffer overruns can occur.
+Also, @theglibc{} versions older than 2.36 assume that @var{s} is at least
+@var{MB_CUR_MAX} bytes long, so programs that need to run on older
+@glibcadj{} versions must comply with this limit.

 @pindex wchar.h
 @code{wcrtomb} was introduced in @w{Amendment 1} to @w{ISO C90} and is
diff --git a/wcsmbs/wcrtomb.c b/wcsmbs/wcrtomb.c
index e17438989f..ad00b43390 100644
--- a/wcsmbs/wcrtomb.c
+++ b/wcsmbs/wcrtomb.c
@@ -1,4 +1,5 @@
 /* Copyright (C) 1996-2022 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.

    The GNU C Library is free software; you can redistribute it and/or
@@ -20,6 +21,7 @@
 #include <errno.h>
 #include <gconv.h>
 #include <stdlib.h>
+#include <string.h>
 #include <wchar.h>
 #include <wcsmbsload.h>

@@ -34,7 +36,7 @@
 static mbstate_t state;

 size_t
-__wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
+__wcrtomb_internal (char *s, wchar_t wc, mbstate_t *ps, size_t s_size)
 {
   char buf[MB_LEN_MAX];
   struct __gconv_step_data data;
@@ -52,14 +54,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
   /* A first special case is if S is NULL.  This means put PS in the
      initial state.  */
   if (s == NULL)
-    {
-      s = buf;
-      wc = L'\0';
-    }
+    wc = L'\0';

   /* Tell where we want to have the result.  */
-  data.__outbuf = (unsigned char *) s;
-  data.__outbufend = (unsigned char *) s + MB_CUR_MAX;
+  data.__outbuf = (unsigned char *) buf;
+  data.__outbufend = (unsigned char *) buf + MB_CUR_MAX;

   /* Get the conversion functions.  */
   fcts = get_gconv_fcts (_NL_CURRENT_DATA (LC_CTYPE));
@@ -101,14 +100,33 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps)

   if (status == __GCONV_OK || status == __GCONV_EMPTY_INPUT
       || status == __GCONV_FULL_OUTPUT)
-    result = data.__outbuf - (unsigned char *) s;
+    result = data.__outbuf - (unsigned char *) buf;
   else
     {
       result = (size_t) -1;
       __set_errno (EILSEQ);
     }

+  if (result != (size_t) -1 && s != NULL)
+    {
+      if (result > s_size)
+	__chk_fail ();
+
+      if (__glibc_unlikely (result > 2))
+	memcpy (s, buf, result);
+      else if (__glibc_unlikely (result > 1))
+	*((uint16_t *) s) = *((uint16_t *) buf);
+      else
+	*s = *buf;
+    }
+
   return result;
 }
+
+size_t
+__wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
+{
+  return __wcrtomb_internal (s, wc, ps, (size_t) -1);
+}
 weak_alias (__wcrtomb, wcrtomb)
 libc_hidden_weak (wcrtomb)
-- 
2.35.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark
  2022-05-05 18:43           ` [PATCH 1/2] benchtests: Add wcrtomb microbenchmark Siddhesh Poyarekar
@ 2022-05-06  9:10             ` Florian Weimer
  2022-05-06 12:49               ` [committed] " Siddhesh Poyarekar
  2022-05-06 12:50             ` [PATCH 1/2] " Adhemerval Zanella
  1 sibling, 1 reply; 38+ messages in thread
From: Florian Weimer @ 2022-05-06  9:10 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha, eggert, dickey, jakub, schwab

* Siddhesh Poyarekar:

> +int
> +main (int argc, char **argv)
> +{
> +  const size_t inputs_len = sizeof (inputs) / sizeof (struct test_inputs);

Can you use array_length?

> +      mbstate_t s;                  

Trailing whitespace on this line.

Rest looks okay.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/2] wcrtomb: Make behavior POSIX compliant
  2022-05-05 18:43           ` [PATCH 2/2] wcrtomb: Make behavior POSIX compliant Siddhesh Poyarekar
@ 2022-05-06  9:25             ` Paul Eggert
  2022-05-06 13:40               ` Adhemerval Zanella
  2022-05-06 14:04             ` [PATCH v2] " Siddhesh Poyarekar
  2022-05-12 13:15             ` [PATCH v3] " Siddhesh Poyarekar
  2 siblings, 1 reply; 38+ messages in thread
From: Paul Eggert @ 2022-05-06  9:25 UTC (permalink / raw)
  To: Siddhesh Poyarekar, libc-alpha; +Cc: fweimer, dickey, jakub, schwab

On 5/5/22 11:43, Siddhesh Poyarekar wrote:
> +      else if (__glibc_unlikely (result > 1))
> +	*((uint16_t *) s) = *((uint16_t *) buf);

Shouldn't this be protected by "#if _STRING_ARCH_unaligned"? Also, 
unnecessary parens.

But better yet, just call memcpy as the compiler will figure it out; see 
below.


> -    result = data.__outbuf - (unsigned char *) s;
> +    result = data.__outbuf - (unsigned char *) buf;
>    else
>      {
>        result = (size_t) -1;
>        __set_errno (EILSEQ);
>      }
>  
> +  if (result != (size_t) -1 && s != NULL)

The 'result != (size_t) -1' can be omitted if you move that 'if' into 
the previous if's then-part.


> +  data.__outbufend = (unsigned char *) buf + MB_CUR_MAX;

This'd be a bit faster (and less confusing) if we replace 'MB_CUR_MAX' 
with 'sizeof buf'.


> +      if (__glibc_unlikely (result > 2))
> +	memcpy (s, buf, result);
> +      else if (__glibc_unlikely (result > 1))
> +	*((uint16_t *) s) = *((uint16_t *) buf);
> +      else
> +	*s = *buf;

If the likely path is result == 1, shouldn't that be checked first? 
Something like this:

      if (__glibc_likely (result < 2))
        *s = *buf;
      else if (__glibc_likely (result == 2))
        memcpy (s, buf, result); /* Help the compiler.  */
      else
        memcpy (s, buf, result);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [committed] benchtests: Add wcrtomb microbenchmark
  2022-05-06  9:10             ` Florian Weimer
@ 2022-05-06 12:49               ` Siddhesh Poyarekar
  0 siblings, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-06 12:49 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer

Add a simple benchmark that measures wcrtomb performance with various
locales with 1-4 byte characters.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
---
 benchtests/Makefile        |   1 +
 benchtests/bench-wcrtomb.c | 139 +++++++++++++++++++++++++++++++++++++
 2 files changed, 140 insertions(+)
 create mode 100644 benchtests/bench-wcrtomb.c

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 149d87e22e..de9de5cf58 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -171,6 +171,7 @@ ifeq (no,$(cross-compiling))
 wcsmbs-benchset := \
   wcpcpy \
   wcpncpy \
+  wcrtomb \
   wcscat \
   wcschr \
   wcschrnul \
diff --git a/benchtests/bench-wcrtomb.c b/benchtests/bench-wcrtomb.c
new file mode 100644
index 0000000000..232a7d59de
--- /dev/null
+++ b/benchtests/bench-wcrtomb.c
@@ -0,0 +1,139 @@
+/* Measure wcrtomb function.
+   Copyright The GNU Toolchain Authors.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <array_length.h>
+#include <limits.h>
+#include <locale.h>
+#include <string.h>
+#include <wchar.h>
+
+#include "bench-timing.h"
+#include "json-lib.h"
+
+#define NITERS 100000
+
+struct test_inputs
+{
+  const char *locale;
+  const wchar_t *input_chars;
+};
+
+/* The inputs represent different types of characters, e.g. RTL, 1 byte, 2
+   byte, 3 byte and 4 byte chars.  The exact number of inputs per locale
+   doesn't really matter because we're not looking to compare performance
+   between locales.  */
+struct test_inputs inputs[] =
+{
+  /* RTL.  */
+  {"ar_SA.UTF-8",
+   L",-.،؟ـًُّ٠٢٣٤ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي"},
+
+  /* Various mixes of 1 and 2 byte chars.  */
+  {"cs_CZ.UTF-8",
+   L",.aAábcCčdDďeEéÉěĚfFghHiIíJlLmMnNňŇoóÓpPqQrřsSšŠTťuUúÚůŮvVWxyýz"},
+
+  {"el_GR.UTF-8",
+   L",.αΑβγδΔεΕζηΗθΘιΙκΚλμΜνΝξοΟπΠρΡσΣςτυΥφΦχψω"},
+
+  {"en_GB.UTF-8",
+   L",.aAāĀæÆǽǣǢbcCċdDðÐeEēĒfFgGġhHiIīĪlLmMnNoōpPqQrsSTuUūŪvVwxyȝzþÞƿǷ"},
+
+  {"fr_FR.UTF-8",
+   L",.aAàâbcCçdDeEéèêëfFghHiIîïjlLmMnNoOôœpPqQrRsSTuUùûvVwxyz"},
+
+  {"he_IL.UTF-8",
+   L"',.ִאבגדהוזחטיכךלמםנןסעפףצץקרשת"},
+
+  /* Devanagari, Japanese, 3-byte chars.  */
+  {"hi_IN.UTF-8",
+   L"(।ं०४५७अआइईउऎएओऔकखगघचछजञटडढणतथदधनपफ़बभमयरलवशषसहािीुूृेैोौ्"},
+
+  {"ja_JP.UTF-8",
+   L".ー0123456789あアいイうウえエおオかカがきキぎくクぐけケげこコごさサざ"},
+
+  /* More mixtures of 1 and 2 byte chars.  */
+  {"ru_RU.UTF-8",
+   L",.аАбвВгдДеЕёЁжЖзЗийЙкКлЛмМнНоОпПрстТуУфФхХЦчшШщъыЫьэЭюЮя"},
+
+  {"sr_RS.UTF-8",
+   L",.aAbcCćčdDđĐeEfgGhHiIlLmMnNoOpPqQrsSšŠTuUvVxyzZž"},
+
+  {"sv_SE.UTF-8",
+   L",.aAåÅäÄæÆbBcCdDeEfFghHiIjlLmMnNoOöÖpPqQrsSTuUvVwxyz"},
+
+  /* Chinese, 3-byte chars  */
+  {"zh_CN.UTF-8",
+   L"一七三下不与世両並中串主乱予事二五亡京人今仕付以任企伎会伸住佐体作使"},
+
+  /* 4-byte chars, because smileys are the universal language and we want to
+     ensure optimal performance with them 😊.  */
+  {"en_US.UTF-8",
+   L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"}
+};
+
+char buf[MB_LEN_MAX];
+size_t ret;
+
+int
+main (int argc, char **argv)
+{
+  json_ctx_t json_ctx;
+  json_init (&json_ctx, 0, stdout);
+  json_document_begin (&json_ctx);
+
+  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
+  json_attr_object_begin (&json_ctx, "functions");
+  json_attr_object_begin (&json_ctx, "wcrtomb");
+
+  for (size_t i = 0; i < array_length (inputs); i++)
+    {
+      json_attr_object_begin (&json_ctx, inputs[i].locale);
+      setlocale (LC_ALL, inputs[i].locale);
+
+      timing_t min = 0x7fffffffffffffff, max = 0, total = 0;
+      const wchar_t *inp = inputs[i].input_chars;
+      const size_t len = wcslen (inp);
+      mbstate_t s;
+
+      memset (&s, '\0', sizeof (s));
+
+      for (size_t n = 0; n < NITERS; n++)
+	{
+	  timing_t start, end, elapsed;
+
+	  TIMING_NOW (start);
+	  for (size_t j = 0; j < len; j++)
+	    ret = wcrtomb (buf, inp[j], &s);
+	  TIMING_NOW (end);
+	  TIMING_DIFF (elapsed, start, end);
+	  if (min > elapsed)
+	    min = elapsed;
+	  if (max < elapsed)
+	    max = elapsed;
+	  TIMING_ACCUM (total, elapsed);
+	}
+      json_attr_double (&json_ctx, "max", max);
+      json_attr_double (&json_ctx, "min", min);
+      json_attr_double (&json_ctx, "mean", total / NITERS);
+      json_attr_object_end (&json_ctx);
+    }
+
+  json_attr_object_end (&json_ctx);
+  json_attr_object_end (&json_ctx);
+  json_document_end (&json_ctx);
+}
-- 
2.35.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark
  2022-05-05 18:43           ` [PATCH 1/2] benchtests: Add wcrtomb microbenchmark Siddhesh Poyarekar
  2022-05-06  9:10             ` Florian Weimer
@ 2022-05-06 12:50             ` Adhemerval Zanella
  2022-05-06 12:59               ` Siddhesh Poyarekar
  1 sibling, 1 reply; 38+ messages in thread
From: Adhemerval Zanella @ 2022-05-06 12:50 UTC (permalink / raw)
  To: libc-alpha



On 05/05/2022 15:43, Siddhesh Poyarekar via Libc-alpha wrote:
> Add a simple benchmark that measures wcrtomb performance with various
> locales with 1-4 byte characters.
> 
> Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
> ---
>  benchtests/Makefile        |   1 +
>  benchtests/bench-wcrtomb.c | 140 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 141 insertions(+)
>  create mode 100644 benchtests/bench-wcrtomb.c
> 
> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index 149d87e22e..de9de5cf58 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -171,6 +171,7 @@ ifeq (no,$(cross-compiling))
>  wcsmbs-benchset := \
>    wcpcpy \
>    wcpncpy \
> +  wcrtomb \
>    wcscat \
>    wcschr \
>    wcschrnul \
> diff --git a/benchtests/bench-wcrtomb.c b/benchtests/bench-wcrtomb.c
> new file mode 100644
> index 0000000000..6cef69cdbf
> --- /dev/null
> +++ b/benchtests/bench-wcrtomb.c
> @@ -0,0 +1,140 @@
> +/* Measure wcrtomb function.
> +   Copyright The GNU Toolchain Authors.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <limits.h>
> +#include <locale.h>
> +#include <string.h>
> +#include <wchar.h>
> +
> +#include "bench-timing.h"
> +#include "json-lib.h"
> +
> +#define NITERS 100000
> +
> +struct test_inputs
> +{
> +  const char *locale;
> +  const wchar_t *input_chars;
> +};
> +
> +/* The inputs represent different types of characters, e.g. RTL, 1 byte, 2
> +   byte, 3 byte and 4 byte chars.  The exact number of inputs per locale
> +   doesn't really matter because we're not looking to compare performance
> +   between locales.  */
> +struct test_inputs inputs[] =
> +{
> +  /* RTL.  */
> +  {"ar_SA.UTF-8",
> +   L",-.،؟ـًُّ٠٢٣٤ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي"},
> +
> +  /* Various mixes of 1 and 2 byte chars.  */
> +  {"cs_CZ.UTF-8",
> +   L",.aAábcCčdDďeEéÉěĚfFghHiIíJlLmMnNňŇoóÓpPqQrřsSšŠTťuUúÚůŮvVWxyýz"},
> +
> +  {"el_GR.UTF-8",
> +   L",.αΑβγδΔεΕζηΗθΘιΙκΚλμΜνΝξοΟπΠρΡσΣςτυΥφΦχψω"},
> +
> +  {"en_GB.UTF-8",
> +   L",.aAāĀæÆǽǣǢbcCċdDðÐeEēĒfFgGġhHiIīĪlLmMnNoōpPqQrsSTuUūŪvVwxyȝzþÞƿǷ"},
> +
> +  {"fr_FR.UTF-8",
> +   L",.aAàâbcCçdDeEéèêëfFghHiIîïjlLmMnNoOôœpPqQrRsSTuUùûvVwxyz"},
> +
> +  {"he_IL.UTF-8",
> +   L"',.ִאבגדהוזחטיכךלמםנןסעפףצץקרשת"},
> +
> +  /* Devanagari, Japanese, 3-byte chars.  */
> +  {"hi_IN.UTF-8",
> +   L"(।ं०४५७अआइईउऎएओऔकखगघचछजञटडढणतथदधनपफ़बभमयरलवशषसहािीुूृेैोौ्"},
> +
> +  {"ja_JP.UTF-8",
> +   L".ー0123456789あアいイうウえエおオかカがきキぎくクぐけケげこコごさサざ"},
> +
> +  /* More mixtures of 1 and 2 byte chars.  */
> +  {"ru_RU.UTF-8",
> +   L",.аАбвВгдДеЕёЁжЖзЗийЙкКлЛмМнНоОпПрстТуУфФхХЦчшШщъыЫьэЭюЮя"},
> +
> +  {"sr_RS.UTF-8",
> +   L",.aAbcCćčdDđĐeEfgGhHiIlLmMnNoOpPqQrsSšŠTuUvVxyzZž"},
> +
> +  {"sv_SE.UTF-8",
> +   L",.aAåÅäÄæÆbBcCdDeEfFghHiIjlLmMnNoOöÖpPqQrsSTuUvVwxyz"},
> +
> +  /* Chinese, 3-byte chars  */
> +  {"zh_CN.UTF-8",
> +   L"一七三下不与世両並中串主乱予事二五亡京人今仕付以任企伎会伸住佐体作使"},
> +
> +  /* 4-byte chars, because smileys are the universal language and we want to
> +     ensure optimal performance with them 😊.  */
> +  {"en_US.UTF-8",
> +   L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"}
> +};

Could you use use hexadecimal character escape in tests? Although gcc handle multiple
-fexec-charset, trying to build it with a different compiler usually emits a lot of
warnings.

> +
> +char buf[MB_LEN_MAX];
> +size_t ret;
> +
> +int
> +main (int argc, char **argv)
> +{
> +  const size_t inputs_len = sizeof (inputs) / sizeof (struct test_inputs);
> +
> +  json_ctx_t json_ctx;
> +  json_init (&json_ctx, 0, stdout);
> +  json_document_begin (&json_ctx);
> +
> +  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
> +  json_attr_object_begin (&json_ctx, "functions");
> +  json_attr_object_begin (&json_ctx, "wcrtomb");
> +
> +  for (size_t i = 0; i < inputs_len; i++)
> +    {
> +      json_attr_object_begin (&json_ctx, inputs[i].locale);
> +      setlocale (LC_ALL, inputs[i].locale);
> +
> +      timing_t min = 0x7fffffffffffffff, max = 0, total = 0;
> +      const wchar_t *inp = inputs[i].input_chars;
> +      const size_t len = wcslen (inp);
> +      mbstate_t s;                  
> +
> +      memset (&s, '\0', sizeof (s));
> +
> +      for (size_t n = 0; n < NITERS; n++)
> +	{
> +	  timing_t start, end, elapsed;
> +
> +	  TIMING_NOW (start);
> +	  for (size_t j = 0; j < len; j++)
> +	    ret = wcrtomb (buf, inp[j], &s);
> +	  TIMING_NOW (end);
> +	  TIMING_DIFF (elapsed, start, end);
> +	  if (min > elapsed)
> +	    min = elapsed;
> +	  if (max < elapsed)
> +	    max = elapsed;
> +	  TIMING_ACCUM (total, elapsed);
> +	}
> +      json_attr_double (&json_ctx, "max", max);
> +      json_attr_double (&json_ctx, "min", min);
> +      json_attr_double (&json_ctx, "mean", total / NITERS);
> +      json_attr_object_end (&json_ctx);
> +    }
> +
> +  json_attr_object_end (&json_ctx);
> +  json_attr_object_end (&json_ctx);
> +  json_document_end (&json_ctx);
> +}

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark
  2022-05-06 12:50             ` [PATCH 1/2] " Adhemerval Zanella
@ 2022-05-06 12:59               ` Siddhesh Poyarekar
  2022-05-06 13:20                 ` Adhemerval Zanella
  0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-06 12:59 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

(Sorry I just committed it before I saw your message)

On 06/05/2022 18:20, Adhemerval Zanella via Libc-alpha wrote:
>> +  /* 4-byte chars, because smileys are the universal language and we want to
>> +     ensure optimal performance with them 😊.  */
>> +  {"en_US.UTF-8",
>> +   L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"}
>> +};
> 
> Could you use use hexadecimal character escape in tests? Although gcc handle multiple
> -fexec-charset, trying to build it with a different compiler usually emits a lot of
> warnings.
> 

Hmm, I'm curious to know which compiler this breaks on; it's straight up 
UTF-8 so should parse just fine, no?

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark
  2022-05-06 12:59               ` Siddhesh Poyarekar
@ 2022-05-06 13:20                 ` Adhemerval Zanella
  2022-05-06 13:26                   ` Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Adhemerval Zanella @ 2022-05-06 13:20 UTC (permalink / raw)
  To: Siddhesh Poyarekar, libc-alpha



On 06/05/2022 09:59, Siddhesh Poyarekar wrote:
> (Sorry I just committed it before I saw your message)
> 
> On 06/05/2022 18:20, Adhemerval Zanella via Libc-alpha wrote:
>>> +  /* 4-byte chars, because smileys are the universal language and we want to
>>> +     ensure optimal performance with them 😊.  */
>>> +  {"en_US.UTF-8",
>>> +   L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"}
>>> +};
>>
>> Could you use use hexadecimal character escape in tests? Although gcc handle multiple
>> -fexec-charset, trying to build it with a different compiler usually emits a lot of
>> warnings.
>>
> 
> Hmm, I'm curious to know which compiler this breaks on; it's straight up UTF-8 so should parse just fine, no?

I does not really breaks, but clang warns about character enconding (I forgot
which was the specific error, I will check it). I have to fix a lot of internal
usage to get a clean make check on clang [1].

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=058bb0d51c3f85041c7cbd14704f72003bdbdee9

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark
  2022-05-06 13:20                 ` Adhemerval Zanella
@ 2022-05-06 13:26                   ` Siddhesh Poyarekar
  2022-05-06 13:36                     ` Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-06 13:26 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

On 06/05/2022 18:50, Adhemerval Zanella wrote:
> 
> 
> On 06/05/2022 09:59, Siddhesh Poyarekar wrote:
>> (Sorry I just committed it before I saw your message)
>>
>> On 06/05/2022 18:20, Adhemerval Zanella via Libc-alpha wrote:
>>>> +  /* 4-byte chars, because smileys are the universal language and we want to
>>>> +     ensure optimal performance with them 😊.  */
>>>> +  {"en_US.UTF-8",
>>>> +   L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"}
>>>> +};
>>>
>>> Could you use use hexadecimal character escape in tests? Although gcc handle multiple
>>> -fexec-charset, trying to build it with a different compiler usually emits a lot of
>>> warnings.
>>>
>>
>> Hmm, I'm curious to know which compiler this breaks on; it's straight up UTF-8 so should parse just fine, no?
> 
> I does not really breaks, but clang warns about character enconding (I forgot
> which was the specific error, I will check it). I have to fix a lot of internal
> usage to get a clean make check on clang [1].
> 
> [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=058bb0d51c3f85041c7cbd14704f72003bdbdee9
> 

Ah, so that's all ISO-8859, it shouldn't warn on utf-8.

Siddhesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark
  2022-05-06 13:26                   ` Siddhesh Poyarekar
@ 2022-05-06 13:36                     ` Siddhesh Poyarekar
  2022-05-06 13:46                       ` Adhemerval Zanella
  0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-06 13:36 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

On 06/05/2022 18:56, Siddhesh Poyarekar wrote:
> On 06/05/2022 18:50, Adhemerval Zanella wrote:
>>
>>
>> On 06/05/2022 09:59, Siddhesh Poyarekar wrote:
>>> (Sorry I just committed it before I saw your message)
>>>
>>> On 06/05/2022 18:20, Adhemerval Zanella via Libc-alpha wrote:
>>>>> +  /* 4-byte chars, because smileys are the universal language and 
>>>>> we want to
>>>>> +     ensure optimal performance with them 😊.  */
>>>>> +  {"en_US.UTF-8",
>>>>> +   
>>>>> L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"} 
>>>>>
>>>>> +};
>>>>
>>>> Could you use use hexadecimal character escape in tests? Although 
>>>> gcc handle multiple
>>>> -fexec-charset, trying to build it with a different compiler usually 
>>>> emits a lot of
>>>> warnings.
>>>>
>>>
>>> Hmm, I'm curious to know which compiler this breaks on; it's straight 
>>> up UTF-8 so should parse just fine, no?
>>
>> I does not really breaks, but clang warns about character enconding (I 
>> forgot
>> which was the specific error, I will check it). I have to fix a lot of 
>> internal
>> usage to get a clean make check on clang [1].
>>
>> [1] 
>> https://sourceware.org/git/?p=glibc.git;a=commit;h=058bb0d51c3f85041c7cbd14704f72003bdbdee9 
>>
>>
> 
> Ah, so that's all ISO-8859, it shouldn't warn on utf-8.

In any case, if you see warnings on this, please pass it on to me and 
I'll fix it but it seems wrong for clang to warn on this.

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/2] wcrtomb: Make behavior POSIX compliant
  2022-05-06  9:25             ` Paul Eggert
@ 2022-05-06 13:40               ` Adhemerval Zanella
  2022-05-06 13:46                 ` Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Adhemerval Zanella @ 2022-05-06 13:40 UTC (permalink / raw)
  To: Paul Eggert, Siddhesh Poyarekar, libc-alpha
  Cc: fweimer, jakub, schwab, dickey



On 06/05/2022 06:25, Paul Eggert wrote:
> On 5/5/22 11:43, Siddhesh Poyarekar wrote:
>> +      else if (__glibc_unlikely (result > 1))
>> +    *((uint16_t *) s) = *((uint16_t *) buf);
> 
> Shouldn't this be protected by "#if _STRING_ARCH_unaligned"? Also, unnecessary parens.

I think this is an aliasing violation and most likely will fault on architectures
that do not support unaligned access (such as sparc and some arm environments).


> 
> But better yet, just call memcpy as the compiler will figure it out; see below.
> 
> 
>> -    result = data.__outbuf - (unsigned char *) s;
>> +    result = data.__outbuf - (unsigned char *) buf;
>>    else
>>      {
>>        result = (size_t) -1;
>>        __set_errno (EILSEQ);
>>      }
>>  
>> +  if (result != (size_t) -1 && s != NULL)
> 
> The 'result != (size_t) -1' can be omitted if you move that 'if' into the previous if's then-part.
> 
> 
>> +  data.__outbufend = (unsigned char *) buf + MB_CUR_MAX;
> 
> This'd be a bit faster (and less confusing) if we replace 'MB_CUR_MAX' with 'sizeof buf'.
> 
> 
>> +      if (__glibc_unlikely (result > 2))
>> +    memcpy (s, buf, result);
>> +      else if (__glibc_unlikely (result > 1))
>> +    *((uint16_t *) s) = *((uint16_t *) buf);
>> +      else
>> +    *s = *buf;
> 
> If the likely path is result == 1, shouldn't that be checked first? Something like this:
> 
>      if (__glibc_likely (result < 2))
>        *s = *buf;
>      else if (__glibc_likely (result == 2))
>        memcpy (s, buf, result); /* Help the compiler.  */
>      else
>        memcpy (s, buf, result);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark
  2022-05-06 13:36                     ` Siddhesh Poyarekar
@ 2022-05-06 13:46                       ` Adhemerval Zanella
  0 siblings, 0 replies; 38+ messages in thread
From: Adhemerval Zanella @ 2022-05-06 13:46 UTC (permalink / raw)
  To: Siddhesh Poyarekar, libc-alpha



On 06/05/2022 10:36, Siddhesh Poyarekar wrote:
> On 06/05/2022 18:56, Siddhesh Poyarekar wrote:
>> On 06/05/2022 18:50, Adhemerval Zanella wrote:
>>>
>>>
>>> On 06/05/2022 09:59, Siddhesh Poyarekar wrote:
>>>> (Sorry I just committed it before I saw your message)
>>>>
>>>> On 06/05/2022 18:20, Adhemerval Zanella via Libc-alpha wrote:
>>>>>> +  /* 4-byte chars, because smileys are the universal language and we want to
>>>>>> +     ensure optimal performance with them 😊.  */
>>>>>> +  {"en_US.UTF-8",
>>>>>> +   L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"}
>>>>>> +};
>>>>>
>>>>> Could you use use hexadecimal character escape in tests? Although gcc handle multiple
>>>>> -fexec-charset, trying to build it with a different compiler usually emits a lot of
>>>>> warnings.
>>>>>
>>>>
>>>> Hmm, I'm curious to know which compiler this breaks on; it's straight up UTF-8 so should parse just fine, no?
>>>
>>> I does not really breaks, but clang warns about character enconding (I forgot
>>> which was the specific error, I will check it). I have to fix a lot of internal
>>> usage to get a clean make check on clang [1].
>>>
>>> [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=058bb0d51c3f85041c7cbd14704f72003bdbdee9
>>>
>>
>> Ah, so that's all ISO-8859, it shouldn't warn on utf-8.
> 
> In any case, if you see warnings on this, please pass it on to me and I'll fix it but it seems wrong for clang to warn on this.

Yeah, the issue is ISO-8859 in fact.  The L'...' with UTF-8 does not trigger any
issue.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/2] wcrtomb: Make behavior POSIX compliant
  2022-05-06 13:40               ` Adhemerval Zanella
@ 2022-05-06 13:46                 ` Siddhesh Poyarekar
  0 siblings, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-06 13:46 UTC (permalink / raw)
  To: Adhemerval Zanella, Paul Eggert, libc-alpha
  Cc: fweimer, jakub, schwab, dickey

On 06/05/2022 19:10, Adhemerval Zanella via Libc-alpha wrote:
> 
> 
> On 06/05/2022 06:25, Paul Eggert wrote:
>> On 5/5/22 11:43, Siddhesh Poyarekar wrote:
>>> +      else if (__glibc_unlikely (result > 1))
>>> +    *((uint16_t *) s) = *((uint16_t *) buf);
>>
>> Shouldn't this be protected by "#if _STRING_ARCH_unaligned"? Also, unnecessary parens.
> 
> I think this is an aliasing violation and most likely will fault on architectures
> that do not support unaligned access (such as sparc and some arm environments).
> 

I've dropped this in v2 in favour of calling memcpy and rely on the 
compiler to inline the call.  I'll post as soon as `make check` is done.

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2] wcrtomb: Make behavior POSIX compliant
  2022-05-05 18:43           ` [PATCH 2/2] wcrtomb: Make behavior POSIX compliant Siddhesh Poyarekar
  2022-05-06  9:25             ` Paul Eggert
@ 2022-05-06 14:04             ` Siddhesh Poyarekar
  2022-05-09 13:22               ` Adhemerval Zanella
  2022-05-12 13:15             ` [PATCH v3] " Siddhesh Poyarekar
  2 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-06 14:04 UTC (permalink / raw)
  To: libc-alpha; +Cc: fweimer, jakub, schwab, dickey, eggert

The GNU implementation of wcrtomb assumes that there are at least
MB_CUR_MAX bytes available in the destination buffer passed to wcrtomb
as the first argument.  This is not compatible with the POSIX
definition, which only requires enough space for the input wide
character.

This does not break much in practice because when users supply buffers
smaller than MB_CUR_MAX (e.g. in ncurses), they compute and dynamically
allocate the buffer, which results in enough spare space (thanks to
usable_size in malloc and padding in alloca) that no actual buffer
overflow occurs.  However when the code is built with _FORTIFY_SOURCE,
it runs into the hard check against MB_CUR_MAX in __wcrtomb_chk and
hence fails.  It wasn't evident until now since dynamic allocations
would result in wcrtomb not being fortified but since _FORTIFY_SOURCE=3,
that limitation is gone, resulting in such code failing.

To fix this problem, introduce an internal buffer that is MB_LEN_MAX
long and use that to perform the conversion and then copy the resultant
bytes into the destination buffer.  Also move the fortification check
into the main implementation, which checks the result after conversion
and aborts if the resultant byte count is greater than the destination
buffer size.

One complication is that applications that assume the MB_CUR_MAX
limitation to be gone may not be able to run safely on older glibcs if
they use static destination buffers smaller than MB_CUR_MAX; dynamic
allocations will always have enough spare space that no actual overruns
will occur.  One alternative to fixing this is to bump symbol version to
prevent them from running on older glibcs but that seems too strict a
constraint.  Instead, since these users will only have made this
decision on reading the manual, I have put a note in the manual warning
them about the pitfalls of having static buffers smaller than
MB_CUR_MAX and running them on older glibc.

Benchmarking:

The wcrtomb microbenchmark shows significant increases in maximum
execution time for all locales, ranging from 10x for ar_SA.UTF-8 to
1.5x-2x for nearly everything else.  The mean execution time however saw
practically no impact, with some results even being quicker, indicating
that cache locality has a much bigger role in the overhead.

Given that the additional copy uses a temporary buffer inside wcrtomb,
it's likely that a hot path will end up putting that buffer (which is
responsible for the additional overhead) in a similar place on stack,
giving the necessary cache locality to negate the overhead.  However in
situations where wcrtomb ends up getting called at wildly different
spots on the call stack (or is on different call stacks, e.g. with
threads or different execution contexts) and is still a hotspot, the
performance lag will be visible.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
---
Changes from v1:
- Fixed nits suggested by Paul Eggert.

 debug/tst-fortify.c |  7 ++++++-
 debug/wcrtomb_chk.c |  8 ++------
 include/wchar.h     |  4 ++++
 manual/charset.texi | 10 +++++-----
 wcsmbs/wcrtomb.c    | 36 ++++++++++++++++++++++++++++--------
 5 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/debug/tst-fortify.c b/debug/tst-fortify.c
index 03c9867714..8e94643bf2 100644
--- a/debug/tst-fortify.c
+++ b/debug/tst-fortify.c
@@ -1478,10 +1478,15 @@ do_test (void)
 	 character which has a multibyte representation which does not
 	 fit.  */
       CHK_FAIL_START
-      char smallbuf[2];
+      char smallbuf[1];
       if (wcrtomb (smallbuf, L'\x100', &s) != 2)
 	FAIL ();
       CHK_FAIL_END
+
+      /* Same input with a large enough buffer and we're good.  */
+      char bigenoughbuf[2];
+      if (wcrtomb (bigenoughbuf, L'\x100', &s) != 2)
+	FAIL ();
 #endif

       wchar_t wenough[10];
diff --git a/debug/wcrtomb_chk.c b/debug/wcrtomb_chk.c
index 8b6d026560..28c3ea0d2d 100644
--- a/debug/wcrtomb_chk.c
+++ b/debug/wcrtomb_chk.c
@@ -1,4 +1,5 @@
 /* Copyright (C) 2005-2022 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.

    The GNU C Library is free software; you can redistribute it and/or
@@ -25,10 +26,5 @@
 size_t
 __wcrtomb_chk (char *s, wchar_t wchar, mbstate_t *ps, size_t buflen)
 {
-  /* We do not have to implement the full wctomb semantics since we
-     know that S cannot be NULL when we come here.  */
-  if (buflen < MB_CUR_MAX)
-    __chk_fail ();
-
-  return __wcrtomb (s, wchar, ps);
+  return __wcrtomb_internal (s, wchar, ps, buflen);
 }
diff --git a/include/wchar.h b/include/wchar.h
index 4267985625..db83297bca 100644
--- a/include/wchar.h
+++ b/include/wchar.h
@@ -172,6 +172,10 @@ libc_hidden_proto (__mbrtowc)
 libc_hidden_proto (__mbrlen)
 extern size_t __wcrtomb (char *__restrict __s, wchar_t __wc,
 			 __mbstate_t *__restrict __ps) attribute_hidden;
+extern size_t __wcrtomb_internal (char *__restrict __s, wchar_t __wc,
+				  __mbstate_t *__restrict __ps,
+				  size_t __s_size)
+     attribute_hidden;
 extern size_t __mbsrtowcs (wchar_t *__restrict __dst,
 			   const char **__restrict __src,
 			   size_t __len, __mbstate_t *__restrict __ps)
diff --git a/manual/charset.texi b/manual/charset.texi
index a9b5cb4a37..67fe5bf3c7 100644
--- a/manual/charset.texi
+++ b/manual/charset.texi
@@ -883,11 +883,11 @@ the string @var{s}.  This includes all bytes representing shift
 sequences.

 One word about the interface of the function: there is no parameter
-specifying the length of the array @var{s}.  Instead the function
-assumes that there are at least @code{MB_CUR_MAX} bytes available since
-this is the maximum length of any byte sequence representing a single
-character.  So the caller has to make sure that there is enough space
-available, otherwise buffer overruns can occur.
+specifying the length of the array @var{s}, so the caller has to make sure
+that there is enough space available, otherwise buffer overruns can occur.
+Also, @theglibc{} versions older than 2.36 assume that @var{s} is at least
+@var{MB_CUR_MAX} bytes long, so programs that need to run on older
+@glibcadj{} versions must comply with this limit.

 @pindex wchar.h
 @code{wcrtomb} was introduced in @w{Amendment 1} to @w{ISO C90} and is
diff --git a/wcsmbs/wcrtomb.c b/wcsmbs/wcrtomb.c
index e17438989f..db12e46297 100644
--- a/wcsmbs/wcrtomb.c
+++ b/wcsmbs/wcrtomb.c
@@ -1,4 +1,5 @@
 /* Copyright (C) 1996-2022 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.

    The GNU C Library is free software; you can redistribute it and/or
@@ -20,6 +21,7 @@
 #include <errno.h>
 #include <gconv.h>
 #include <stdlib.h>
+#include <string.h>
 #include <wchar.h>
 #include <wcsmbsload.h>

@@ -34,7 +36,7 @@
 static mbstate_t state;

 size_t
-__wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
+__wcrtomb_internal (char *s, wchar_t wc, mbstate_t *ps, size_t s_size)
 {
   char buf[MB_LEN_MAX];
   struct __gconv_step_data data;
@@ -52,14 +54,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
   /* A first special case is if S is NULL.  This means put PS in the
      initial state.  */
   if (s == NULL)
-    {
-      s = buf;
-      wc = L'\0';
-    }
+    wc = L'\0';

   /* Tell where we want to have the result.  */
-  data.__outbuf = (unsigned char *) s;
-  data.__outbufend = (unsigned char *) s + MB_CUR_MAX;
+  data.__outbuf = (unsigned char *) buf;
+  data.__outbufend = (unsigned char *) buf + sizeof buf;

   /* Get the conversion functions.  */
   fcts = get_gconv_fcts (_NL_CURRENT_DATA (LC_CTYPE));
@@ -101,7 +100,22 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps)

   if (status == __GCONV_OK || status == __GCONV_EMPTY_INPUT
       || status == __GCONV_FULL_OUTPUT)
-    result = data.__outbuf - (unsigned char *) s;
+    {
+      result = data.__outbuf - (unsigned char *) buf;
+
+      if (s != NULL)
+	{
+	  if (result > s_size)
+	    __chk_fail ();
+
+	  if (__glibc_likely (result < 2))
+	    *s = *buf;
+	  else if (__glibc_likely (result == 2))
+	    memcpy (s, buf, result);	/* Help the compiler.  */
+	  else
+	    memcpy (s, buf, result);
+	}
+    }
   else
     {
       result = (size_t) -1;
@@ -110,5 +124,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps)

   return result;
 }
+
+size_t
+__wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
+{
+  return __wcrtomb_internal (s, wc, ps, (size_t) -1);
+}
 weak_alias (__wcrtomb, wcrtomb)
 libc_hidden_weak (wcrtomb)
-- 
2.35.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2] wcrtomb: Make behavior POSIX compliant
  2022-05-06 14:04             ` [PATCH v2] " Siddhesh Poyarekar
@ 2022-05-09 13:22               ` Adhemerval Zanella
  2022-05-09 13:35                 ` Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Adhemerval Zanella @ 2022-05-09 13:22 UTC (permalink / raw)
  To: libc-alpha



On 06/05/2022 11:04, Siddhesh Poyarekar via Libc-alpha wrote:

> +
> +	  if (__glibc_likely (result < 2))
> +	    *s = *buf;
> +	  else if (__glibc_likely (result == 2))
> +	    memcpy (s, buf, result);	/* Help the compiler.  */
> +	  else
> +	    memcpy (s, buf, result);
> +	}
> +    }

I am not following how this help the compiler here, it just duplicates the
same memcpy call.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2] wcrtomb: Make behavior POSIX compliant
  2022-05-09 13:22               ` Adhemerval Zanella
@ 2022-05-09 13:35                 ` Siddhesh Poyarekar
  0 siblings, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-09 13:35 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

On 09/05/2022 18:52, Adhemerval Zanella via Libc-alpha wrote:
> 
> 
> On 06/05/2022 11:04, Siddhesh Poyarekar via Libc-alpha wrote:
> 
>> +
>> +	  if (__glibc_likely (result < 2))
>> +	    *s = *buf;
>> +	  else if (__glibc_likely (result == 2))
>> +	    memcpy (s, buf, result);	/* Help the compiler.  */
>> +	  else
>> +	    memcpy (s, buf, result);
>> +	}
>> +    }
> 
> I am not following how this help the compiler here, it just duplicates the
> same memcpy call.
> 

It should be able to inline the first memcpy call into a copy 
instruction sequence.  With just an `else`, result may be 2 or more, so 
the compiler doesn't have enough information to do any inlining.

However, it just occurred to me that maybe something like this:

if (result <= MB_LEN_MAX)
   memcpy (s, buf, result);
else
   __builtin_unreachable ();

will likely produce an always avoid the memcpy call and generate an 
inlined sequence.  I'll test that and repost.  Thanks for making me 
think about this a bit more.

I'll also update the manual based on the discussion in the weekly call 
to make the blurb more backport-friendly.

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v3] wcrtomb: Make behavior POSIX compliant
  2022-05-05 18:43           ` [PATCH 2/2] wcrtomb: Make behavior POSIX compliant Siddhesh Poyarekar
  2022-05-06  9:25             ` Paul Eggert
  2022-05-06 14:04             ` [PATCH v2] " Siddhesh Poyarekar
@ 2022-05-12 13:15             ` Siddhesh Poyarekar
  2022-05-13  4:56               ` Paul Eggert
  2 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-12 13:15 UTC (permalink / raw)
  To: libc-alpha; +Cc: adhemerval.zanella, fweimer, dickey, eggert

The GNU implementation of wcrtomb assumes that there are at least
MB_CUR_MAX bytes available in the destination buffer passed to wcrtomb
as the first argument.  This is not compatible with the POSIX
definition, which only requires enough space for the input wide
character.

This does not break much in practice because when users supply buffers
smaller than MB_CUR_MAX (e.g. in ncurses), they compute and dynamically
allocate the buffer, which results in enough spare space (thanks to
usable_size in malloc and padding in alloca) that no actual buffer
overflow occurs.  However when the code is built with _FORTIFY_SOURCE,
it runs into the hard check against MB_CUR_MAX in __wcrtomb_chk and
hence fails.  It wasn't evident until now since dynamic allocations
would result in wcrtomb not being fortified but since _FORTIFY_SOURCE=3,
that limitation is gone, resulting in such code failing.

To fix this problem, introduce an internal buffer that is MB_LEN_MAX
long and use that to perform the conversion and then copy the resultant
bytes into the destination buffer.  Also move the fortification check
into the main implementation, which checks the result after conversion
and aborts if the resultant byte count is greater than the destination
buffer size.

One complication is that applications that assume the MB_CUR_MAX
limitation to be gone may not be able to run safely on older glibcs if
they use static destination buffers smaller than MB_CUR_MAX; dynamic
allocations will always have enough spare space that no actual overruns
will occur.  One alternative to fixing this is to bump symbol version to
prevent them from running on older glibcs but that seems too strict a
constraint.  Instead, since these users will only have made this
decision on reading the manual, I have put a note in the manual warning
them about the pitfalls of having static buffers smaller than
MB_CUR_MAX and running them on older glibc.

Benchmarking:

The wcrtomb microbenchmark shows significant increases in maximum
execution time for all locales, ranging from 10x for ar_SA.UTF-8 to
1.5x-2x for nearly everything else.  The mean execution time however saw
practically no impact, with some results even being quicker, indicating
that cache locality has a much bigger role in the overhead.

Given that the additional copy uses a temporary buffer inside wcrtomb,
it's likely that a hot path will end up putting that buffer (which is
responsible for the additional overhead) in a similar place on stack,
giving the necessary cache locality to negate the overhead.  However in
situations where wcrtomb ends up getting called at wildly different
spots on the call stack (or is on different call stacks, e.g. with
threads or different execution contexts) and is still a hotspot, the
performance lag will be visible.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
---
Changes from v2:
- Updated manual blurb to refer to "this version" so that it is
  backport-clean.

- Changed copy logic to inline copies of up to 4 bytes.  It is 1-7%
  faster that the previous inlining of only 1 and 2 bytes and a good
  10%+ faster than using plain memcpy.  4 bytes is the maximum length
  for byte encodings in charmaps shipped in glibc.

Changes from v1:
- Fixed nits suggested by Paul Eggert.

 debug/tst-fortify.c |  7 ++++++-
 debug/wcrtomb_chk.c |  8 ++-----
 include/wchar.h     |  4 ++++
 manual/charset.texi | 11 +++++-----
 wcsmbs/wcrtomb.c    | 51 ++++++++++++++++++++++++++++++++++++++-------
 5 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/debug/tst-fortify.c b/debug/tst-fortify.c
index 03c9867714..8e94643bf2 100644
--- a/debug/tst-fortify.c
+++ b/debug/tst-fortify.c
@@ -1478,10 +1478,15 @@ do_test (void)
 	 character which has a multibyte representation which does not
 	 fit.  */
       CHK_FAIL_START
-      char smallbuf[2];
+      char smallbuf[1];
       if (wcrtomb (smallbuf, L'\x100', &s) != 2)
 	FAIL ();
       CHK_FAIL_END
+
+      /* Same input with a large enough buffer and we're good.  */
+      char bigenoughbuf[2];
+      if (wcrtomb (bigenoughbuf, L'\x100', &s) != 2)
+	FAIL ();
 #endif
 
       wchar_t wenough[10];
diff --git a/debug/wcrtomb_chk.c b/debug/wcrtomb_chk.c
index 8b6d026560..28c3ea0d2d 100644
--- a/debug/wcrtomb_chk.c
+++ b/debug/wcrtomb_chk.c
@@ -1,4 +1,5 @@
 /* Copyright (C) 2005-2022 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -25,10 +26,5 @@
 size_t
 __wcrtomb_chk (char *s, wchar_t wchar, mbstate_t *ps, size_t buflen)
 {
-  /* We do not have to implement the full wctomb semantics since we
-     know that S cannot be NULL when we come here.  */
-  if (buflen < MB_CUR_MAX)
-    __chk_fail ();
-
-  return __wcrtomb (s, wchar, ps);
+  return __wcrtomb_internal (s, wchar, ps, buflen);
 }
diff --git a/include/wchar.h b/include/wchar.h
index 4267985625..db83297bca 100644
--- a/include/wchar.h
+++ b/include/wchar.h
@@ -172,6 +172,10 @@ libc_hidden_proto (__mbrtowc)
 libc_hidden_proto (__mbrlen)
 extern size_t __wcrtomb (char *__restrict __s, wchar_t __wc,
 			 __mbstate_t *__restrict __ps) attribute_hidden;
+extern size_t __wcrtomb_internal (char *__restrict __s, wchar_t __wc,
+				  __mbstate_t *__restrict __ps,
+				  size_t __s_size)
+     attribute_hidden;
 extern size_t __mbsrtowcs (wchar_t *__restrict __dst,
 			   const char **__restrict __src,
 			   size_t __len, __mbstate_t *__restrict __ps)
diff --git a/manual/charset.texi b/manual/charset.texi
index a9b5cb4a37..427db3bc80 100644
--- a/manual/charset.texi
+++ b/manual/charset.texi
@@ -883,11 +883,12 @@ the string @var{s}.  This includes all bytes representing shift
 sequences.
 
 One word about the interface of the function: there is no parameter
-specifying the length of the array @var{s}.  Instead the function
-assumes that there are at least @code{MB_CUR_MAX} bytes available since
-this is the maximum length of any byte sequence representing a single
-character.  So the caller has to make sure that there is enough space
-available, otherwise buffer overruns can occur.
+specifying the length of the array @var{s}, so the caller has to make sure
+that there is enough space available, otherwise buffer overruns can occur.
+This version of @theglibc{} does not assume that @var{s} is at least
+@var{MB_CUR_MAX} bytes long, but programs that need to run on @glibcadj{}
+versions that have this assumption documented in the manual must comply
+with this limit.
 
 @pindex wchar.h
 @code{wcrtomb} was introduced in @w{Amendment 1} to @w{ISO C90} and is
diff --git a/wcsmbs/wcrtomb.c b/wcsmbs/wcrtomb.c
index e17438989f..d68dcdac82 100644
--- a/wcsmbs/wcrtomb.c
+++ b/wcsmbs/wcrtomb.c
@@ -1,4 +1,5 @@
 /* Copyright (C) 1996-2022 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -20,6 +21,7 @@
 #include <errno.h>
 #include <gconv.h>
 #include <stdlib.h>
+#include <string.h>
 #include <wchar.h>
 #include <wcsmbsload.h>
 
@@ -34,7 +36,7 @@
 static mbstate_t state;
 
 size_t
-__wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
+__wcrtomb_internal (char *s, wchar_t wc, mbstate_t *ps, size_t s_size)
 {
   char buf[MB_LEN_MAX];
   struct __gconv_step_data data;
@@ -52,14 +54,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
   /* A first special case is if S is NULL.  This means put PS in the
      initial state.  */
   if (s == NULL)
-    {
-      s = buf;
-      wc = L'\0';
-    }
+    wc = L'\0';
 
   /* Tell where we want to have the result.  */
-  data.__outbuf = (unsigned char *) s;
-  data.__outbufend = (unsigned char *) s + MB_CUR_MAX;
+  data.__outbuf = (unsigned char *) buf;
+  data.__outbufend = (unsigned char *) buf + sizeof buf;
 
   /* Get the conversion functions.  */
   fcts = get_gconv_fcts (_NL_CURRENT_DATA (LC_CTYPE));
@@ -101,7 +100,37 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
 
   if (status == __GCONV_OK || status == __GCONV_EMPTY_INPUT
       || status == __GCONV_FULL_OUTPUT)
-    result = data.__outbuf - (unsigned char *) s;
+    {
+      result = data.__outbuf - (unsigned char *) buf;
+
+      if (s != NULL)
+	{
+	  if (result > s_size)
+	    __chk_fail ();
+
+	  /* The maximum byte size in all charmaps is 4, so inline copies up to
+	     4 bytes.  One must benchmark before bumping this in future because
+	     the returns for inlining will diminish as the number of branches
+	     increases.  */
+	  switch (result)
+	    {
+	    case 4:
+	      s[3] = buf[3];
+	      /* Fall through.  */
+	    case 3:
+	      s[2] = buf[2];
+	      /* Fall through.  */
+	    case 2:
+	      s[1] = buf[1];
+	      /* Fall through.  */
+	    case 1:
+	      s[0] = buf[0];
+	      break;
+	    default:
+	      memcpy (s, buf, result);
+	    }
+	}
+    }
   else
     {
       result = (size_t) -1;
@@ -110,5 +139,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
 
   return result;
 }
+
+size_t
+__wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
+{
+  return __wcrtomb_internal (s, wc, ps, (size_t) -1);
+}
 weak_alias (__wcrtomb, wcrtomb)
 libc_hidden_weak (wcrtomb)
-- 
2.35.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant
  2022-05-12 13:15             ` [PATCH v3] " Siddhesh Poyarekar
@ 2022-05-13  4:56               ` Paul Eggert
  2022-05-13  5:28                 ` Paul Eggert
  2022-05-13  8:18                 ` [PATCH v3] " Siddhesh Poyarekar
  0 siblings, 2 replies; 38+ messages in thread
From: Paul Eggert @ 2022-05-13  4:56 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: adhemerval.zanella, fweimer, dickey, libc-alpha

On 5/12/22 06:15, Siddhesh Poyarekar wrote:
> +	  switch (result)
> +	    {
> +	    case 4:
> +	      s[3] = buf[3];
> +	      /* Fall through.  */
> +	    case 3:
> +	      s[2] = buf[2];
> +	      /* Fall through.  */
> +	    case 2:
> +	      s[1] = buf[1];
> +	      /* Fall through.  */
> +	    case 1:
> +	      s[0] = buf[0];
> +	      break;
> +	    default:
> +	      memcpy (s, buf, result);
> +	    }

For me with GCC 12.1 -O2 on x86-64, the above code generates 2 
comparisons and 3 conditional branches in the common case where RESULT 
is 1. Plus, GCC generates a jmp from the end of case 3 to the start of 
case 2 (which precedes case 3 in the machine code), which is a bit odd.

How about something like the following instead? This generates machine 
code with only one conditional branch - the one that decides whether to 
call memcpy - and this branch is never taken with glibc-supplied 
charmaps. (I'm assuming RESULT is at least 1.)

   s[0] = buf[0];
   if (2 <= result && result <= 4)
     {
       s[1] = buf[1];
       memcpy (&s[result != 2], &buf[result != 2], 2);
       s[result - 1] = buf[result - 1];
     }
   else
     memcpy (s, buf, result);

Hope you don't mind my bikeshedding. (PATCH v3 looks correct as-is, for 
what it's worth.)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant
  2022-05-13  4:56               ` Paul Eggert
@ 2022-05-13  5:28                 ` Paul Eggert
  2022-05-13 11:31                   ` Siddhesh Poyarekar
  2022-05-13  8:18                 ` [PATCH v3] " Siddhesh Poyarekar
  1 sibling, 1 reply; 38+ messages in thread
From: Paul Eggert @ 2022-05-13  5:28 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: fweimer, libc-alpha, dickey

On 5/12/22 21:56, Paul Eggert wrote:
> Hope you don't mind my bikeshedding.

Better yet, this:

   s[0] = buf[0];
   if (2 <= result && result <= 4)
     {
       s[1] = buf[1];
       memcpy (&s[result - 2], &buf[result - 2], 2);
     }
   else
     memcpy (s, buf, result);

On x86-64 with GCC 12.1 -O2 and a glibc-supplied charmap, this is only 9 
straight-line instructions, counting the compare insn and the 
conditional-branch insn that is never taken.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant
  2022-05-13  4:56               ` Paul Eggert
  2022-05-13  5:28                 ` Paul Eggert
@ 2022-05-13  8:18                 ` Siddhesh Poyarekar
  1 sibling, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-13  8:18 UTC (permalink / raw)
  To: Paul Eggert; +Cc: adhemerval.zanella, fweimer, dickey, libc-alpha

On 13/05/2022 10:26, Paul Eggert wrote:
> On 5/12/22 06:15, Siddhesh Poyarekar wrote:
>> +      switch (result)
>> +        {
>> +        case 4:
>> +          s[3] = buf[3];
>> +          /* Fall through.  */
>> +        case 3:
>> +          s[2] = buf[2];
>> +          /* Fall through.  */
>> +        case 2:
>> +          s[1] = buf[1];
>> +          /* Fall through.  */
>> +        case 1:
>> +          s[0] = buf[0];
>> +          break;
>> +        default:
>> +          memcpy (s, buf, result);
>> +        }
> 
> For me with GCC 12.1 -O2 on x86-64, the above code generates 2 
> comparisons and 3 conditional branches in the common case where RESULT 
> is 1. Plus, GCC generates a jmp from the end of case 3 to the start of 
> case 2 (which precedes case 3 in the machine code), which is a bit odd.

It seems to hit actual performance less, presumably because those 
branches get predicted correctly most times.  However we could simply 
elide the check for the first byte like you've done.

> How about something like the following instead? This generates machine 
> code with only one conditional branch - the one that decides whether to 
> call memcpy - and this branch is never taken with glibc-supplied 
> charmaps. (I'm assuming RESULT is at least 1.)
> 
>    s[0] = buf[0];
>    if (2 <= result && result <= 4)
>      {
>        s[1] = buf[1];
>        memcpy (&s[result != 2], &buf[result != 2], 2);
>        s[result - 1] = buf[result - 1];
>      }
>    else
>      memcpy (s, buf, result);

This produces redundant loads and stores, which hits 2, 3, 4 byte copies 
much worse; I reckon the condition in the subscript ends up preventing 
the compiler from eliminating them.

If result == 1 is a safe assumption (as it should be I think) then we 
could hoist it out of the switch like so:

           s[0] = buf[0];
           switch (result)
             {
             case 4:
               s[3] = buf[3];
               /* Fall through.  */
             case 3:
               s[2] = buf[2];
               /* Fall through.  */
             case 2:
               s[1] = buf[1];
               /* Fall through.  */
             case 1:
               break;
             default:
               memcpy (s, buf, result);
             }

This improves things ever so slightly for the 1 and 2 byte locale tests 
in the microbenchmark, while keeping things the same for the rest. 
There's still a redundant byte copy in the > 4 bytes case, but that 
should be nothing compared to the PLT indirection.

What do you think?

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant
  2022-05-13  5:28                 ` Paul Eggert
@ 2022-05-13 11:31                   ` Siddhesh Poyarekar
  2022-05-13 11:38                     ` Florian Weimer
  0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-13 11:31 UTC (permalink / raw)
  To: Paul Eggert; +Cc: fweimer, libc-alpha, dickey

On 13/05/2022 10:58, Paul Eggert wrote:
> On 5/12/22 21:56, Paul Eggert wrote:
>> Hope you don't mind my bikeshedding.
> 
> Better yet, this:
> 
>    s[0] = buf[0];
>    if (2 <= result && result <= 4)
>      {
>        s[1] = buf[1];
>        memcpy (&s[result - 2], &buf[result - 2], 2);
>      }
>    else
>      memcpy (s, buf, result);
> 
> On x86-64 with GCC 12.1 -O2 and a glibc-supplied charmap, this is only 9 
> straight-line instructions, counting the compare insn and the 
> conditional-branch insn that is never taken.
> 

Sorry I missed this one.  I tried it and with gcc 11 it seems to produce 
worse code, merging the two memcpys instead of inlining the first one.

Siddhesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant
  2022-05-13 11:31                   ` Siddhesh Poyarekar
@ 2022-05-13 11:38                     ` Florian Weimer
  2022-05-13 11:51                       ` Siddhesh Poyarekar
  2022-05-13 12:30                       ` Adhemerval Zanella
  0 siblings, 2 replies; 38+ messages in thread
From: Florian Weimer @ 2022-05-13 11:38 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Paul Eggert, libc-alpha, dickey

* Siddhesh Poyarekar:

> On 13/05/2022 10:58, Paul Eggert wrote:
>> On 5/12/22 21:56, Paul Eggert wrote:
>>> Hope you don't mind my bikeshedding.
>> Better yet, this:
>>    s[0] = buf[0];
>>    if (2 <= result && result <= 4)
>>      {
>>        s[1] = buf[1];
>>        memcpy (&s[result - 2], &buf[result - 2], 2);
>>      }
>>    else
>>      memcpy (s, buf, result);
>> On x86-64 with GCC 12.1 -O2 and a glibc-supplied charmap, this is
>> only 9 straight-line instructions, counting the compare insn and the 
>> conditional-branch insn that is never taken.
>> 
>
> Sorry I missed this one.  I tried it and with gcc 11 it seems to
> produce worse code, merging the two memcpys instead of inlining the
> first one.

Can we please use the simplified code?

“char buf[MB_LEN_MAX];” already gives GCC a strong hint this is a short
memcpy.  If it is beneficial on some architectures to inline such short,
but still variable memcpy calls, we should teach the compiler how to do
it, and not do it manually.

And wcrtomb would for sure benefit more from a fast path for ASCII and
UTF-8. 8-p

(Not that I think there are many users out there, given glibc's
historically poor performance in the multibyte/wide string conversion
space.)

Thanks,
Florian


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant
  2022-05-13 11:38                     ` Florian Weimer
@ 2022-05-13 11:51                       ` Siddhesh Poyarekar
  2022-05-13 12:55                         ` Florian Weimer
  2022-05-13 12:30                       ` Adhemerval Zanella
  1 sibling, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-13 11:51 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, dickey

On 13/05/2022 17:08, Florian Weimer via Libc-alpha wrote:
> * Siddhesh Poyarekar:
> 
>> On 13/05/2022 10:58, Paul Eggert wrote:
>>> On 5/12/22 21:56, Paul Eggert wrote:
>>>> Hope you don't mind my bikeshedding.
>>> Better yet, this:
>>>     s[0] = buf[0];
>>>     if (2 <= result && result <= 4)
>>>       {
>>>         s[1] = buf[1];
>>>         memcpy (&s[result - 2], &buf[result - 2], 2);
>>>       }
>>>     else
>>>       memcpy (s, buf, result);
>>> On x86-64 with GCC 12.1 -O2 and a glibc-supplied charmap, this is
>>> only 9 straight-line instructions, counting the compare insn and the
>>> conditional-branch insn that is never taken.
>>>
>>
>> Sorry I missed this one.  I tried it and with gcc 11 it seems to
>> produce worse code, merging the two memcpys instead of inlining the
>> first one.
> 
> Can we please use the simplified code?

So just a memcpy call?  I could push that and attempt to micro-optimize 
if there are reports of slowdown due to this.

Siddhesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant
  2022-05-13 11:38                     ` Florian Weimer
  2022-05-13 11:51                       ` Siddhesh Poyarekar
@ 2022-05-13 12:30                       ` Adhemerval Zanella
  2022-05-13 13:42                         ` Siddhesh Poyarekar
  2022-05-13 13:45                         ` [committed] " Siddhesh Poyarekar
  1 sibling, 2 replies; 38+ messages in thread
From: Adhemerval Zanella @ 2022-05-13 12:30 UTC (permalink / raw)
  To: libc-alpha



On 13/05/2022 08:38, Florian Weimer via Libc-alpha wrote:
> * Siddhesh Poyarekar:
> 
>> On 13/05/2022 10:58, Paul Eggert wrote:
>>> On 5/12/22 21:56, Paul Eggert wrote:
>>>> Hope you don't mind my bikeshedding.
>>> Better yet, this:
>>>    s[0] = buf[0];
>>>    if (2 <= result && result <= 4)
>>>      {
>>>        s[1] = buf[1];
>>>        memcpy (&s[result - 2], &buf[result - 2], 2);
>>>      }
>>>    else
>>>      memcpy (s, buf, result);
>>> On x86-64 with GCC 12.1 -O2 and a glibc-supplied charmap, this is
>>> only 9 straight-line instructions, counting the compare insn and the 
>>> conditional-branch insn that is never taken.
>>>
>>
>> Sorry I missed this one.  I tried it and with gcc 11 it seems to
>> produce worse code, merging the two memcpys instead of inlining the
>> first one.
> 
> Can we please use the simplified code?
> 
> “char buf[MB_LEN_MAX];” already gives GCC a strong hint this is a short
> memcpy.  If it is beneficial on some architectures to inline such short,
> but still variable memcpy calls, we should teach the compiler how to do
> it, and not do it manually.

Totally agreed.

> 
> And wcrtomb would for sure benefit more from a fast path for ASCII and
> UTF-8. 8-p

And it does make sene.

> 
> (Not that I think there are many users out there, given glibc's
> historically poor performance in the multibyte/wide string conversion
> space.)



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant
  2022-05-13 11:51                       ` Siddhesh Poyarekar
@ 2022-05-13 12:55                         ` Florian Weimer
  0 siblings, 0 replies; 38+ messages in thread
From: Florian Weimer @ 2022-05-13 12:55 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha, dickey

* Siddhesh Poyarekar:

> On 13/05/2022 17:08, Florian Weimer via Libc-alpha wrote:
>> * Siddhesh Poyarekar:
>> 
>>> On 13/05/2022 10:58, Paul Eggert wrote:
>>>> On 5/12/22 21:56, Paul Eggert wrote:
>>>>> Hope you don't mind my bikeshedding.
>>>> Better yet, this:
>>>>     s[0] = buf[0];
>>>>     if (2 <= result && result <= 4)
>>>>       {
>>>>         s[1] = buf[1];
>>>>         memcpy (&s[result - 2], &buf[result - 2], 2);
>>>>       }
>>>>     else
>>>>       memcpy (s, buf, result);
>>>> On x86-64 with GCC 12.1 -O2 and a glibc-supplied charmap, this is
>>>> only 9 straight-line instructions, counting the compare insn and the
>>>> conditional-branch insn that is never taken.
>>>>
>>>
>>> Sorry I missed this one.  I tried it and with gcc 11 it seems to
>>> produce worse code, merging the two memcpys instead of inlining the
>>> first one.
>> Can we please use the simplified code?
>
> So just a memcpy call?  I could push that and attempt to
> micro-optimize if there are reports of slowdown due to this.

Yes, I would prefer that.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant
  2022-05-13 12:30                       ` Adhemerval Zanella
@ 2022-05-13 13:42                         ` Siddhesh Poyarekar
  2022-05-13 17:58                           ` Paul Eggert
  2022-05-13 13:45                         ` [committed] " Siddhesh Poyarekar
  1 sibling, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-13 13:42 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

On 13/05/2022 18:00, Adhemerval Zanella via Libc-alpha wrote:
>> Can we please use the simplified code?
>>
>> “char buf[MB_LEN_MAX];” already gives GCC a strong hint this is a short
>> memcpy.  If it is beneficial on some architectures to inline such short,
>> but still variable memcpy calls, we should teach the compiler how to do
>> it, and not do it manually.
> 
> Totally agreed.
> 

Ugh, both of you just kill all the fun.  I'll push a fix with just the 
memcpy ;)

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [committed] wcrtomb: Make behavior POSIX compliant
  2022-05-13 12:30                       ` Adhemerval Zanella
  2022-05-13 13:42                         ` Siddhesh Poyarekar
@ 2022-05-13 13:45                         ` Siddhesh Poyarekar
  1 sibling, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2022-05-13 13:45 UTC (permalink / raw)
  To: libc-alpha

The GNU implementation of wcrtomb assumes that there are at least
MB_CUR_MAX bytes available in the destination buffer passed to wcrtomb
as the first argument.  This is not compatible with the POSIX
definition, which only requires enough space for the input wide
character.

This does not break much in practice because when users supply buffers
smaller than MB_CUR_MAX (e.g. in ncurses), they compute and dynamically
allocate the buffer, which results in enough spare space (thanks to
usable_size in malloc and padding in alloca) that no actual buffer
overflow occurs.  However when the code is built with _FORTIFY_SOURCE,
it runs into the hard check against MB_CUR_MAX in __wcrtomb_chk and
hence fails.  It wasn't evident until now since dynamic allocations
would result in wcrtomb not being fortified but since _FORTIFY_SOURCE=3,
that limitation is gone, resulting in such code failing.

To fix this problem, introduce an internal buffer that is MB_LEN_MAX
long and use that to perform the conversion and then copy the resultant
bytes into the destination buffer.  Also move the fortification check
into the main implementation, which checks the result after conversion
and aborts if the resultant byte count is greater than the destination
buffer size.

One complication is that applications that assume the MB_CUR_MAX
limitation to be gone may not be able to run safely on older glibcs if
they use static destination buffers smaller than MB_CUR_MAX; dynamic
allocations will always have enough spare space that no actual overruns
will occur.  One alternative to fixing this is to bump symbol version to
prevent them from running on older glibcs but that seems too strict a
constraint.  Instead, since these users will only have made this
decision on reading the manual, I have put a note in the manual warning
them about the pitfalls of having static buffers smaller than
MB_CUR_MAX and running them on older glibc.

Benchmarking:

The wcrtomb microbenchmark shows significant increases in maximum
execution time for all locales, ranging from 10x for ar_SA.UTF-8 to
1.5x-2x for nearly everything else.  The mean execution time however saw
practically no impact, with some results even being quicker, indicating
that cache locality has a much bigger role in the overhead.

Given that the additional copy uses a temporary buffer inside wcrtomb,
it's likely that a hot path will end up putting that buffer (which is
responsible for the additional overhead) in a similar place on stack,
giving the necessary cache locality to negate the overhead.  However in
situations where wcrtomb ends up getting called at wildly different
spots on the call stack (or is on different call stacks, e.g. with
threads or different execution contexts) and is still a hotspot, the
performance lag will be visible.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
---
 debug/tst-fortify.c |  7 ++++++-
 debug/wcrtomb_chk.c |  8 ++------
 include/wchar.h     |  4 ++++
 manual/charset.texi | 11 ++++++-----
 wcsmbs/wcrtomb.c    | 31 +++++++++++++++++++++++--------
 5 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/debug/tst-fortify.c b/debug/tst-fortify.c
index 03c9867714..8e94643bf2 100644
--- a/debug/tst-fortify.c
+++ b/debug/tst-fortify.c
@@ -1478,10 +1478,15 @@ do_test (void)
 	 character which has a multibyte representation which does not
 	 fit.  */
       CHK_FAIL_START
-      char smallbuf[2];
+      char smallbuf[1];
       if (wcrtomb (smallbuf, L'\x100', &s) != 2)
 	FAIL ();
       CHK_FAIL_END
+
+      /* Same input with a large enough buffer and we're good.  */
+      char bigenoughbuf[2];
+      if (wcrtomb (bigenoughbuf, L'\x100', &s) != 2)
+	FAIL ();
 #endif

       wchar_t wenough[10];
diff --git a/debug/wcrtomb_chk.c b/debug/wcrtomb_chk.c
index 8b6d026560..28c3ea0d2d 100644
--- a/debug/wcrtomb_chk.c
+++ b/debug/wcrtomb_chk.c
@@ -1,4 +1,5 @@
 /* Copyright (C) 2005-2022 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.

    The GNU C Library is free software; you can redistribute it and/or
@@ -25,10 +26,5 @@
 size_t
 __wcrtomb_chk (char *s, wchar_t wchar, mbstate_t *ps, size_t buflen)
 {
-  /* We do not have to implement the full wctomb semantics since we
-     know that S cannot be NULL when we come here.  */
-  if (buflen < MB_CUR_MAX)
-    __chk_fail ();
-
-  return __wcrtomb (s, wchar, ps);
+  return __wcrtomb_internal (s, wchar, ps, buflen);
 }
diff --git a/include/wchar.h b/include/wchar.h
index 4267985625..db83297bca 100644
--- a/include/wchar.h
+++ b/include/wchar.h
@@ -172,6 +172,10 @@ libc_hidden_proto (__mbrtowc)
 libc_hidden_proto (__mbrlen)
 extern size_t __wcrtomb (char *__restrict __s, wchar_t __wc,
 			 __mbstate_t *__restrict __ps) attribute_hidden;
+extern size_t __wcrtomb_internal (char *__restrict __s, wchar_t __wc,
+				  __mbstate_t *__restrict __ps,
+				  size_t __s_size)
+     attribute_hidden;
 extern size_t __mbsrtowcs (wchar_t *__restrict __dst,
 			   const char **__restrict __src,
 			   size_t __len, __mbstate_t *__restrict __ps)
diff --git a/manual/charset.texi b/manual/charset.texi
index a9b5cb4a37..427db3bc80 100644
--- a/manual/charset.texi
+++ b/manual/charset.texi
@@ -883,11 +883,12 @@ the string @var{s}.  This includes all bytes representing shift
 sequences.

 One word about the interface of the function: there is no parameter
-specifying the length of the array @var{s}.  Instead the function
-assumes that there are at least @code{MB_CUR_MAX} bytes available since
-this is the maximum length of any byte sequence representing a single
-character.  So the caller has to make sure that there is enough space
-available, otherwise buffer overruns can occur.
+specifying the length of the array @var{s}, so the caller has to make sure
+that there is enough space available, otherwise buffer overruns can occur.
+This version of @theglibc{} does not assume that @var{s} is at least
+@var{MB_CUR_MAX} bytes long, but programs that need to run on @glibcadj{}
+versions that have this assumption documented in the manual must comply
+with this limit.

 @pindex wchar.h
 @code{wcrtomb} was introduced in @w{Amendment 1} to @w{ISO C90} and is
diff --git a/wcsmbs/wcrtomb.c b/wcsmbs/wcrtomb.c
index e17438989f..c0cce3792f 100644
--- a/wcsmbs/wcrtomb.c
+++ b/wcsmbs/wcrtomb.c
@@ -1,4 +1,5 @@
 /* Copyright (C) 1996-2022 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.

    The GNU C Library is free software; you can redistribute it and/or
@@ -20,6 +21,7 @@
 #include <errno.h>
 #include <gconv.h>
 #include <stdlib.h>
+#include <string.h>
 #include <wchar.h>
 #include <wcsmbsload.h>

@@ -34,7 +36,7 @@
 static mbstate_t state;

 size_t
-__wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
+__wcrtomb_internal (char *s, wchar_t wc, mbstate_t *ps, size_t s_size)
 {
   char buf[MB_LEN_MAX];
   struct __gconv_step_data data;
@@ -52,14 +54,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
   /* A first special case is if S is NULL.  This means put PS in the
      initial state.  */
   if (s == NULL)
-    {
-      s = buf;
-      wc = L'\0';
-    }
+    wc = L'\0';

   /* Tell where we want to have the result.  */
-  data.__outbuf = (unsigned char *) s;
-  data.__outbufend = (unsigned char *) s + MB_CUR_MAX;
+  data.__outbuf = (unsigned char *) buf;
+  data.__outbufend = (unsigned char *) buf + sizeof buf;

   /* Get the conversion functions.  */
   fcts = get_gconv_fcts (_NL_CURRENT_DATA (LC_CTYPE));
@@ -101,7 +100,17 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps)

   if (status == __GCONV_OK || status == __GCONV_EMPTY_INPUT
       || status == __GCONV_FULL_OUTPUT)
-    result = data.__outbuf - (unsigned char *) s;
+    {
+      result = data.__outbuf - (unsigned char *) buf;
+
+      if (s != NULL)
+	{
+	  if (result > s_size)
+	    __chk_fail ();
+
+	  memcpy (s, buf, result);
+	}
+    }
   else
     {
       result = (size_t) -1;
@@ -110,5 +119,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps)

   return result;
 }
+
+size_t
+__wcrtomb (char *s, wchar_t wc, mbstate_t *ps)
+{
+  return __wcrtomb_internal (s, wc, ps, (size_t) -1);
+}
 weak_alias (__wcrtomb, wcrtomb)
 libc_hidden_weak (wcrtomb)
-- 
2.35.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant
  2022-05-13 13:42                         ` Siddhesh Poyarekar
@ 2022-05-13 17:58                           ` Paul Eggert
  0 siblings, 0 replies; 38+ messages in thread
From: Paul Eggert @ 2022-05-13 17:58 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Adhemerval Zanella, libc-alpha

On 5/13/22 06:42, Siddhesh Poyarekar wrote:
> Ugh, both of you just kill all the fun.  I'll push a fix with just the 
> memcpy ;)

Just as well, as the code I sent before is slow for ASCII anyway. I 
shouldn't let that stand as my suggestion, even if just for fun.

> with gcc 11 it seems to produce worse code, merging the two memcpys instead of inlining the first one.

Here's a better version if someone ever wants to tune this stuff. In 
this version there's only one memcpy so GCC won't merge it. On x86-64 
with GCC 12.1 -O2, if RESULT<=2 this executes 6 instructions (including 
1 comparison and 1 conditional branch); if 3<=RESULT<=4 this executes 8 
instructions total (including 2 comparisons and 2 conditional branches).

   if (result <= 2)
     {
       s[0] = buf[0];
       s[result - 1] = buf[result - 1];
     }
   else if (result <= 4)
     {
       s[0] = buf[0];
       s[1] = buf[1];
       s[result - 2] = buf[result - 2];
       s[result - 1] = buf[result - 1];
     }
   else
     memcpy (s, buf, result);

There are of course other variants....

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2022-05-13 17:58 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-07  6:26 [RFC] _FORTIFY_SOURCE strictness Siddhesh Poyarekar
2022-04-07 10:16 ` Andreas Schwab
2022-04-08  3:24   ` Siddhesh Poyarekar
2022-04-08  2:26 ` Paul Eggert
2022-04-08  3:32   ` Siddhesh Poyarekar
2022-04-08  5:37 ` Florian Weimer
2022-04-08  6:02   ` Siddhesh Poyarekar
2022-04-08 21:07     ` Paul Eggert
2022-04-11  8:02       ` Siddhesh Poyarekar
2022-05-05 18:43         ` [PATCH 0/2] More compliant wcrtomb Siddhesh Poyarekar
2022-05-05 18:43           ` [PATCH 1/2] benchtests: Add wcrtomb microbenchmark Siddhesh Poyarekar
2022-05-06  9:10             ` Florian Weimer
2022-05-06 12:49               ` [committed] " Siddhesh Poyarekar
2022-05-06 12:50             ` [PATCH 1/2] " Adhemerval Zanella
2022-05-06 12:59               ` Siddhesh Poyarekar
2022-05-06 13:20                 ` Adhemerval Zanella
2022-05-06 13:26                   ` Siddhesh Poyarekar
2022-05-06 13:36                     ` Siddhesh Poyarekar
2022-05-06 13:46                       ` Adhemerval Zanella
2022-05-05 18:43           ` [PATCH 2/2] wcrtomb: Make behavior POSIX compliant Siddhesh Poyarekar
2022-05-06  9:25             ` Paul Eggert
2022-05-06 13:40               ` Adhemerval Zanella
2022-05-06 13:46                 ` Siddhesh Poyarekar
2022-05-06 14:04             ` [PATCH v2] " Siddhesh Poyarekar
2022-05-09 13:22               ` Adhemerval Zanella
2022-05-09 13:35                 ` Siddhesh Poyarekar
2022-05-12 13:15             ` [PATCH v3] " Siddhesh Poyarekar
2022-05-13  4:56               ` Paul Eggert
2022-05-13  5:28                 ` Paul Eggert
2022-05-13 11:31                   ` Siddhesh Poyarekar
2022-05-13 11:38                     ` Florian Weimer
2022-05-13 11:51                       ` Siddhesh Poyarekar
2022-05-13 12:55                         ` Florian Weimer
2022-05-13 12:30                       ` Adhemerval Zanella
2022-05-13 13:42                         ` Siddhesh Poyarekar
2022-05-13 17:58                           ` Paul Eggert
2022-05-13 13:45                         ` [committed] " Siddhesh Poyarekar
2022-05-13  8:18                 ` [PATCH v3] " Siddhesh Poyarekar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).