* [RFC] _FORTIFY_SOURCE strictness @ 2022-04-07 6:26 Siddhesh Poyarekar 2022-04-07 10:16 ` Andreas Schwab ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-04-07 6:26 UTC (permalink / raw) To: libc-alpha Cc: Adhemerval Zanella, Andreas Schwab, Carlos O'Donell, Florian Weimer, Jakub Jelinek, Martin Liška Hi, _FORTIFY_SOURCE levels up to 2 rely on static object size computation to make decisions at compile time to either call a fortified bounds-checking variant or a regular variant of a function. Because the size estimates were static and often upper limit estimates, the bounds were either optimistic or not present at all. This made aggressive bounds checks such as those in __snprintf_chk[1] or __wcrtomb_chk[2] safe enough as to have low false positives (where there's no actual buffer overflow but we have defined stricter requirements in the fortified functions) and hence pass through uncontested. This situation has changed with _FORTIFY_SOURCE=3, where it is now possible to get non-constant expressions for object sizes, providing not just more precise estimates, but also greater coverage, which appears to have increased the possibility of such false positives. This is not limited to the two known examples either; __strncpy_chk for example will crash if n is greater than the destination buffer size and is similarly prone to such false positives. One could envision a situation where an strncpy call is deeply nested and through compiler advances and attribute annotations, the callsite now gets precise size expressions for the call and not just an upper limit estimate or a (size_t)-1. Of course, good defensive programming practice means that one always pass n that is within limits of both source and destination bounds in strncpy but is a runtime check the right place to enforce good practices as opposed to only crashing on actual overflows? There appear to be two alternatives out of this situations and I wanted to build consensus around one of these or even a third alternative that I hadn't thought of. I of course volunteer to work on the approach we have consensus on: Option 1: Do nothing We could take the stand that it is what it is and that fortification checks are necessarily stricter than standards allowances to enforce better application programming practice. In this case I propose we improve the overflow messages to clearly indicate that we're enforcing a stricter limitation than the standard requires and not give the incorrect impression that we _detected_ an overflow. The downside of this approach is the possibility that some applications don't fortify beyond level 2, insisting that their usage is safe enough. At the moment it seems like a reasonably small set of applications are running into these, so maybe it's OK? Option 2: Improve checking accuracy at _FORTIFY_SOURCE=3 This would involve making the checking functions smarter, at level 3 and abort only on actual overflows. This could have a small runtime overhead in cases like __wcrtomb_chk but a larger one for cases like __snprintf_chk. It is also a more invasive change since it will involve changing all implementations. This essentially is similar to sanitizer-like functionality. The downside of this approach is the possibility that some applications don't fortify beyond level 2 because the performance overhead is unacceptable. Thoughts? Maybe an Option 3 that's less worse than the above two options? Thanks, Siddhesh [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28989 [2] https://lists.gnu.org/archive/html/bug-ncurses/2022-04/msg00002.html ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC] _FORTIFY_SOURCE strictness 2022-04-07 6:26 [RFC] _FORTIFY_SOURCE strictness Siddhesh Poyarekar @ 2022-04-07 10:16 ` Andreas Schwab 2022-04-08 3:24 ` Siddhesh Poyarekar 2022-04-08 2:26 ` Paul Eggert 2022-04-08 5:37 ` Florian Weimer 2 siblings, 1 reply; 38+ messages in thread From: Andreas Schwab @ 2022-04-07 10:16 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: libc-alpha, Adhemerval Zanella, Carlos O'Donell, Florian Weimer, Jakub Jelinek, Martin Liška On Apr 07 2022, Siddhesh Poyarekar wrote: > The downside of this approach is the possibility that some applications > don't fortify beyond level 2, insisting that their usage is safe enough. The problem with this argument is that what is safe enough now, may be unsafe later due to an unrelated change elsewhere, or an attacker injecting some unforeseen data. It is generally better to be safer in the first place, because aborting deep inside the call chain is a risk in itself, even if it prevented an acute undefined behaviour from doing bad side effects. By checking bounds early better error recovery is possible in general. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC] _FORTIFY_SOURCE strictness 2022-04-07 10:16 ` Andreas Schwab @ 2022-04-08 3:24 ` Siddhesh Poyarekar 0 siblings, 0 replies; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-04-08 3:24 UTC (permalink / raw) To: Andreas Schwab Cc: libc-alpha, Adhemerval Zanella, Carlos O'Donell, Florian Weimer, Jakub Jelinek, Martin Liška On 07/04/2022 15:46, Andreas Schwab wrote: > On Apr 07 2022, Siddhesh Poyarekar wrote: > >> The downside of this approach is the possibility that some applications >> don't fortify beyond level 2, insisting that their usage is safe enough. > > The problem with this argument is that what is safe enough now, may be > unsafe later due to an unrelated change elsewhere, or an attacker > injecting some unforeseen data. It is generally better to be safer in > the first place, because aborting deep inside the call chain is a risk > in itself, even if it prevented an acute undefined behaviour from doing > bad side effects. By checking bounds early better error recovery is > possible in general. > That's a fair point. Thanks, Siddhesh ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC] _FORTIFY_SOURCE strictness 2022-04-07 6:26 [RFC] _FORTIFY_SOURCE strictness Siddhesh Poyarekar 2022-04-07 10:16 ` Andreas Schwab @ 2022-04-08 2:26 ` Paul Eggert 2022-04-08 3:32 ` Siddhesh Poyarekar 2022-04-08 5:37 ` Florian Weimer 2 siblings, 1 reply; 38+ messages in thread From: Paul Eggert @ 2022-04-08 2:26 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: Florian Weimer, Jakub Jelinek, Andreas Schwab, libc-alpha On 4/6/22 23:26, Siddhesh Poyarekar wrote: > Thoughts? Maybe an Option 3 that's less worse than the above two options? Perhaps we could use Option 1 (do nothing) for functions like snprintf, and Option 2 (abort only on actual overflows) for functions like strncpy. For snprintf Option 1 is arguably allowed by the C standard, which says "If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid." In other words, if you call snprintf (a, n, ...) the implementation is allowed compute &a[0] through &a[n] even if it doesn't store into every byte in 'a' - and this means Option 1 is allowed. For strncpy the situation is murkier and arguably something like Option 2 would be needed, due to the funky way that the standard words the strncpy spec. However, this is not that important, since strncpy is (or at least should be) rarely used nowadays, so it's not that performance-relevant. If we're lucky, most of the affected functions are more like snprintf than like strcpy, which means we can use Option 1 (do nothing) for most of the affected functions. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC] _FORTIFY_SOURCE strictness 2022-04-08 2:26 ` Paul Eggert @ 2022-04-08 3:32 ` Siddhesh Poyarekar 0 siblings, 0 replies; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-04-08 3:32 UTC (permalink / raw) To: Paul Eggert; +Cc: Florian Weimer, Jakub Jelinek, Andreas Schwab, libc-alpha On 08/04/2022 07:56, Paul Eggert wrote: > On 4/6/22 23:26, Siddhesh Poyarekar wrote: > >> Thoughts? Maybe an Option 3 that's less worse than the above two >> options? > > Perhaps we could use Option 1 (do nothing) for functions like snprintf, > and Option 2 (abort only on actual overflows) for functions like strncpy. I suppose for __strncpy_chk it will be about as expensive as __strcpy_chk since the latter already does a strlen check anyway. We could, I suppose even limit it to __FORTIFY_LEVEL > 2 to limit the performance impact to levels where we already warn about potential performance impact. > For snprintf Option 1 is arguably allowed by the C standard, which says > "If a function argument is described as being an array, the pointer > actually passed to the function shall have a value such that all address > computations and accesses to objects (that would be valid if the pointer > did point to the first element of such an array) are in fact valid." In > other words, if you call snprintf (a, n, ...) the implementation is > allowed compute &a[0] through &a[n] even if it doesn't store into every > byte in 'a' - and this means Option 1 is allowed. > > For strncpy the situation is murkier and arguably something like Option > 2 would be needed, due to the funky way that the standard words the > strncpy spec. However, this is not that important, since strncpy is (or > at least should be) rarely used nowadays, so it's not that > performance-relevant. I remember someone telling me about their pet project of finding frequencies of libc function use in distributions. Let me see if I can find that and see where strncpy stands. > If we're lucky, most of the affected functions are more like snprintf > than like strcpy, which means we can use Option 1 (do nothing) for most > of the affected functions. How would you read wcrtomb[1]? ISTM that it falls into the strncpy category, i.e. the fortified behaviour seems stricter than what the standards require. I tried building an argument to the contrary[2] but even I can't stand behind it with full conviction :) Thanks, Siddhesh [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/wcrtomb.html# [2] https://lists.gnu.org/archive/html/bug-ncurses/2022-04/msg00004.html ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC] _FORTIFY_SOURCE strictness 2022-04-07 6:26 [RFC] _FORTIFY_SOURCE strictness Siddhesh Poyarekar 2022-04-07 10:16 ` Andreas Schwab 2022-04-08 2:26 ` Paul Eggert @ 2022-04-08 5:37 ` Florian Weimer 2022-04-08 6:02 ` Siddhesh Poyarekar 2 siblings, 1 reply; 38+ messages in thread From: Florian Weimer @ 2022-04-08 5:37 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: libc-alpha, Adhemerval Zanella, Andreas Schwab, Carlos O'Donell, Jakub Jelinek, Martin Liška * Siddhesh Poyarekar: > This is not limited to the two known examples either; __strncpy_chk > for example will crash if n is greater than the destination buffer > size and is similarly prone to such false positives. One could > envision a situation where an strncpy call is deeply nested and > through compiler advances and attribute annotations, the callsite now > gets precise size expressions for the call and not just an upper limit > estimate or a (size_t)-1. Hmm. __strncpy_chk always fills the destination buffer, so the only thing we can do here is to alias it to strncpy? Thanks, Florian ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC] _FORTIFY_SOURCE strictness 2022-04-08 5:37 ` Florian Weimer @ 2022-04-08 6:02 ` Siddhesh Poyarekar 2022-04-08 21:07 ` Paul Eggert 0 siblings, 1 reply; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-04-08 6:02 UTC (permalink / raw) To: Florian Weimer Cc: libc-alpha, Adhemerval Zanella, Andreas Schwab, Carlos O'Donell, Jakub Jelinek, Martin Liška On 08/04/2022 11:07, Florian Weimer wrote: > * Siddhesh Poyarekar: > >> This is not limited to the two known examples either; __strncpy_chk >> for example will crash if n is greater than the destination buffer >> size and is similarly prone to such false positives. One could >> envision a situation where an strncpy call is deeply nested and >> through compiler advances and attribute annotations, the callsite now >> gets precise size expressions for the call and not just an upper limit >> estimate or a (size_t)-1. > > Hmm. __strncpy_chk always fills the destination buffer, so the only > thing we can do here is to alias it to strncpy? Hmm, I think I conflated it with something other str* function. You're right, strncpy probably doesn't fall into this category. Thanks, Siddhesh ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC] _FORTIFY_SOURCE strictness 2022-04-08 6:02 ` Siddhesh Poyarekar @ 2022-04-08 21:07 ` Paul Eggert 2022-04-11 8:02 ` Siddhesh Poyarekar 0 siblings, 1 reply; 38+ messages in thread From: Paul Eggert @ 2022-04-08 21:07 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: Jakub Jelinek, libc-alpha, Andreas Schwab, Florian Weimer On 4/7/22 23:02, Siddhesh Poyarekar wrote: > Hmm, I think I conflated it with something other str* function. You're > right, strncpy probably doesn't fall into this category. Ouch, I made the same mistake. As for wcrtomb, unfortunately the standard's wording appears to allow you to pass an output buffer smaller than MB_CUR_MAX if you know that the multibyte character will fit into the smaller buffer. So I guess this is an example of a function where __FORTIFY_LEVEL > 2 doesn't conform to the standard. I don't know whether the standard's authors intended this. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC] _FORTIFY_SOURCE strictness 2022-04-08 21:07 ` Paul Eggert @ 2022-04-11 8:02 ` Siddhesh Poyarekar 2022-05-05 18:43 ` [PATCH 0/2] More compliant wcrtomb Siddhesh Poyarekar 0 siblings, 1 reply; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-04-11 8:02 UTC (permalink / raw) To: Paul Eggert Cc: Jakub Jelinek, libc-alpha, Andreas Schwab, Florian Weimer, Thomas Dickey On 09/04/2022 02:37, Paul Eggert wrote: > On 4/7/22 23:02, Siddhesh Poyarekar wrote: >> Hmm, I think I conflated it with something other str* function. >> You're right, strncpy probably doesn't fall into this category. > > Ouch, I made the same mistake. > > As for wcrtomb, unfortunately the standard's wording appears to allow > you to pass an output buffer smaller than MB_CUR_MAX if you know that > the multibyte character will fit into the smaller buffer. So I guess > this is an example of a function where __FORTIFY_LEVEL > 2 doesn't > conform to the standard. > > I don't know whether the standard's authors intended this. > Thomas (in cc, he maintains ncurses) found that the manual documents[1] the fact that the glibc implementation of wcrtomb assumes the destination buffer to have at least MB_CUR_MAX bytes, so this looks more like a situation where we *deliberately* deviate from the standard. If we decide to comply with the standard now we would incur an additional copy from an internal buffer to the destination with in addition to wrappers to pass the object size from the checking variant whenever available. The question then is whether that's the direction we want to take in glibc. Siddhesh [1] https://www.gnu.org/software/libc/manual/html_node/Converting-a-Character.html ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 0/2] More compliant wcrtomb 2022-04-11 8:02 ` Siddhesh Poyarekar @ 2022-05-05 18:43 ` Siddhesh Poyarekar 2022-05-05 18:43 ` [PATCH 1/2] benchtests: Add wcrtomb microbenchmark Siddhesh Poyarekar 2022-05-05 18:43 ` [PATCH 2/2] wcrtomb: Make behavior POSIX compliant Siddhesh Poyarekar 0 siblings, 2 replies; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-05 18:43 UTC (permalink / raw) To: libc-alpha; +Cc: fweimer, eggert, dickey, jakub, schwab This two part series removes the MB_CUR_MAX limitation from wcrtomb to make it comply with POSIX. The full rationale for doing so is in 2/2 to ensure that it remains in the git commit log. Siddhesh Poyarekar (2): benchtests: Add wcrtomb microbenchmark wcrtomb: Make behavior POSIX compliant benchtests/Makefile | 1 + benchtests/bench-wcrtomb.c | 140 +++++++++++++++++++++++++++++++++++++ debug/tst-fortify.c | 7 +- debug/wcrtomb_chk.c | 7 +- include/wchar.h | 4 ++ manual/charset.texi | 10 +-- wcsmbs/wcrtomb.c | 33 ++++++--- 7 files changed, 182 insertions(+), 20 deletions(-) create mode 100644 benchtests/bench-wcrtomb.c -- 2.35.1 ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 1/2] benchtests: Add wcrtomb microbenchmark 2022-05-05 18:43 ` [PATCH 0/2] More compliant wcrtomb Siddhesh Poyarekar @ 2022-05-05 18:43 ` Siddhesh Poyarekar 2022-05-06 9:10 ` Florian Weimer 2022-05-06 12:50 ` [PATCH 1/2] " Adhemerval Zanella 2022-05-05 18:43 ` [PATCH 2/2] wcrtomb: Make behavior POSIX compliant Siddhesh Poyarekar 1 sibling, 2 replies; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-05 18:43 UTC (permalink / raw) To: libc-alpha; +Cc: fweimer, eggert, dickey, jakub, schwab Add a simple benchmark that measures wcrtomb performance with various locales with 1-4 byte characters. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> --- benchtests/Makefile | 1 + benchtests/bench-wcrtomb.c | 140 +++++++++++++++++++++++++++++++++++++ 2 files changed, 141 insertions(+) create mode 100644 benchtests/bench-wcrtomb.c diff --git a/benchtests/Makefile b/benchtests/Makefile index 149d87e22e..de9de5cf58 100644 --- a/benchtests/Makefile +++ b/benchtests/Makefile @@ -171,6 +171,7 @@ ifeq (no,$(cross-compiling)) wcsmbs-benchset := \ wcpcpy \ wcpncpy \ + wcrtomb \ wcscat \ wcschr \ wcschrnul \ diff --git a/benchtests/bench-wcrtomb.c b/benchtests/bench-wcrtomb.c new file mode 100644 index 0000000000..6cef69cdbf --- /dev/null +++ b/benchtests/bench-wcrtomb.c @@ -0,0 +1,140 @@ +/* Measure wcrtomb function. + Copyright The GNU Toolchain Authors. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <limits.h> +#include <locale.h> +#include <string.h> +#include <wchar.h> + +#include "bench-timing.h" +#include "json-lib.h" + +#define NITERS 100000 + +struct test_inputs +{ + const char *locale; + const wchar_t *input_chars; +}; + +/* The inputs represent different types of characters, e.g. RTL, 1 byte, 2 + byte, 3 byte and 4 byte chars. The exact number of inputs per locale + doesn't really matter because we're not looking to compare performance + between locales. */ +struct test_inputs inputs[] = +{ + /* RTL. */ + {"ar_SA.UTF-8", + L",-.،؟ـًُّ٠٢٣٤ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي"}, + + /* Various mixes of 1 and 2 byte chars. */ + {"cs_CZ.UTF-8", + L",.aAábcCčdDďeEéÉěĚfFghHiIíJlLmMnNňŇoóÓpPqQrřsSšŠTťuUúÚůŮvVWxyýz"}, + + {"el_GR.UTF-8", + L",.αΑβγδΔεΕζηΗθΘιΙκΚλμΜνΝξοΟπΠρΡσΣςτυΥφΦχψω"}, + + {"en_GB.UTF-8", + L",.aAāĀæÆǽǣǢbcCċdDðÐeEēĒfFgGġhHiIīĪlLmMnNoōpPqQrsSTuUūŪvVwxyȝzþÞƿǷ"}, + + {"fr_FR.UTF-8", + L",.aAàâbcCçdDeEéèêëfFghHiIîïjlLmMnNoOôœpPqQrRsSTuUùûvVwxyz"}, + + {"he_IL.UTF-8", + L"',.ִאבגדהוזחטיכךלמםנןסעפףצץקרשת"}, + + /* Devanagari, Japanese, 3-byte chars. */ + {"hi_IN.UTF-8", + L"(।ं०४५७अआइईउऎएओऔकखगघचछजञटडढणतथदधनपफ़बभमयरलवशषसहािीुूृेैोौ्"}, + + {"ja_JP.UTF-8", + L".ー0123456789あアいイうウえエおオかカがきキぎくクぐけケげこコごさサざ"}, + + /* More mixtures of 1 and 2 byte chars. */ + {"ru_RU.UTF-8", + L",.аАбвВгдДеЕёЁжЖзЗийЙкКлЛмМнНоОпПрстТуУфФхХЦчшШщъыЫьэЭюЮя"}, + + {"sr_RS.UTF-8", + L",.aAbcCćčdDđĐeEfgGhHiIlLmMnNoOpPqQrsSšŠTuUvVxyzZž"}, + + {"sv_SE.UTF-8", + L",.aAåÅäÄæÆbBcCdDeEfFghHiIjlLmMnNoOöÖpPqQrsSTuUvVwxyz"}, + + /* Chinese, 3-byte chars */ + {"zh_CN.UTF-8", + L"一七三下不与世両並中串主乱予事二五亡京人今仕付以任企伎会伸住佐体作使"}, + + /* 4-byte chars, because smileys are the universal language and we want to + ensure optimal performance with them 😊. */ + {"en_US.UTF-8", + L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"} +}; + +char buf[MB_LEN_MAX]; +size_t ret; + +int +main (int argc, char **argv) +{ + const size_t inputs_len = sizeof (inputs) / sizeof (struct test_inputs); + + json_ctx_t json_ctx; + json_init (&json_ctx, 0, stdout); + json_document_begin (&json_ctx); + + json_attr_string (&json_ctx, "timing_type", TIMING_TYPE); + json_attr_object_begin (&json_ctx, "functions"); + json_attr_object_begin (&json_ctx, "wcrtomb"); + + for (size_t i = 0; i < inputs_len; i++) + { + json_attr_object_begin (&json_ctx, inputs[i].locale); + setlocale (LC_ALL, inputs[i].locale); + + timing_t min = 0x7fffffffffffffff, max = 0, total = 0; + const wchar_t *inp = inputs[i].input_chars; + const size_t len = wcslen (inp); + mbstate_t s; + + memset (&s, '\0', sizeof (s)); + + for (size_t n = 0; n < NITERS; n++) + { + timing_t start, end, elapsed; + + TIMING_NOW (start); + for (size_t j = 0; j < len; j++) + ret = wcrtomb (buf, inp[j], &s); + TIMING_NOW (end); + TIMING_DIFF (elapsed, start, end); + if (min > elapsed) + min = elapsed; + if (max < elapsed) + max = elapsed; + TIMING_ACCUM (total, elapsed); + } + json_attr_double (&json_ctx, "max", max); + json_attr_double (&json_ctx, "min", min); + json_attr_double (&json_ctx, "mean", total / NITERS); + json_attr_object_end (&json_ctx); + } + + json_attr_object_end (&json_ctx); + json_attr_object_end (&json_ctx); + json_document_end (&json_ctx); +} -- 2.35.1 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark 2022-05-05 18:43 ` [PATCH 1/2] benchtests: Add wcrtomb microbenchmark Siddhesh Poyarekar @ 2022-05-06 9:10 ` Florian Weimer 2022-05-06 12:49 ` [committed] " Siddhesh Poyarekar 2022-05-06 12:50 ` [PATCH 1/2] " Adhemerval Zanella 1 sibling, 1 reply; 38+ messages in thread From: Florian Weimer @ 2022-05-06 9:10 UTC (permalink / raw) To: Siddhesh Poyarekar; +Cc: libc-alpha, eggert, dickey, jakub, schwab * Siddhesh Poyarekar: > +int > +main (int argc, char **argv) > +{ > + const size_t inputs_len = sizeof (inputs) / sizeof (struct test_inputs); Can you use array_length? > + mbstate_t s; Trailing whitespace on this line. Rest looks okay. Thanks, Florian ^ permalink raw reply [flat|nested] 38+ messages in thread
* [committed] benchtests: Add wcrtomb microbenchmark 2022-05-06 9:10 ` Florian Weimer @ 2022-05-06 12:49 ` Siddhesh Poyarekar 0 siblings, 0 replies; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-06 12:49 UTC (permalink / raw) To: libc-alpha; +Cc: Florian Weimer Add a simple benchmark that measures wcrtomb performance with various locales with 1-4 byte characters. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Florian Weimer <fweimer@redhat.com> --- benchtests/Makefile | 1 + benchtests/bench-wcrtomb.c | 139 +++++++++++++++++++++++++++++++++++++ 2 files changed, 140 insertions(+) create mode 100644 benchtests/bench-wcrtomb.c diff --git a/benchtests/Makefile b/benchtests/Makefile index 149d87e22e..de9de5cf58 100644 --- a/benchtests/Makefile +++ b/benchtests/Makefile @@ -171,6 +171,7 @@ ifeq (no,$(cross-compiling)) wcsmbs-benchset := \ wcpcpy \ wcpncpy \ + wcrtomb \ wcscat \ wcschr \ wcschrnul \ diff --git a/benchtests/bench-wcrtomb.c b/benchtests/bench-wcrtomb.c new file mode 100644 index 0000000000..232a7d59de --- /dev/null +++ b/benchtests/bench-wcrtomb.c @@ -0,0 +1,139 @@ +/* Measure wcrtomb function. + Copyright The GNU Toolchain Authors. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <array_length.h> +#include <limits.h> +#include <locale.h> +#include <string.h> +#include <wchar.h> + +#include "bench-timing.h" +#include "json-lib.h" + +#define NITERS 100000 + +struct test_inputs +{ + const char *locale; + const wchar_t *input_chars; +}; + +/* The inputs represent different types of characters, e.g. RTL, 1 byte, 2 + byte, 3 byte and 4 byte chars. The exact number of inputs per locale + doesn't really matter because we're not looking to compare performance + between locales. */ +struct test_inputs inputs[] = +{ + /* RTL. */ + {"ar_SA.UTF-8", + L",-.،؟ـًُّ٠٢٣٤ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي"}, + + /* Various mixes of 1 and 2 byte chars. */ + {"cs_CZ.UTF-8", + L",.aAábcCčdDďeEéÉěĚfFghHiIíJlLmMnNňŇoóÓpPqQrřsSšŠTťuUúÚůŮvVWxyýz"}, + + {"el_GR.UTF-8", + L",.αΑβγδΔεΕζηΗθΘιΙκΚλμΜνΝξοΟπΠρΡσΣςτυΥφΦχψω"}, + + {"en_GB.UTF-8", + L",.aAāĀæÆǽǣǢbcCċdDðÐeEēĒfFgGġhHiIīĪlLmMnNoōpPqQrsSTuUūŪvVwxyȝzþÞƿǷ"}, + + {"fr_FR.UTF-8", + L",.aAàâbcCçdDeEéèêëfFghHiIîïjlLmMnNoOôœpPqQrRsSTuUùûvVwxyz"}, + + {"he_IL.UTF-8", + L"',.ִאבגדהוזחטיכךלמםנןסעפףצץקרשת"}, + + /* Devanagari, Japanese, 3-byte chars. */ + {"hi_IN.UTF-8", + L"(।ं०४५७अआइईउऎएओऔकखगघचछजञटडढणतथदधनपफ़बभमयरलवशषसहािीुूृेैोौ्"}, + + {"ja_JP.UTF-8", + L".ー0123456789あアいイうウえエおオかカがきキぎくクぐけケげこコごさサざ"}, + + /* More mixtures of 1 and 2 byte chars. */ + {"ru_RU.UTF-8", + L",.аАбвВгдДеЕёЁжЖзЗийЙкКлЛмМнНоОпПрстТуУфФхХЦчшШщъыЫьэЭюЮя"}, + + {"sr_RS.UTF-8", + L",.aAbcCćčdDđĐeEfgGhHiIlLmMnNoOpPqQrsSšŠTuUvVxyzZž"}, + + {"sv_SE.UTF-8", + L",.aAåÅäÄæÆbBcCdDeEfFghHiIjlLmMnNoOöÖpPqQrsSTuUvVwxyz"}, + + /* Chinese, 3-byte chars */ + {"zh_CN.UTF-8", + L"一七三下不与世両並中串主乱予事二五亡京人今仕付以任企伎会伸住佐体作使"}, + + /* 4-byte chars, because smileys are the universal language and we want to + ensure optimal performance with them 😊. */ + {"en_US.UTF-8", + L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"} +}; + +char buf[MB_LEN_MAX]; +size_t ret; + +int +main (int argc, char **argv) +{ + json_ctx_t json_ctx; + json_init (&json_ctx, 0, stdout); + json_document_begin (&json_ctx); + + json_attr_string (&json_ctx, "timing_type", TIMING_TYPE); + json_attr_object_begin (&json_ctx, "functions"); + json_attr_object_begin (&json_ctx, "wcrtomb"); + + for (size_t i = 0; i < array_length (inputs); i++) + { + json_attr_object_begin (&json_ctx, inputs[i].locale); + setlocale (LC_ALL, inputs[i].locale); + + timing_t min = 0x7fffffffffffffff, max = 0, total = 0; + const wchar_t *inp = inputs[i].input_chars; + const size_t len = wcslen (inp); + mbstate_t s; + + memset (&s, '\0', sizeof (s)); + + for (size_t n = 0; n < NITERS; n++) + { + timing_t start, end, elapsed; + + TIMING_NOW (start); + for (size_t j = 0; j < len; j++) + ret = wcrtomb (buf, inp[j], &s); + TIMING_NOW (end); + TIMING_DIFF (elapsed, start, end); + if (min > elapsed) + min = elapsed; + if (max < elapsed) + max = elapsed; + TIMING_ACCUM (total, elapsed); + } + json_attr_double (&json_ctx, "max", max); + json_attr_double (&json_ctx, "min", min); + json_attr_double (&json_ctx, "mean", total / NITERS); + json_attr_object_end (&json_ctx); + } + + json_attr_object_end (&json_ctx); + json_attr_object_end (&json_ctx); + json_document_end (&json_ctx); +} -- 2.35.1 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark 2022-05-05 18:43 ` [PATCH 1/2] benchtests: Add wcrtomb microbenchmark Siddhesh Poyarekar 2022-05-06 9:10 ` Florian Weimer @ 2022-05-06 12:50 ` Adhemerval Zanella 2022-05-06 12:59 ` Siddhesh Poyarekar 1 sibling, 1 reply; 38+ messages in thread From: Adhemerval Zanella @ 2022-05-06 12:50 UTC (permalink / raw) To: libc-alpha On 05/05/2022 15:43, Siddhesh Poyarekar via Libc-alpha wrote: > Add a simple benchmark that measures wcrtomb performance with various > locales with 1-4 byte characters. > > Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> > --- > benchtests/Makefile | 1 + > benchtests/bench-wcrtomb.c | 140 +++++++++++++++++++++++++++++++++++++ > 2 files changed, 141 insertions(+) > create mode 100644 benchtests/bench-wcrtomb.c > > diff --git a/benchtests/Makefile b/benchtests/Makefile > index 149d87e22e..de9de5cf58 100644 > --- a/benchtests/Makefile > +++ b/benchtests/Makefile > @@ -171,6 +171,7 @@ ifeq (no,$(cross-compiling)) > wcsmbs-benchset := \ > wcpcpy \ > wcpncpy \ > + wcrtomb \ > wcscat \ > wcschr \ > wcschrnul \ > diff --git a/benchtests/bench-wcrtomb.c b/benchtests/bench-wcrtomb.c > new file mode 100644 > index 0000000000..6cef69cdbf > --- /dev/null > +++ b/benchtests/bench-wcrtomb.c > @@ -0,0 +1,140 @@ > +/* Measure wcrtomb function. > + Copyright The GNU Toolchain Authors. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <https://www.gnu.org/licenses/>. */ > + > +#include <limits.h> > +#include <locale.h> > +#include <string.h> > +#include <wchar.h> > + > +#include "bench-timing.h" > +#include "json-lib.h" > + > +#define NITERS 100000 > + > +struct test_inputs > +{ > + const char *locale; > + const wchar_t *input_chars; > +}; > + > +/* The inputs represent different types of characters, e.g. RTL, 1 byte, 2 > + byte, 3 byte and 4 byte chars. The exact number of inputs per locale > + doesn't really matter because we're not looking to compare performance > + between locales. */ > +struct test_inputs inputs[] = > +{ > + /* RTL. */ > + {"ar_SA.UTF-8", > + L",-.،؟ـًُّ٠٢٣٤ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي"}, > + > + /* Various mixes of 1 and 2 byte chars. */ > + {"cs_CZ.UTF-8", > + L",.aAábcCčdDďeEéÉěĚfFghHiIíJlLmMnNňŇoóÓpPqQrřsSšŠTťuUúÚůŮvVWxyýz"}, > + > + {"el_GR.UTF-8", > + L",.αΑβγδΔεΕζηΗθΘιΙκΚλμΜνΝξοΟπΠρΡσΣςτυΥφΦχψω"}, > + > + {"en_GB.UTF-8", > + L",.aAāĀæÆǽǣǢbcCċdDðÐeEēĒfFgGġhHiIīĪlLmMnNoōpPqQrsSTuUūŪvVwxyȝzþÞƿǷ"}, > + > + {"fr_FR.UTF-8", > + L",.aAàâbcCçdDeEéèêëfFghHiIîïjlLmMnNoOôœpPqQrRsSTuUùûvVwxyz"}, > + > + {"he_IL.UTF-8", > + L"',.ִאבגדהוזחטיכךלמםנןסעפףצץקרשת"}, > + > + /* Devanagari, Japanese, 3-byte chars. */ > + {"hi_IN.UTF-8", > + L"(।ं०४५७अआइईउऎएओऔकखगघचछजञटडढणतथदधनपफ़बभमयरलवशषसहािीुूृेैोौ्"}, > + > + {"ja_JP.UTF-8", > + L".ー0123456789あアいイうウえエおオかカがきキぎくクぐけケげこコごさサざ"}, > + > + /* More mixtures of 1 and 2 byte chars. */ > + {"ru_RU.UTF-8", > + L",.аАбвВгдДеЕёЁжЖзЗийЙкКлЛмМнНоОпПрстТуУфФхХЦчшШщъыЫьэЭюЮя"}, > + > + {"sr_RS.UTF-8", > + L",.aAbcCćčdDđĐeEfgGhHiIlLmMnNoOpPqQrsSšŠTuUvVxyzZž"}, > + > + {"sv_SE.UTF-8", > + L",.aAåÅäÄæÆbBcCdDeEfFghHiIjlLmMnNoOöÖpPqQrsSTuUvVwxyz"}, > + > + /* Chinese, 3-byte chars */ > + {"zh_CN.UTF-8", > + L"一七三下不与世両並中串主乱予事二五亡京人今仕付以任企伎会伸住佐体作使"}, > + > + /* 4-byte chars, because smileys are the universal language and we want to > + ensure optimal performance with them 😊. */ > + {"en_US.UTF-8", > + L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"} > +}; Could you use use hexadecimal character escape in tests? Although gcc handle multiple -fexec-charset, trying to build it with a different compiler usually emits a lot of warnings. > + > +char buf[MB_LEN_MAX]; > +size_t ret; > + > +int > +main (int argc, char **argv) > +{ > + const size_t inputs_len = sizeof (inputs) / sizeof (struct test_inputs); > + > + json_ctx_t json_ctx; > + json_init (&json_ctx, 0, stdout); > + json_document_begin (&json_ctx); > + > + json_attr_string (&json_ctx, "timing_type", TIMING_TYPE); > + json_attr_object_begin (&json_ctx, "functions"); > + json_attr_object_begin (&json_ctx, "wcrtomb"); > + > + for (size_t i = 0; i < inputs_len; i++) > + { > + json_attr_object_begin (&json_ctx, inputs[i].locale); > + setlocale (LC_ALL, inputs[i].locale); > + > + timing_t min = 0x7fffffffffffffff, max = 0, total = 0; > + const wchar_t *inp = inputs[i].input_chars; > + const size_t len = wcslen (inp); > + mbstate_t s; > + > + memset (&s, '\0', sizeof (s)); > + > + for (size_t n = 0; n < NITERS; n++) > + { > + timing_t start, end, elapsed; > + > + TIMING_NOW (start); > + for (size_t j = 0; j < len; j++) > + ret = wcrtomb (buf, inp[j], &s); > + TIMING_NOW (end); > + TIMING_DIFF (elapsed, start, end); > + if (min > elapsed) > + min = elapsed; > + if (max < elapsed) > + max = elapsed; > + TIMING_ACCUM (total, elapsed); > + } > + json_attr_double (&json_ctx, "max", max); > + json_attr_double (&json_ctx, "min", min); > + json_attr_double (&json_ctx, "mean", total / NITERS); > + json_attr_object_end (&json_ctx); > + } > + > + json_attr_object_end (&json_ctx); > + json_attr_object_end (&json_ctx); > + json_document_end (&json_ctx); > +} ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark 2022-05-06 12:50 ` [PATCH 1/2] " Adhemerval Zanella @ 2022-05-06 12:59 ` Siddhesh Poyarekar 2022-05-06 13:20 ` Adhemerval Zanella 0 siblings, 1 reply; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-06 12:59 UTC (permalink / raw) To: Adhemerval Zanella, libc-alpha (Sorry I just committed it before I saw your message) On 06/05/2022 18:20, Adhemerval Zanella via Libc-alpha wrote: >> + /* 4-byte chars, because smileys are the universal language and we want to >> + ensure optimal performance with them 😊. */ >> + {"en_US.UTF-8", >> + L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"} >> +}; > > Could you use use hexadecimal character escape in tests? Although gcc handle multiple > -fexec-charset, trying to build it with a different compiler usually emits a lot of > warnings. > Hmm, I'm curious to know which compiler this breaks on; it's straight up UTF-8 so should parse just fine, no? Thanks, Siddhesh ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark 2022-05-06 12:59 ` Siddhesh Poyarekar @ 2022-05-06 13:20 ` Adhemerval Zanella 2022-05-06 13:26 ` Siddhesh Poyarekar 0 siblings, 1 reply; 38+ messages in thread From: Adhemerval Zanella @ 2022-05-06 13:20 UTC (permalink / raw) To: Siddhesh Poyarekar, libc-alpha On 06/05/2022 09:59, Siddhesh Poyarekar wrote: > (Sorry I just committed it before I saw your message) > > On 06/05/2022 18:20, Adhemerval Zanella via Libc-alpha wrote: >>> + /* 4-byte chars, because smileys are the universal language and we want to >>> + ensure optimal performance with them 😊. */ >>> + {"en_US.UTF-8", >>> + L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"} >>> +}; >> >> Could you use use hexadecimal character escape in tests? Although gcc handle multiple >> -fexec-charset, trying to build it with a different compiler usually emits a lot of >> warnings. >> > > Hmm, I'm curious to know which compiler this breaks on; it's straight up UTF-8 so should parse just fine, no? I does not really breaks, but clang warns about character enconding (I forgot which was the specific error, I will check it). I have to fix a lot of internal usage to get a clean make check on clang [1]. [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=058bb0d51c3f85041c7cbd14704f72003bdbdee9 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark 2022-05-06 13:20 ` Adhemerval Zanella @ 2022-05-06 13:26 ` Siddhesh Poyarekar 2022-05-06 13:36 ` Siddhesh Poyarekar 0 siblings, 1 reply; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-06 13:26 UTC (permalink / raw) To: Adhemerval Zanella, libc-alpha On 06/05/2022 18:50, Adhemerval Zanella wrote: > > > On 06/05/2022 09:59, Siddhesh Poyarekar wrote: >> (Sorry I just committed it before I saw your message) >> >> On 06/05/2022 18:20, Adhemerval Zanella via Libc-alpha wrote: >>>> + /* 4-byte chars, because smileys are the universal language and we want to >>>> + ensure optimal performance with them 😊. */ >>>> + {"en_US.UTF-8", >>>> + L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"} >>>> +}; >>> >>> Could you use use hexadecimal character escape in tests? Although gcc handle multiple >>> -fexec-charset, trying to build it with a different compiler usually emits a lot of >>> warnings. >>> >> >> Hmm, I'm curious to know which compiler this breaks on; it's straight up UTF-8 so should parse just fine, no? > > I does not really breaks, but clang warns about character enconding (I forgot > which was the specific error, I will check it). I have to fix a lot of internal > usage to get a clean make check on clang [1]. > > [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=058bb0d51c3f85041c7cbd14704f72003bdbdee9 > Ah, so that's all ISO-8859, it shouldn't warn on utf-8. Siddhesh ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark 2022-05-06 13:26 ` Siddhesh Poyarekar @ 2022-05-06 13:36 ` Siddhesh Poyarekar 2022-05-06 13:46 ` Adhemerval Zanella 0 siblings, 1 reply; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-06 13:36 UTC (permalink / raw) To: Adhemerval Zanella, libc-alpha On 06/05/2022 18:56, Siddhesh Poyarekar wrote: > On 06/05/2022 18:50, Adhemerval Zanella wrote: >> >> >> On 06/05/2022 09:59, Siddhesh Poyarekar wrote: >>> (Sorry I just committed it before I saw your message) >>> >>> On 06/05/2022 18:20, Adhemerval Zanella via Libc-alpha wrote: >>>>> + /* 4-byte chars, because smileys are the universal language and >>>>> we want to >>>>> + ensure optimal performance with them 😊. */ >>>>> + {"en_US.UTF-8", >>>>> + >>>>> L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"} >>>>> >>>>> +}; >>>> >>>> Could you use use hexadecimal character escape in tests? Although >>>> gcc handle multiple >>>> -fexec-charset, trying to build it with a different compiler usually >>>> emits a lot of >>>> warnings. >>>> >>> >>> Hmm, I'm curious to know which compiler this breaks on; it's straight >>> up UTF-8 so should parse just fine, no? >> >> I does not really breaks, but clang warns about character enconding (I >> forgot >> which was the specific error, I will check it). I have to fix a lot of >> internal >> usage to get a clean make check on clang [1]. >> >> [1] >> https://sourceware.org/git/?p=glibc.git;a=commit;h=058bb0d51c3f85041c7cbd14704f72003bdbdee9 >> >> > > Ah, so that's all ISO-8859, it shouldn't warn on utf-8. In any case, if you see warnings on this, please pass it on to me and I'll fix it but it seems wrong for clang to warn on this. Thanks, Siddhesh ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark 2022-05-06 13:36 ` Siddhesh Poyarekar @ 2022-05-06 13:46 ` Adhemerval Zanella 0 siblings, 0 replies; 38+ messages in thread From: Adhemerval Zanella @ 2022-05-06 13:46 UTC (permalink / raw) To: Siddhesh Poyarekar, libc-alpha On 06/05/2022 10:36, Siddhesh Poyarekar wrote: > On 06/05/2022 18:56, Siddhesh Poyarekar wrote: >> On 06/05/2022 18:50, Adhemerval Zanella wrote: >>> >>> >>> On 06/05/2022 09:59, Siddhesh Poyarekar wrote: >>>> (Sorry I just committed it before I saw your message) >>>> >>>> On 06/05/2022 18:20, Adhemerval Zanella via Libc-alpha wrote: >>>>>> + /* 4-byte chars, because smileys are the universal language and we want to >>>>>> + ensure optimal performance with them 😊. */ >>>>>> + {"en_US.UTF-8", >>>>>> + L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"} >>>>>> +}; >>>>> >>>>> Could you use use hexadecimal character escape in tests? Although gcc handle multiple >>>>> -fexec-charset, trying to build it with a different compiler usually emits a lot of >>>>> warnings. >>>>> >>>> >>>> Hmm, I'm curious to know which compiler this breaks on; it's straight up UTF-8 so should parse just fine, no? >>> >>> I does not really breaks, but clang warns about character enconding (I forgot >>> which was the specific error, I will check it). I have to fix a lot of internal >>> usage to get a clean make check on clang [1]. >>> >>> [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=058bb0d51c3f85041c7cbd14704f72003bdbdee9 >>> >> >> Ah, so that's all ISO-8859, it shouldn't warn on utf-8. > > In any case, if you see warnings on this, please pass it on to me and I'll fix it but it seems wrong for clang to warn on this. Yeah, the issue is ISO-8859 in fact. The L'...' with UTF-8 does not trigger any issue. ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 2/2] wcrtomb: Make behavior POSIX compliant 2022-05-05 18:43 ` [PATCH 0/2] More compliant wcrtomb Siddhesh Poyarekar 2022-05-05 18:43 ` [PATCH 1/2] benchtests: Add wcrtomb microbenchmark Siddhesh Poyarekar @ 2022-05-05 18:43 ` Siddhesh Poyarekar 2022-05-06 9:25 ` Paul Eggert ` (2 more replies) 1 sibling, 3 replies; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-05 18:43 UTC (permalink / raw) To: libc-alpha; +Cc: fweimer, eggert, dickey, jakub, schwab The GNU implementation of wcrtomb assumes that there are at least MB_CUR_MAX bytes available in the destination buffer passed to wcrtomb as the first argument. This is not compatible with the POSIX definition, which only requires enough space for the input wide character. This does not break much in practice because when users supply buffers smaller than MB_CUR_MAX (e.g. in ncurses), they compute and dynamically allocate the buffer, which results in enough spare space (thanks to usable_size in malloc and padding in alloca) that no actual buffer overflow occurs. However when the code is built with _FORTIFY_SOURCE, it runs into the hard check against MB_CUR_MAX in __wcrtomb_chk and hence fails. It wasn't evident until now since dynamic allocations would result in wcrtomb not being fortified but since _FORTIFY_SOURCE=3, that limitation is gone, resulting in such code failing. To fix this problem, introduce an internal buffer that is MB_LEN_MAX long and use that to perform the conversion and then copy the resultant bytes into the destination buffer. Also move the fortification check into the main implementation, which checks the result after conversion and aborts if the resultant byte count is greater than the destination buffer size. One complication is that applications that assume the MB_CUR_MAX limitation to be gone may not be able to run safely on older glibcs if they use static destination buffers smaller than MB_CUR_MAX; dynamic allocations will always have enough spare space that no actual overruns will occur. One alternative to fixing this is to bump symbol version to prevent them from running on older glibcs but that seems too strict a constraint. Instead, since these users will only have made this decision on reading the manual, I have put a note in the manual warning them about the pitfalls of having static buffers smaller than MB_CUR_MAX and running them on older glibc. Benchmarking: The wcrtomb microbenchmark shows significant increases in maximum execution time for all locales, ranging from 10x for ar_SA.UTF-8 to 1.5x-2x for nearly everything else. The mean execution time however saw practically no impact, with some results even being quicker, indicating that cache locality has a much bigger role in the overhead. Given that the additional copy uses a temporary buffer inside wcrtomb, it's likely that a hot path will end up putting that buffer (which is responsible for the additional overhead) in a similar place on stack, giving the necessary cache locality to negate the overhead. However in situations where wcrtomb ends up getting called at wildly different spots on the call stack (or is on different call stacks, e.g. with threads or different execution contexts) and is still a hotspot, the performance lag will be visible. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> --- debug/tst-fortify.c | 7 ++++++- debug/wcrtomb_chk.c | 8 ++------ include/wchar.h | 4 ++++ manual/charset.texi | 10 +++++----- wcsmbs/wcrtomb.c | 34 ++++++++++++++++++++++++++-------- 5 files changed, 43 insertions(+), 20 deletions(-) diff --git a/debug/tst-fortify.c b/debug/tst-fortify.c index 03c9867714..8e94643bf2 100644 --- a/debug/tst-fortify.c +++ b/debug/tst-fortify.c @@ -1478,10 +1478,15 @@ do_test (void) character which has a multibyte representation which does not fit. */ CHK_FAIL_START - char smallbuf[2]; + char smallbuf[1]; if (wcrtomb (smallbuf, L'\x100', &s) != 2) FAIL (); CHK_FAIL_END + + /* Same input with a large enough buffer and we're good. */ + char bigenoughbuf[2]; + if (wcrtomb (bigenoughbuf, L'\x100', &s) != 2) + FAIL (); #endif wchar_t wenough[10]; diff --git a/debug/wcrtomb_chk.c b/debug/wcrtomb_chk.c index 8b6d026560..28c3ea0d2d 100644 --- a/debug/wcrtomb_chk.c +++ b/debug/wcrtomb_chk.c @@ -1,4 +1,5 @@ /* Copyright (C) 2005-2022 Free Software Foundation, Inc. + Copyright The GNU Toolchain Authors. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or @@ -25,10 +26,5 @@ size_t __wcrtomb_chk (char *s, wchar_t wchar, mbstate_t *ps, size_t buflen) { - /* We do not have to implement the full wctomb semantics since we - know that S cannot be NULL when we come here. */ - if (buflen < MB_CUR_MAX) - __chk_fail (); - - return __wcrtomb (s, wchar, ps); + return __wcrtomb_internal (s, wchar, ps, buflen); } diff --git a/include/wchar.h b/include/wchar.h index 4267985625..db83297bca 100644 --- a/include/wchar.h +++ b/include/wchar.h @@ -172,6 +172,10 @@ libc_hidden_proto (__mbrtowc) libc_hidden_proto (__mbrlen) extern size_t __wcrtomb (char *__restrict __s, wchar_t __wc, __mbstate_t *__restrict __ps) attribute_hidden; +extern size_t __wcrtomb_internal (char *__restrict __s, wchar_t __wc, + __mbstate_t *__restrict __ps, + size_t __s_size) + attribute_hidden; extern size_t __mbsrtowcs (wchar_t *__restrict __dst, const char **__restrict __src, size_t __len, __mbstate_t *__restrict __ps) diff --git a/manual/charset.texi b/manual/charset.texi index a9b5cb4a37..67fe5bf3c7 100644 --- a/manual/charset.texi +++ b/manual/charset.texi @@ -883,11 +883,11 @@ the string @var{s}. This includes all bytes representing shift sequences. One word about the interface of the function: there is no parameter -specifying the length of the array @var{s}. Instead the function -assumes that there are at least @code{MB_CUR_MAX} bytes available since -this is the maximum length of any byte sequence representing a single -character. So the caller has to make sure that there is enough space -available, otherwise buffer overruns can occur. +specifying the length of the array @var{s}, so the caller has to make sure +that there is enough space available, otherwise buffer overruns can occur. +Also, @theglibc{} versions older than 2.36 assume that @var{s} is at least +@var{MB_CUR_MAX} bytes long, so programs that need to run on older +@glibcadj{} versions must comply with this limit. @pindex wchar.h @code{wcrtomb} was introduced in @w{Amendment 1} to @w{ISO C90} and is diff --git a/wcsmbs/wcrtomb.c b/wcsmbs/wcrtomb.c index e17438989f..ad00b43390 100644 --- a/wcsmbs/wcrtomb.c +++ b/wcsmbs/wcrtomb.c @@ -1,4 +1,5 @@ /* Copyright (C) 1996-2022 Free Software Foundation, Inc. + Copyright The GNU Toolchain Authors. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or @@ -20,6 +21,7 @@ #include <errno.h> #include <gconv.h> #include <stdlib.h> +#include <string.h> #include <wchar.h> #include <wcsmbsload.h> @@ -34,7 +36,7 @@ static mbstate_t state; size_t -__wcrtomb (char *s, wchar_t wc, mbstate_t *ps) +__wcrtomb_internal (char *s, wchar_t wc, mbstate_t *ps, size_t s_size) { char buf[MB_LEN_MAX]; struct __gconv_step_data data; @@ -52,14 +54,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) /* A first special case is if S is NULL. This means put PS in the initial state. */ if (s == NULL) - { - s = buf; - wc = L'\0'; - } + wc = L'\0'; /* Tell where we want to have the result. */ - data.__outbuf = (unsigned char *) s; - data.__outbufend = (unsigned char *) s + MB_CUR_MAX; + data.__outbuf = (unsigned char *) buf; + data.__outbufend = (unsigned char *) buf + MB_CUR_MAX; /* Get the conversion functions. */ fcts = get_gconv_fcts (_NL_CURRENT_DATA (LC_CTYPE)); @@ -101,14 +100,33 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) if (status == __GCONV_OK || status == __GCONV_EMPTY_INPUT || status == __GCONV_FULL_OUTPUT) - result = data.__outbuf - (unsigned char *) s; + result = data.__outbuf - (unsigned char *) buf; else { result = (size_t) -1; __set_errno (EILSEQ); } + if (result != (size_t) -1 && s != NULL) + { + if (result > s_size) + __chk_fail (); + + if (__glibc_unlikely (result > 2)) + memcpy (s, buf, result); + else if (__glibc_unlikely (result > 1)) + *((uint16_t *) s) = *((uint16_t *) buf); + else + *s = *buf; + } + return result; } + +size_t +__wcrtomb (char *s, wchar_t wc, mbstate_t *ps) +{ + return __wcrtomb_internal (s, wc, ps, (size_t) -1); +} weak_alias (__wcrtomb, wcrtomb) libc_hidden_weak (wcrtomb) -- 2.35.1 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] wcrtomb: Make behavior POSIX compliant 2022-05-05 18:43 ` [PATCH 2/2] wcrtomb: Make behavior POSIX compliant Siddhesh Poyarekar @ 2022-05-06 9:25 ` Paul Eggert 2022-05-06 13:40 ` Adhemerval Zanella 2022-05-06 14:04 ` [PATCH v2] " Siddhesh Poyarekar 2022-05-12 13:15 ` [PATCH v3] " Siddhesh Poyarekar 2 siblings, 1 reply; 38+ messages in thread From: Paul Eggert @ 2022-05-06 9:25 UTC (permalink / raw) To: Siddhesh Poyarekar, libc-alpha; +Cc: fweimer, dickey, jakub, schwab On 5/5/22 11:43, Siddhesh Poyarekar wrote: > + else if (__glibc_unlikely (result > 1)) > + *((uint16_t *) s) = *((uint16_t *) buf); Shouldn't this be protected by "#if _STRING_ARCH_unaligned"? Also, unnecessary parens. But better yet, just call memcpy as the compiler will figure it out; see below. > - result = data.__outbuf - (unsigned char *) s; > + result = data.__outbuf - (unsigned char *) buf; > else > { > result = (size_t) -1; > __set_errno (EILSEQ); > } > > + if (result != (size_t) -1 && s != NULL) The 'result != (size_t) -1' can be omitted if you move that 'if' into the previous if's then-part. > + data.__outbufend = (unsigned char *) buf + MB_CUR_MAX; This'd be a bit faster (and less confusing) if we replace 'MB_CUR_MAX' with 'sizeof buf'. > + if (__glibc_unlikely (result > 2)) > + memcpy (s, buf, result); > + else if (__glibc_unlikely (result > 1)) > + *((uint16_t *) s) = *((uint16_t *) buf); > + else > + *s = *buf; If the likely path is result == 1, shouldn't that be checked first? Something like this: if (__glibc_likely (result < 2)) *s = *buf; else if (__glibc_likely (result == 2)) memcpy (s, buf, result); /* Help the compiler. */ else memcpy (s, buf, result); ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] wcrtomb: Make behavior POSIX compliant 2022-05-06 9:25 ` Paul Eggert @ 2022-05-06 13:40 ` Adhemerval Zanella 2022-05-06 13:46 ` Siddhesh Poyarekar 0 siblings, 1 reply; 38+ messages in thread From: Adhemerval Zanella @ 2022-05-06 13:40 UTC (permalink / raw) To: Paul Eggert, Siddhesh Poyarekar, libc-alpha Cc: fweimer, jakub, schwab, dickey On 06/05/2022 06:25, Paul Eggert wrote: > On 5/5/22 11:43, Siddhesh Poyarekar wrote: >> + else if (__glibc_unlikely (result > 1)) >> + *((uint16_t *) s) = *((uint16_t *) buf); > > Shouldn't this be protected by "#if _STRING_ARCH_unaligned"? Also, unnecessary parens. I think this is an aliasing violation and most likely will fault on architectures that do not support unaligned access (such as sparc and some arm environments). > > But better yet, just call memcpy as the compiler will figure it out; see below. > > >> - result = data.__outbuf - (unsigned char *) s; >> + result = data.__outbuf - (unsigned char *) buf; >> else >> { >> result = (size_t) -1; >> __set_errno (EILSEQ); >> } >> >> + if (result != (size_t) -1 && s != NULL) > > The 'result != (size_t) -1' can be omitted if you move that 'if' into the previous if's then-part. > > >> + data.__outbufend = (unsigned char *) buf + MB_CUR_MAX; > > This'd be a bit faster (and less confusing) if we replace 'MB_CUR_MAX' with 'sizeof buf'. > > >> + if (__glibc_unlikely (result > 2)) >> + memcpy (s, buf, result); >> + else if (__glibc_unlikely (result > 1)) >> + *((uint16_t *) s) = *((uint16_t *) buf); >> + else >> + *s = *buf; > > If the likely path is result == 1, shouldn't that be checked first? Something like this: > > if (__glibc_likely (result < 2)) > *s = *buf; > else if (__glibc_likely (result == 2)) > memcpy (s, buf, result); /* Help the compiler. */ > else > memcpy (s, buf, result); ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] wcrtomb: Make behavior POSIX compliant 2022-05-06 13:40 ` Adhemerval Zanella @ 2022-05-06 13:46 ` Siddhesh Poyarekar 0 siblings, 0 replies; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-06 13:46 UTC (permalink / raw) To: Adhemerval Zanella, Paul Eggert, libc-alpha Cc: fweimer, jakub, schwab, dickey On 06/05/2022 19:10, Adhemerval Zanella via Libc-alpha wrote: > > > On 06/05/2022 06:25, Paul Eggert wrote: >> On 5/5/22 11:43, Siddhesh Poyarekar wrote: >>> + else if (__glibc_unlikely (result > 1)) >>> + *((uint16_t *) s) = *((uint16_t *) buf); >> >> Shouldn't this be protected by "#if _STRING_ARCH_unaligned"? Also, unnecessary parens. > > I think this is an aliasing violation and most likely will fault on architectures > that do not support unaligned access (such as sparc and some arm environments). > I've dropped this in v2 in favour of calling memcpy and rely on the compiler to inline the call. I'll post as soon as `make check` is done. Thanks, Siddhesh ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v2] wcrtomb: Make behavior POSIX compliant 2022-05-05 18:43 ` [PATCH 2/2] wcrtomb: Make behavior POSIX compliant Siddhesh Poyarekar 2022-05-06 9:25 ` Paul Eggert @ 2022-05-06 14:04 ` Siddhesh Poyarekar 2022-05-09 13:22 ` Adhemerval Zanella 2022-05-12 13:15 ` [PATCH v3] " Siddhesh Poyarekar 2 siblings, 1 reply; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-06 14:04 UTC (permalink / raw) To: libc-alpha; +Cc: fweimer, jakub, schwab, dickey, eggert The GNU implementation of wcrtomb assumes that there are at least MB_CUR_MAX bytes available in the destination buffer passed to wcrtomb as the first argument. This is not compatible with the POSIX definition, which only requires enough space for the input wide character. This does not break much in practice because when users supply buffers smaller than MB_CUR_MAX (e.g. in ncurses), they compute and dynamically allocate the buffer, which results in enough spare space (thanks to usable_size in malloc and padding in alloca) that no actual buffer overflow occurs. However when the code is built with _FORTIFY_SOURCE, it runs into the hard check against MB_CUR_MAX in __wcrtomb_chk and hence fails. It wasn't evident until now since dynamic allocations would result in wcrtomb not being fortified but since _FORTIFY_SOURCE=3, that limitation is gone, resulting in such code failing. To fix this problem, introduce an internal buffer that is MB_LEN_MAX long and use that to perform the conversion and then copy the resultant bytes into the destination buffer. Also move the fortification check into the main implementation, which checks the result after conversion and aborts if the resultant byte count is greater than the destination buffer size. One complication is that applications that assume the MB_CUR_MAX limitation to be gone may not be able to run safely on older glibcs if they use static destination buffers smaller than MB_CUR_MAX; dynamic allocations will always have enough spare space that no actual overruns will occur. One alternative to fixing this is to bump symbol version to prevent them from running on older glibcs but that seems too strict a constraint. Instead, since these users will only have made this decision on reading the manual, I have put a note in the manual warning them about the pitfalls of having static buffers smaller than MB_CUR_MAX and running them on older glibc. Benchmarking: The wcrtomb microbenchmark shows significant increases in maximum execution time for all locales, ranging from 10x for ar_SA.UTF-8 to 1.5x-2x for nearly everything else. The mean execution time however saw practically no impact, with some results even being quicker, indicating that cache locality has a much bigger role in the overhead. Given that the additional copy uses a temporary buffer inside wcrtomb, it's likely that a hot path will end up putting that buffer (which is responsible for the additional overhead) in a similar place on stack, giving the necessary cache locality to negate the overhead. However in situations where wcrtomb ends up getting called at wildly different spots on the call stack (or is on different call stacks, e.g. with threads or different execution contexts) and is still a hotspot, the performance lag will be visible. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> --- Changes from v1: - Fixed nits suggested by Paul Eggert. debug/tst-fortify.c | 7 ++++++- debug/wcrtomb_chk.c | 8 ++------ include/wchar.h | 4 ++++ manual/charset.texi | 10 +++++----- wcsmbs/wcrtomb.c | 36 ++++++++++++++++++++++++++++-------- 5 files changed, 45 insertions(+), 20 deletions(-) diff --git a/debug/tst-fortify.c b/debug/tst-fortify.c index 03c9867714..8e94643bf2 100644 --- a/debug/tst-fortify.c +++ b/debug/tst-fortify.c @@ -1478,10 +1478,15 @@ do_test (void) character which has a multibyte representation which does not fit. */ CHK_FAIL_START - char smallbuf[2]; + char smallbuf[1]; if (wcrtomb (smallbuf, L'\x100', &s) != 2) FAIL (); CHK_FAIL_END + + /* Same input with a large enough buffer and we're good. */ + char bigenoughbuf[2]; + if (wcrtomb (bigenoughbuf, L'\x100', &s) != 2) + FAIL (); #endif wchar_t wenough[10]; diff --git a/debug/wcrtomb_chk.c b/debug/wcrtomb_chk.c index 8b6d026560..28c3ea0d2d 100644 --- a/debug/wcrtomb_chk.c +++ b/debug/wcrtomb_chk.c @@ -1,4 +1,5 @@ /* Copyright (C) 2005-2022 Free Software Foundation, Inc. + Copyright The GNU Toolchain Authors. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or @@ -25,10 +26,5 @@ size_t __wcrtomb_chk (char *s, wchar_t wchar, mbstate_t *ps, size_t buflen) { - /* We do not have to implement the full wctomb semantics since we - know that S cannot be NULL when we come here. */ - if (buflen < MB_CUR_MAX) - __chk_fail (); - - return __wcrtomb (s, wchar, ps); + return __wcrtomb_internal (s, wchar, ps, buflen); } diff --git a/include/wchar.h b/include/wchar.h index 4267985625..db83297bca 100644 --- a/include/wchar.h +++ b/include/wchar.h @@ -172,6 +172,10 @@ libc_hidden_proto (__mbrtowc) libc_hidden_proto (__mbrlen) extern size_t __wcrtomb (char *__restrict __s, wchar_t __wc, __mbstate_t *__restrict __ps) attribute_hidden; +extern size_t __wcrtomb_internal (char *__restrict __s, wchar_t __wc, + __mbstate_t *__restrict __ps, + size_t __s_size) + attribute_hidden; extern size_t __mbsrtowcs (wchar_t *__restrict __dst, const char **__restrict __src, size_t __len, __mbstate_t *__restrict __ps) diff --git a/manual/charset.texi b/manual/charset.texi index a9b5cb4a37..67fe5bf3c7 100644 --- a/manual/charset.texi +++ b/manual/charset.texi @@ -883,11 +883,11 @@ the string @var{s}. This includes all bytes representing shift sequences. One word about the interface of the function: there is no parameter -specifying the length of the array @var{s}. Instead the function -assumes that there are at least @code{MB_CUR_MAX} bytes available since -this is the maximum length of any byte sequence representing a single -character. So the caller has to make sure that there is enough space -available, otherwise buffer overruns can occur. +specifying the length of the array @var{s}, so the caller has to make sure +that there is enough space available, otherwise buffer overruns can occur. +Also, @theglibc{} versions older than 2.36 assume that @var{s} is at least +@var{MB_CUR_MAX} bytes long, so programs that need to run on older +@glibcadj{} versions must comply with this limit. @pindex wchar.h @code{wcrtomb} was introduced in @w{Amendment 1} to @w{ISO C90} and is diff --git a/wcsmbs/wcrtomb.c b/wcsmbs/wcrtomb.c index e17438989f..db12e46297 100644 --- a/wcsmbs/wcrtomb.c +++ b/wcsmbs/wcrtomb.c @@ -1,4 +1,5 @@ /* Copyright (C) 1996-2022 Free Software Foundation, Inc. + Copyright The GNU Toolchain Authors. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or @@ -20,6 +21,7 @@ #include <errno.h> #include <gconv.h> #include <stdlib.h> +#include <string.h> #include <wchar.h> #include <wcsmbsload.h> @@ -34,7 +36,7 @@ static mbstate_t state; size_t -__wcrtomb (char *s, wchar_t wc, mbstate_t *ps) +__wcrtomb_internal (char *s, wchar_t wc, mbstate_t *ps, size_t s_size) { char buf[MB_LEN_MAX]; struct __gconv_step_data data; @@ -52,14 +54,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) /* A first special case is if S is NULL. This means put PS in the initial state. */ if (s == NULL) - { - s = buf; - wc = L'\0'; - } + wc = L'\0'; /* Tell where we want to have the result. */ - data.__outbuf = (unsigned char *) s; - data.__outbufend = (unsigned char *) s + MB_CUR_MAX; + data.__outbuf = (unsigned char *) buf; + data.__outbufend = (unsigned char *) buf + sizeof buf; /* Get the conversion functions. */ fcts = get_gconv_fcts (_NL_CURRENT_DATA (LC_CTYPE)); @@ -101,7 +100,22 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) if (status == __GCONV_OK || status == __GCONV_EMPTY_INPUT || status == __GCONV_FULL_OUTPUT) - result = data.__outbuf - (unsigned char *) s; + { + result = data.__outbuf - (unsigned char *) buf; + + if (s != NULL) + { + if (result > s_size) + __chk_fail (); + + if (__glibc_likely (result < 2)) + *s = *buf; + else if (__glibc_likely (result == 2)) + memcpy (s, buf, result); /* Help the compiler. */ + else + memcpy (s, buf, result); + } + } else { result = (size_t) -1; @@ -110,5 +124,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) return result; } + +size_t +__wcrtomb (char *s, wchar_t wc, mbstate_t *ps) +{ + return __wcrtomb_internal (s, wc, ps, (size_t) -1); +} weak_alias (__wcrtomb, wcrtomb) libc_hidden_weak (wcrtomb) -- 2.35.1 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2] wcrtomb: Make behavior POSIX compliant 2022-05-06 14:04 ` [PATCH v2] " Siddhesh Poyarekar @ 2022-05-09 13:22 ` Adhemerval Zanella 2022-05-09 13:35 ` Siddhesh Poyarekar 0 siblings, 1 reply; 38+ messages in thread From: Adhemerval Zanella @ 2022-05-09 13:22 UTC (permalink / raw) To: libc-alpha On 06/05/2022 11:04, Siddhesh Poyarekar via Libc-alpha wrote: > + > + if (__glibc_likely (result < 2)) > + *s = *buf; > + else if (__glibc_likely (result == 2)) > + memcpy (s, buf, result); /* Help the compiler. */ > + else > + memcpy (s, buf, result); > + } > + } I am not following how this help the compiler here, it just duplicates the same memcpy call. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2] wcrtomb: Make behavior POSIX compliant 2022-05-09 13:22 ` Adhemerval Zanella @ 2022-05-09 13:35 ` Siddhesh Poyarekar 0 siblings, 0 replies; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-09 13:35 UTC (permalink / raw) To: Adhemerval Zanella, libc-alpha On 09/05/2022 18:52, Adhemerval Zanella via Libc-alpha wrote: > > > On 06/05/2022 11:04, Siddhesh Poyarekar via Libc-alpha wrote: > >> + >> + if (__glibc_likely (result < 2)) >> + *s = *buf; >> + else if (__glibc_likely (result == 2)) >> + memcpy (s, buf, result); /* Help the compiler. */ >> + else >> + memcpy (s, buf, result); >> + } >> + } > > I am not following how this help the compiler here, it just duplicates the > same memcpy call. > It should be able to inline the first memcpy call into a copy instruction sequence. With just an `else`, result may be 2 or more, so the compiler doesn't have enough information to do any inlining. However, it just occurred to me that maybe something like this: if (result <= MB_LEN_MAX) memcpy (s, buf, result); else __builtin_unreachable (); will likely produce an always avoid the memcpy call and generate an inlined sequence. I'll test that and repost. Thanks for making me think about this a bit more. I'll also update the manual based on the discussion in the weekly call to make the blurb more backport-friendly. Thanks, Siddhesh ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v3] wcrtomb: Make behavior POSIX compliant 2022-05-05 18:43 ` [PATCH 2/2] wcrtomb: Make behavior POSIX compliant Siddhesh Poyarekar 2022-05-06 9:25 ` Paul Eggert 2022-05-06 14:04 ` [PATCH v2] " Siddhesh Poyarekar @ 2022-05-12 13:15 ` Siddhesh Poyarekar 2022-05-13 4:56 ` Paul Eggert 2 siblings, 1 reply; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-12 13:15 UTC (permalink / raw) To: libc-alpha; +Cc: adhemerval.zanella, fweimer, dickey, eggert The GNU implementation of wcrtomb assumes that there are at least MB_CUR_MAX bytes available in the destination buffer passed to wcrtomb as the first argument. This is not compatible with the POSIX definition, which only requires enough space for the input wide character. This does not break much in practice because when users supply buffers smaller than MB_CUR_MAX (e.g. in ncurses), they compute and dynamically allocate the buffer, which results in enough spare space (thanks to usable_size in malloc and padding in alloca) that no actual buffer overflow occurs. However when the code is built with _FORTIFY_SOURCE, it runs into the hard check against MB_CUR_MAX in __wcrtomb_chk and hence fails. It wasn't evident until now since dynamic allocations would result in wcrtomb not being fortified but since _FORTIFY_SOURCE=3, that limitation is gone, resulting in such code failing. To fix this problem, introduce an internal buffer that is MB_LEN_MAX long and use that to perform the conversion and then copy the resultant bytes into the destination buffer. Also move the fortification check into the main implementation, which checks the result after conversion and aborts if the resultant byte count is greater than the destination buffer size. One complication is that applications that assume the MB_CUR_MAX limitation to be gone may not be able to run safely on older glibcs if they use static destination buffers smaller than MB_CUR_MAX; dynamic allocations will always have enough spare space that no actual overruns will occur. One alternative to fixing this is to bump symbol version to prevent them from running on older glibcs but that seems too strict a constraint. Instead, since these users will only have made this decision on reading the manual, I have put a note in the manual warning them about the pitfalls of having static buffers smaller than MB_CUR_MAX and running them on older glibc. Benchmarking: The wcrtomb microbenchmark shows significant increases in maximum execution time for all locales, ranging from 10x for ar_SA.UTF-8 to 1.5x-2x for nearly everything else. The mean execution time however saw practically no impact, with some results even being quicker, indicating that cache locality has a much bigger role in the overhead. Given that the additional copy uses a temporary buffer inside wcrtomb, it's likely that a hot path will end up putting that buffer (which is responsible for the additional overhead) in a similar place on stack, giving the necessary cache locality to negate the overhead. However in situations where wcrtomb ends up getting called at wildly different spots on the call stack (or is on different call stacks, e.g. with threads or different execution contexts) and is still a hotspot, the performance lag will be visible. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> --- Changes from v2: - Updated manual blurb to refer to "this version" so that it is backport-clean. - Changed copy logic to inline copies of up to 4 bytes. It is 1-7% faster that the previous inlining of only 1 and 2 bytes and a good 10%+ faster than using plain memcpy. 4 bytes is the maximum length for byte encodings in charmaps shipped in glibc. Changes from v1: - Fixed nits suggested by Paul Eggert. debug/tst-fortify.c | 7 ++++++- debug/wcrtomb_chk.c | 8 ++----- include/wchar.h | 4 ++++ manual/charset.texi | 11 +++++----- wcsmbs/wcrtomb.c | 51 ++++++++++++++++++++++++++++++++++++++------- 5 files changed, 61 insertions(+), 20 deletions(-) diff --git a/debug/tst-fortify.c b/debug/tst-fortify.c index 03c9867714..8e94643bf2 100644 --- a/debug/tst-fortify.c +++ b/debug/tst-fortify.c @@ -1478,10 +1478,15 @@ do_test (void) character which has a multibyte representation which does not fit. */ CHK_FAIL_START - char smallbuf[2]; + char smallbuf[1]; if (wcrtomb (smallbuf, L'\x100', &s) != 2) FAIL (); CHK_FAIL_END + + /* Same input with a large enough buffer and we're good. */ + char bigenoughbuf[2]; + if (wcrtomb (bigenoughbuf, L'\x100', &s) != 2) + FAIL (); #endif wchar_t wenough[10]; diff --git a/debug/wcrtomb_chk.c b/debug/wcrtomb_chk.c index 8b6d026560..28c3ea0d2d 100644 --- a/debug/wcrtomb_chk.c +++ b/debug/wcrtomb_chk.c @@ -1,4 +1,5 @@ /* Copyright (C) 2005-2022 Free Software Foundation, Inc. + Copyright The GNU Toolchain Authors. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or @@ -25,10 +26,5 @@ size_t __wcrtomb_chk (char *s, wchar_t wchar, mbstate_t *ps, size_t buflen) { - /* We do not have to implement the full wctomb semantics since we - know that S cannot be NULL when we come here. */ - if (buflen < MB_CUR_MAX) - __chk_fail (); - - return __wcrtomb (s, wchar, ps); + return __wcrtomb_internal (s, wchar, ps, buflen); } diff --git a/include/wchar.h b/include/wchar.h index 4267985625..db83297bca 100644 --- a/include/wchar.h +++ b/include/wchar.h @@ -172,6 +172,10 @@ libc_hidden_proto (__mbrtowc) libc_hidden_proto (__mbrlen) extern size_t __wcrtomb (char *__restrict __s, wchar_t __wc, __mbstate_t *__restrict __ps) attribute_hidden; +extern size_t __wcrtomb_internal (char *__restrict __s, wchar_t __wc, + __mbstate_t *__restrict __ps, + size_t __s_size) + attribute_hidden; extern size_t __mbsrtowcs (wchar_t *__restrict __dst, const char **__restrict __src, size_t __len, __mbstate_t *__restrict __ps) diff --git a/manual/charset.texi b/manual/charset.texi index a9b5cb4a37..427db3bc80 100644 --- a/manual/charset.texi +++ b/manual/charset.texi @@ -883,11 +883,12 @@ the string @var{s}. This includes all bytes representing shift sequences. One word about the interface of the function: there is no parameter -specifying the length of the array @var{s}. Instead the function -assumes that there are at least @code{MB_CUR_MAX} bytes available since -this is the maximum length of any byte sequence representing a single -character. So the caller has to make sure that there is enough space -available, otherwise buffer overruns can occur. +specifying the length of the array @var{s}, so the caller has to make sure +that there is enough space available, otherwise buffer overruns can occur. +This version of @theglibc{} does not assume that @var{s} is at least +@var{MB_CUR_MAX} bytes long, but programs that need to run on @glibcadj{} +versions that have this assumption documented in the manual must comply +with this limit. @pindex wchar.h @code{wcrtomb} was introduced in @w{Amendment 1} to @w{ISO C90} and is diff --git a/wcsmbs/wcrtomb.c b/wcsmbs/wcrtomb.c index e17438989f..d68dcdac82 100644 --- a/wcsmbs/wcrtomb.c +++ b/wcsmbs/wcrtomb.c @@ -1,4 +1,5 @@ /* Copyright (C) 1996-2022 Free Software Foundation, Inc. + Copyright The GNU Toolchain Authors. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or @@ -20,6 +21,7 @@ #include <errno.h> #include <gconv.h> #include <stdlib.h> +#include <string.h> #include <wchar.h> #include <wcsmbsload.h> @@ -34,7 +36,7 @@ static mbstate_t state; size_t -__wcrtomb (char *s, wchar_t wc, mbstate_t *ps) +__wcrtomb_internal (char *s, wchar_t wc, mbstate_t *ps, size_t s_size) { char buf[MB_LEN_MAX]; struct __gconv_step_data data; @@ -52,14 +54,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) /* A first special case is if S is NULL. This means put PS in the initial state. */ if (s == NULL) - { - s = buf; - wc = L'\0'; - } + wc = L'\0'; /* Tell where we want to have the result. */ - data.__outbuf = (unsigned char *) s; - data.__outbufend = (unsigned char *) s + MB_CUR_MAX; + data.__outbuf = (unsigned char *) buf; + data.__outbufend = (unsigned char *) buf + sizeof buf; /* Get the conversion functions. */ fcts = get_gconv_fcts (_NL_CURRENT_DATA (LC_CTYPE)); @@ -101,7 +100,37 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) if (status == __GCONV_OK || status == __GCONV_EMPTY_INPUT || status == __GCONV_FULL_OUTPUT) - result = data.__outbuf - (unsigned char *) s; + { + result = data.__outbuf - (unsigned char *) buf; + + if (s != NULL) + { + if (result > s_size) + __chk_fail (); + + /* The maximum byte size in all charmaps is 4, so inline copies up to + 4 bytes. One must benchmark before bumping this in future because + the returns for inlining will diminish as the number of branches + increases. */ + switch (result) + { + case 4: + s[3] = buf[3]; + /* Fall through. */ + case 3: + s[2] = buf[2]; + /* Fall through. */ + case 2: + s[1] = buf[1]; + /* Fall through. */ + case 1: + s[0] = buf[0]; + break; + default: + memcpy (s, buf, result); + } + } + } else { result = (size_t) -1; @@ -110,5 +139,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) return result; } + +size_t +__wcrtomb (char *s, wchar_t wc, mbstate_t *ps) +{ + return __wcrtomb_internal (s, wc, ps, (size_t) -1); +} weak_alias (__wcrtomb, wcrtomb) libc_hidden_weak (wcrtomb) -- 2.35.1 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant 2022-05-12 13:15 ` [PATCH v3] " Siddhesh Poyarekar @ 2022-05-13 4:56 ` Paul Eggert 2022-05-13 5:28 ` Paul Eggert 2022-05-13 8:18 ` [PATCH v3] " Siddhesh Poyarekar 0 siblings, 2 replies; 38+ messages in thread From: Paul Eggert @ 2022-05-13 4:56 UTC (permalink / raw) To: Siddhesh Poyarekar; +Cc: adhemerval.zanella, fweimer, dickey, libc-alpha On 5/12/22 06:15, Siddhesh Poyarekar wrote: > + switch (result) > + { > + case 4: > + s[3] = buf[3]; > + /* Fall through. */ > + case 3: > + s[2] = buf[2]; > + /* Fall through. */ > + case 2: > + s[1] = buf[1]; > + /* Fall through. */ > + case 1: > + s[0] = buf[0]; > + break; > + default: > + memcpy (s, buf, result); > + } For me with GCC 12.1 -O2 on x86-64, the above code generates 2 comparisons and 3 conditional branches in the common case where RESULT is 1. Plus, GCC generates a jmp from the end of case 3 to the start of case 2 (which precedes case 3 in the machine code), which is a bit odd. How about something like the following instead? This generates machine code with only one conditional branch - the one that decides whether to call memcpy - and this branch is never taken with glibc-supplied charmaps. (I'm assuming RESULT is at least 1.) s[0] = buf[0]; if (2 <= result && result <= 4) { s[1] = buf[1]; memcpy (&s[result != 2], &buf[result != 2], 2); s[result - 1] = buf[result - 1]; } else memcpy (s, buf, result); Hope you don't mind my bikeshedding. (PATCH v3 looks correct as-is, for what it's worth.) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant 2022-05-13 4:56 ` Paul Eggert @ 2022-05-13 5:28 ` Paul Eggert 2022-05-13 11:31 ` Siddhesh Poyarekar 2022-05-13 8:18 ` [PATCH v3] " Siddhesh Poyarekar 1 sibling, 1 reply; 38+ messages in thread From: Paul Eggert @ 2022-05-13 5:28 UTC (permalink / raw) To: Siddhesh Poyarekar; +Cc: fweimer, libc-alpha, dickey On 5/12/22 21:56, Paul Eggert wrote: > Hope you don't mind my bikeshedding. Better yet, this: s[0] = buf[0]; if (2 <= result && result <= 4) { s[1] = buf[1]; memcpy (&s[result - 2], &buf[result - 2], 2); } else memcpy (s, buf, result); On x86-64 with GCC 12.1 -O2 and a glibc-supplied charmap, this is only 9 straight-line instructions, counting the compare insn and the conditional-branch insn that is never taken. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant 2022-05-13 5:28 ` Paul Eggert @ 2022-05-13 11:31 ` Siddhesh Poyarekar 2022-05-13 11:38 ` Florian Weimer 0 siblings, 1 reply; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-13 11:31 UTC (permalink / raw) To: Paul Eggert; +Cc: fweimer, libc-alpha, dickey On 13/05/2022 10:58, Paul Eggert wrote: > On 5/12/22 21:56, Paul Eggert wrote: >> Hope you don't mind my bikeshedding. > > Better yet, this: > > s[0] = buf[0]; > if (2 <= result && result <= 4) > { > s[1] = buf[1]; > memcpy (&s[result - 2], &buf[result - 2], 2); > } > else > memcpy (s, buf, result); > > On x86-64 with GCC 12.1 -O2 and a glibc-supplied charmap, this is only 9 > straight-line instructions, counting the compare insn and the > conditional-branch insn that is never taken. > Sorry I missed this one. I tried it and with gcc 11 it seems to produce worse code, merging the two memcpys instead of inlining the first one. Siddhesh ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant 2022-05-13 11:31 ` Siddhesh Poyarekar @ 2022-05-13 11:38 ` Florian Weimer 2022-05-13 11:51 ` Siddhesh Poyarekar 2022-05-13 12:30 ` Adhemerval Zanella 0 siblings, 2 replies; 38+ messages in thread From: Florian Weimer @ 2022-05-13 11:38 UTC (permalink / raw) To: Siddhesh Poyarekar; +Cc: Paul Eggert, libc-alpha, dickey * Siddhesh Poyarekar: > On 13/05/2022 10:58, Paul Eggert wrote: >> On 5/12/22 21:56, Paul Eggert wrote: >>> Hope you don't mind my bikeshedding. >> Better yet, this: >> s[0] = buf[0]; >> if (2 <= result && result <= 4) >> { >> s[1] = buf[1]; >> memcpy (&s[result - 2], &buf[result - 2], 2); >> } >> else >> memcpy (s, buf, result); >> On x86-64 with GCC 12.1 -O2 and a glibc-supplied charmap, this is >> only 9 straight-line instructions, counting the compare insn and the >> conditional-branch insn that is never taken. >> > > Sorry I missed this one. I tried it and with gcc 11 it seems to > produce worse code, merging the two memcpys instead of inlining the > first one. Can we please use the simplified code? “char buf[MB_LEN_MAX];” already gives GCC a strong hint this is a short memcpy. If it is beneficial on some architectures to inline such short, but still variable memcpy calls, we should teach the compiler how to do it, and not do it manually. And wcrtomb would for sure benefit more from a fast path for ASCII and UTF-8. 8-p (Not that I think there are many users out there, given glibc's historically poor performance in the multibyte/wide string conversion space.) Thanks, Florian ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant 2022-05-13 11:38 ` Florian Weimer @ 2022-05-13 11:51 ` Siddhesh Poyarekar 2022-05-13 12:55 ` Florian Weimer 2022-05-13 12:30 ` Adhemerval Zanella 1 sibling, 1 reply; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-13 11:51 UTC (permalink / raw) To: Florian Weimer; +Cc: libc-alpha, dickey On 13/05/2022 17:08, Florian Weimer via Libc-alpha wrote: > * Siddhesh Poyarekar: > >> On 13/05/2022 10:58, Paul Eggert wrote: >>> On 5/12/22 21:56, Paul Eggert wrote: >>>> Hope you don't mind my bikeshedding. >>> Better yet, this: >>> s[0] = buf[0]; >>> if (2 <= result && result <= 4) >>> { >>> s[1] = buf[1]; >>> memcpy (&s[result - 2], &buf[result - 2], 2); >>> } >>> else >>> memcpy (s, buf, result); >>> On x86-64 with GCC 12.1 -O2 and a glibc-supplied charmap, this is >>> only 9 straight-line instructions, counting the compare insn and the >>> conditional-branch insn that is never taken. >>> >> >> Sorry I missed this one. I tried it and with gcc 11 it seems to >> produce worse code, merging the two memcpys instead of inlining the >> first one. > > Can we please use the simplified code? So just a memcpy call? I could push that and attempt to micro-optimize if there are reports of slowdown due to this. Siddhesh ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant 2022-05-13 11:51 ` Siddhesh Poyarekar @ 2022-05-13 12:55 ` Florian Weimer 0 siblings, 0 replies; 38+ messages in thread From: Florian Weimer @ 2022-05-13 12:55 UTC (permalink / raw) To: Siddhesh Poyarekar; +Cc: libc-alpha, dickey * Siddhesh Poyarekar: > On 13/05/2022 17:08, Florian Weimer via Libc-alpha wrote: >> * Siddhesh Poyarekar: >> >>> On 13/05/2022 10:58, Paul Eggert wrote: >>>> On 5/12/22 21:56, Paul Eggert wrote: >>>>> Hope you don't mind my bikeshedding. >>>> Better yet, this: >>>> s[0] = buf[0]; >>>> if (2 <= result && result <= 4) >>>> { >>>> s[1] = buf[1]; >>>> memcpy (&s[result - 2], &buf[result - 2], 2); >>>> } >>>> else >>>> memcpy (s, buf, result); >>>> On x86-64 with GCC 12.1 -O2 and a glibc-supplied charmap, this is >>>> only 9 straight-line instructions, counting the compare insn and the >>>> conditional-branch insn that is never taken. >>>> >>> >>> Sorry I missed this one. I tried it and with gcc 11 it seems to >>> produce worse code, merging the two memcpys instead of inlining the >>> first one. >> Can we please use the simplified code? > > So just a memcpy call? I could push that and attempt to > micro-optimize if there are reports of slowdown due to this. Yes, I would prefer that. Thanks, Florian ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant 2022-05-13 11:38 ` Florian Weimer 2022-05-13 11:51 ` Siddhesh Poyarekar @ 2022-05-13 12:30 ` Adhemerval Zanella 2022-05-13 13:42 ` Siddhesh Poyarekar 2022-05-13 13:45 ` [committed] " Siddhesh Poyarekar 1 sibling, 2 replies; 38+ messages in thread From: Adhemerval Zanella @ 2022-05-13 12:30 UTC (permalink / raw) To: libc-alpha On 13/05/2022 08:38, Florian Weimer via Libc-alpha wrote: > * Siddhesh Poyarekar: > >> On 13/05/2022 10:58, Paul Eggert wrote: >>> On 5/12/22 21:56, Paul Eggert wrote: >>>> Hope you don't mind my bikeshedding. >>> Better yet, this: >>> s[0] = buf[0]; >>> if (2 <= result && result <= 4) >>> { >>> s[1] = buf[1]; >>> memcpy (&s[result - 2], &buf[result - 2], 2); >>> } >>> else >>> memcpy (s, buf, result); >>> On x86-64 with GCC 12.1 -O2 and a glibc-supplied charmap, this is >>> only 9 straight-line instructions, counting the compare insn and the >>> conditional-branch insn that is never taken. >>> >> >> Sorry I missed this one. I tried it and with gcc 11 it seems to >> produce worse code, merging the two memcpys instead of inlining the >> first one. > > Can we please use the simplified code? > > “char buf[MB_LEN_MAX];” already gives GCC a strong hint this is a short > memcpy. If it is beneficial on some architectures to inline such short, > but still variable memcpy calls, we should teach the compiler how to do > it, and not do it manually. Totally agreed. > > And wcrtomb would for sure benefit more from a fast path for ASCII and > UTF-8. 8-p And it does make sene. > > (Not that I think there are many users out there, given glibc's > historically poor performance in the multibyte/wide string conversion > space.) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant 2022-05-13 12:30 ` Adhemerval Zanella @ 2022-05-13 13:42 ` Siddhesh Poyarekar 2022-05-13 17:58 ` Paul Eggert 2022-05-13 13:45 ` [committed] " Siddhesh Poyarekar 1 sibling, 1 reply; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-13 13:42 UTC (permalink / raw) To: Adhemerval Zanella, libc-alpha On 13/05/2022 18:00, Adhemerval Zanella via Libc-alpha wrote: >> Can we please use the simplified code? >> >> “char buf[MB_LEN_MAX];” already gives GCC a strong hint this is a short >> memcpy. If it is beneficial on some architectures to inline such short, >> but still variable memcpy calls, we should teach the compiler how to do >> it, and not do it manually. > > Totally agreed. > Ugh, both of you just kill all the fun. I'll push a fix with just the memcpy ;) Thanks, Siddhesh ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant 2022-05-13 13:42 ` Siddhesh Poyarekar @ 2022-05-13 17:58 ` Paul Eggert 0 siblings, 0 replies; 38+ messages in thread From: Paul Eggert @ 2022-05-13 17:58 UTC (permalink / raw) To: Siddhesh Poyarekar; +Cc: Adhemerval Zanella, libc-alpha On 5/13/22 06:42, Siddhesh Poyarekar wrote: > Ugh, both of you just kill all the fun. I'll push a fix with just the > memcpy ;) Just as well, as the code I sent before is slow for ASCII anyway. I shouldn't let that stand as my suggestion, even if just for fun. > with gcc 11 it seems to produce worse code, merging the two memcpys instead of inlining the first one. Here's a better version if someone ever wants to tune this stuff. In this version there's only one memcpy so GCC won't merge it. On x86-64 with GCC 12.1 -O2, if RESULT<=2 this executes 6 instructions (including 1 comparison and 1 conditional branch); if 3<=RESULT<=4 this executes 8 instructions total (including 2 comparisons and 2 conditional branches). if (result <= 2) { s[0] = buf[0]; s[result - 1] = buf[result - 1]; } else if (result <= 4) { s[0] = buf[0]; s[1] = buf[1]; s[result - 2] = buf[result - 2]; s[result - 1] = buf[result - 1]; } else memcpy (s, buf, result); There are of course other variants.... ^ permalink raw reply [flat|nested] 38+ messages in thread
* [committed] wcrtomb: Make behavior POSIX compliant 2022-05-13 12:30 ` Adhemerval Zanella 2022-05-13 13:42 ` Siddhesh Poyarekar @ 2022-05-13 13:45 ` Siddhesh Poyarekar 1 sibling, 0 replies; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-13 13:45 UTC (permalink / raw) To: libc-alpha The GNU implementation of wcrtomb assumes that there are at least MB_CUR_MAX bytes available in the destination buffer passed to wcrtomb as the first argument. This is not compatible with the POSIX definition, which only requires enough space for the input wide character. This does not break much in practice because when users supply buffers smaller than MB_CUR_MAX (e.g. in ncurses), they compute and dynamically allocate the buffer, which results in enough spare space (thanks to usable_size in malloc and padding in alloca) that no actual buffer overflow occurs. However when the code is built with _FORTIFY_SOURCE, it runs into the hard check against MB_CUR_MAX in __wcrtomb_chk and hence fails. It wasn't evident until now since dynamic allocations would result in wcrtomb not being fortified but since _FORTIFY_SOURCE=3, that limitation is gone, resulting in such code failing. To fix this problem, introduce an internal buffer that is MB_LEN_MAX long and use that to perform the conversion and then copy the resultant bytes into the destination buffer. Also move the fortification check into the main implementation, which checks the result after conversion and aborts if the resultant byte count is greater than the destination buffer size. One complication is that applications that assume the MB_CUR_MAX limitation to be gone may not be able to run safely on older glibcs if they use static destination buffers smaller than MB_CUR_MAX; dynamic allocations will always have enough spare space that no actual overruns will occur. One alternative to fixing this is to bump symbol version to prevent them from running on older glibcs but that seems too strict a constraint. Instead, since these users will only have made this decision on reading the manual, I have put a note in the manual warning them about the pitfalls of having static buffers smaller than MB_CUR_MAX and running them on older glibc. Benchmarking: The wcrtomb microbenchmark shows significant increases in maximum execution time for all locales, ranging from 10x for ar_SA.UTF-8 to 1.5x-2x for nearly everything else. The mean execution time however saw practically no impact, with some results even being quicker, indicating that cache locality has a much bigger role in the overhead. Given that the additional copy uses a temporary buffer inside wcrtomb, it's likely that a hot path will end up putting that buffer (which is responsible for the additional overhead) in a similar place on stack, giving the necessary cache locality to negate the overhead. However in situations where wcrtomb ends up getting called at wildly different spots on the call stack (or is on different call stacks, e.g. with threads or different execution contexts) and is still a hotspot, the performance lag will be visible. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> --- debug/tst-fortify.c | 7 ++++++- debug/wcrtomb_chk.c | 8 ++------ include/wchar.h | 4 ++++ manual/charset.texi | 11 ++++++----- wcsmbs/wcrtomb.c | 31 +++++++++++++++++++++++-------- 5 files changed, 41 insertions(+), 20 deletions(-) diff --git a/debug/tst-fortify.c b/debug/tst-fortify.c index 03c9867714..8e94643bf2 100644 --- a/debug/tst-fortify.c +++ b/debug/tst-fortify.c @@ -1478,10 +1478,15 @@ do_test (void) character which has a multibyte representation which does not fit. */ CHK_FAIL_START - char smallbuf[2]; + char smallbuf[1]; if (wcrtomb (smallbuf, L'\x100', &s) != 2) FAIL (); CHK_FAIL_END + + /* Same input with a large enough buffer and we're good. */ + char bigenoughbuf[2]; + if (wcrtomb (bigenoughbuf, L'\x100', &s) != 2) + FAIL (); #endif wchar_t wenough[10]; diff --git a/debug/wcrtomb_chk.c b/debug/wcrtomb_chk.c index 8b6d026560..28c3ea0d2d 100644 --- a/debug/wcrtomb_chk.c +++ b/debug/wcrtomb_chk.c @@ -1,4 +1,5 @@ /* Copyright (C) 2005-2022 Free Software Foundation, Inc. + Copyright The GNU Toolchain Authors. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or @@ -25,10 +26,5 @@ size_t __wcrtomb_chk (char *s, wchar_t wchar, mbstate_t *ps, size_t buflen) { - /* We do not have to implement the full wctomb semantics since we - know that S cannot be NULL when we come here. */ - if (buflen < MB_CUR_MAX) - __chk_fail (); - - return __wcrtomb (s, wchar, ps); + return __wcrtomb_internal (s, wchar, ps, buflen); } diff --git a/include/wchar.h b/include/wchar.h index 4267985625..db83297bca 100644 --- a/include/wchar.h +++ b/include/wchar.h @@ -172,6 +172,10 @@ libc_hidden_proto (__mbrtowc) libc_hidden_proto (__mbrlen) extern size_t __wcrtomb (char *__restrict __s, wchar_t __wc, __mbstate_t *__restrict __ps) attribute_hidden; +extern size_t __wcrtomb_internal (char *__restrict __s, wchar_t __wc, + __mbstate_t *__restrict __ps, + size_t __s_size) + attribute_hidden; extern size_t __mbsrtowcs (wchar_t *__restrict __dst, const char **__restrict __src, size_t __len, __mbstate_t *__restrict __ps) diff --git a/manual/charset.texi b/manual/charset.texi index a9b5cb4a37..427db3bc80 100644 --- a/manual/charset.texi +++ b/manual/charset.texi @@ -883,11 +883,12 @@ the string @var{s}. This includes all bytes representing shift sequences. One word about the interface of the function: there is no parameter -specifying the length of the array @var{s}. Instead the function -assumes that there are at least @code{MB_CUR_MAX} bytes available since -this is the maximum length of any byte sequence representing a single -character. So the caller has to make sure that there is enough space -available, otherwise buffer overruns can occur. +specifying the length of the array @var{s}, so the caller has to make sure +that there is enough space available, otherwise buffer overruns can occur. +This version of @theglibc{} does not assume that @var{s} is at least +@var{MB_CUR_MAX} bytes long, but programs that need to run on @glibcadj{} +versions that have this assumption documented in the manual must comply +with this limit. @pindex wchar.h @code{wcrtomb} was introduced in @w{Amendment 1} to @w{ISO C90} and is diff --git a/wcsmbs/wcrtomb.c b/wcsmbs/wcrtomb.c index e17438989f..c0cce3792f 100644 --- a/wcsmbs/wcrtomb.c +++ b/wcsmbs/wcrtomb.c @@ -1,4 +1,5 @@ /* Copyright (C) 1996-2022 Free Software Foundation, Inc. + Copyright The GNU Toolchain Authors. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or @@ -20,6 +21,7 @@ #include <errno.h> #include <gconv.h> #include <stdlib.h> +#include <string.h> #include <wchar.h> #include <wcsmbsload.h> @@ -34,7 +36,7 @@ static mbstate_t state; size_t -__wcrtomb (char *s, wchar_t wc, mbstate_t *ps) +__wcrtomb_internal (char *s, wchar_t wc, mbstate_t *ps, size_t s_size) { char buf[MB_LEN_MAX]; struct __gconv_step_data data; @@ -52,14 +54,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) /* A first special case is if S is NULL. This means put PS in the initial state. */ if (s == NULL) - { - s = buf; - wc = L'\0'; - } + wc = L'\0'; /* Tell where we want to have the result. */ - data.__outbuf = (unsigned char *) s; - data.__outbufend = (unsigned char *) s + MB_CUR_MAX; + data.__outbuf = (unsigned char *) buf; + data.__outbufend = (unsigned char *) buf + sizeof buf; /* Get the conversion functions. */ fcts = get_gconv_fcts (_NL_CURRENT_DATA (LC_CTYPE)); @@ -101,7 +100,17 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) if (status == __GCONV_OK || status == __GCONV_EMPTY_INPUT || status == __GCONV_FULL_OUTPUT) - result = data.__outbuf - (unsigned char *) s; + { + result = data.__outbuf - (unsigned char *) buf; + + if (s != NULL) + { + if (result > s_size) + __chk_fail (); + + memcpy (s, buf, result); + } + } else { result = (size_t) -1; @@ -110,5 +119,11 @@ __wcrtomb (char *s, wchar_t wc, mbstate_t *ps) return result; } + +size_t +__wcrtomb (char *s, wchar_t wc, mbstate_t *ps) +{ + return __wcrtomb_internal (s, wc, ps, (size_t) -1); +} weak_alias (__wcrtomb, wcrtomb) libc_hidden_weak (wcrtomb) -- 2.35.1 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant 2022-05-13 4:56 ` Paul Eggert 2022-05-13 5:28 ` Paul Eggert @ 2022-05-13 8:18 ` Siddhesh Poyarekar 1 sibling, 0 replies; 38+ messages in thread From: Siddhesh Poyarekar @ 2022-05-13 8:18 UTC (permalink / raw) To: Paul Eggert; +Cc: adhemerval.zanella, fweimer, dickey, libc-alpha On 13/05/2022 10:26, Paul Eggert wrote: > On 5/12/22 06:15, Siddhesh Poyarekar wrote: >> + switch (result) >> + { >> + case 4: >> + s[3] = buf[3]; >> + /* Fall through. */ >> + case 3: >> + s[2] = buf[2]; >> + /* Fall through. */ >> + case 2: >> + s[1] = buf[1]; >> + /* Fall through. */ >> + case 1: >> + s[0] = buf[0]; >> + break; >> + default: >> + memcpy (s, buf, result); >> + } > > For me with GCC 12.1 -O2 on x86-64, the above code generates 2 > comparisons and 3 conditional branches in the common case where RESULT > is 1. Plus, GCC generates a jmp from the end of case 3 to the start of > case 2 (which precedes case 3 in the machine code), which is a bit odd. It seems to hit actual performance less, presumably because those branches get predicted correctly most times. However we could simply elide the check for the first byte like you've done. > How about something like the following instead? This generates machine > code with only one conditional branch - the one that decides whether to > call memcpy - and this branch is never taken with glibc-supplied > charmaps. (I'm assuming RESULT is at least 1.) > > s[0] = buf[0]; > if (2 <= result && result <= 4) > { > s[1] = buf[1]; > memcpy (&s[result != 2], &buf[result != 2], 2); > s[result - 1] = buf[result - 1]; > } > else > memcpy (s, buf, result); This produces redundant loads and stores, which hits 2, 3, 4 byte copies much worse; I reckon the condition in the subscript ends up preventing the compiler from eliminating them. If result == 1 is a safe assumption (as it should be I think) then we could hoist it out of the switch like so: s[0] = buf[0]; switch (result) { case 4: s[3] = buf[3]; /* Fall through. */ case 3: s[2] = buf[2]; /* Fall through. */ case 2: s[1] = buf[1]; /* Fall through. */ case 1: break; default: memcpy (s, buf, result); } This improves things ever so slightly for the 1 and 2 byte locale tests in the microbenchmark, while keeping things the same for the rest. There's still a redundant byte copy in the > 4 bytes case, but that should be nothing compared to the PLT indirection. What do you think? Thanks, Siddhesh ^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2022-05-13 17:58 UTC | newest] Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-04-07 6:26 [RFC] _FORTIFY_SOURCE strictness Siddhesh Poyarekar 2022-04-07 10:16 ` Andreas Schwab 2022-04-08 3:24 ` Siddhesh Poyarekar 2022-04-08 2:26 ` Paul Eggert 2022-04-08 3:32 ` Siddhesh Poyarekar 2022-04-08 5:37 ` Florian Weimer 2022-04-08 6:02 ` Siddhesh Poyarekar 2022-04-08 21:07 ` Paul Eggert 2022-04-11 8:02 ` Siddhesh Poyarekar 2022-05-05 18:43 ` [PATCH 0/2] More compliant wcrtomb Siddhesh Poyarekar 2022-05-05 18:43 ` [PATCH 1/2] benchtests: Add wcrtomb microbenchmark Siddhesh Poyarekar 2022-05-06 9:10 ` Florian Weimer 2022-05-06 12:49 ` [committed] " Siddhesh Poyarekar 2022-05-06 12:50 ` [PATCH 1/2] " Adhemerval Zanella 2022-05-06 12:59 ` Siddhesh Poyarekar 2022-05-06 13:20 ` Adhemerval Zanella 2022-05-06 13:26 ` Siddhesh Poyarekar 2022-05-06 13:36 ` Siddhesh Poyarekar 2022-05-06 13:46 ` Adhemerval Zanella 2022-05-05 18:43 ` [PATCH 2/2] wcrtomb: Make behavior POSIX compliant Siddhesh Poyarekar 2022-05-06 9:25 ` Paul Eggert 2022-05-06 13:40 ` Adhemerval Zanella 2022-05-06 13:46 ` Siddhesh Poyarekar 2022-05-06 14:04 ` [PATCH v2] " Siddhesh Poyarekar 2022-05-09 13:22 ` Adhemerval Zanella 2022-05-09 13:35 ` Siddhesh Poyarekar 2022-05-12 13:15 ` [PATCH v3] " Siddhesh Poyarekar 2022-05-13 4:56 ` Paul Eggert 2022-05-13 5:28 ` Paul Eggert 2022-05-13 11:31 ` Siddhesh Poyarekar 2022-05-13 11:38 ` Florian Weimer 2022-05-13 11:51 ` Siddhesh Poyarekar 2022-05-13 12:55 ` Florian Weimer 2022-05-13 12:30 ` Adhemerval Zanella 2022-05-13 13:42 ` Siddhesh Poyarekar 2022-05-13 17:58 ` Paul Eggert 2022-05-13 13:45 ` [committed] " Siddhesh Poyarekar 2022-05-13 8:18 ` [PATCH v3] " Siddhesh Poyarekar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).