From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 6 May 2022 09:50:06 -0300
Subject: Re: [PATCH 1/2] benchtests: Add wcrtomb microbenchmark
To: libc-alpha@sourceware.org
References: <20220505184348.3357550-1-siddhesh@sourceware.org>
 <20220505184348.3357550-2-siddhesh@sourceware.org>
From: Adhemerval Zanella
In-Reply-To: <20220505184348.3357550-2-siddhesh@sourceware.org>
Content-Type: text/plain; charset=UTF-8
List-Id: Libc-alpha mailing list

On 05/05/2022 15:43, Siddhesh Poyarekar via Libc-alpha wrote:
> Add a simple benchmark that measures wcrtomb performance with various
> locales with 1-4 byte characters.
>
> Signed-off-by: Siddhesh Poyarekar
> ---
>  benchtests/Makefile        |   1 +
>  benchtests/bench-wcrtomb.c | 140 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 141 insertions(+)
>  create mode 100644 benchtests/bench-wcrtomb.c
>
> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index 149d87e22e..de9de5cf58 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -171,6 +171,7 @@ ifeq (no,$(cross-compiling))
>  wcsmbs-benchset := \
>    wcpcpy \
>    wcpncpy \
> +  wcrtomb \
>    wcscat \
>    wcschr \
>    wcschrnul \
> diff --git a/benchtests/bench-wcrtomb.c b/benchtests/bench-wcrtomb.c
> new file mode 100644
> index 0000000000..6cef69cdbf
> --- /dev/null
> +++ b/benchtests/bench-wcrtomb.c
> @@ -0,0 +1,140 @@
> +/* Measure wcrtomb function.
> +   Copyright The GNU Toolchain Authors.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#include
> +#include
> +#include
> +#include
> +
> +#include "bench-timing.h"
> +#include "json-lib.h"
> +
> +#define NITERS 100000
> +
> +struct test_inputs
> +{
> +  const char *locale;
> +  const wchar_t *input_chars;
> +};
> +
> +/* The inputs represent different types of characters, e.g. RTL, 1 byte, 2
> +   byte, 3 byte and 4 byte chars.  The exact number of inputs per locale
> +   doesn't really matter because we're not looking to compare performance
> +   between locales.  */
> +struct test_inputs inputs[] =
> +{
> +  /* RTL.  */
> +  {"ar_SA.UTF-8",
> +   L",-.،؟ـًُّ٠٢٣٤ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي"},
> +
> +  /* Various mixes of 1 and 2 byte chars.  */
> +  {"cs_CZ.UTF-8",
> +   L",.aAábcCčdDďeEéÉěĚfFghHiIíJlLmMnNňŇoóÓpPqQrřsSšŠTťuUúÚůŮvVWxyýz"},
> +
> +  {"el_GR.UTF-8",
> +   L",.αΑβγδΔεΕζηΗθΘιΙκΚλμΜνΝξοΟπΠρΡσΣςτυΥφΦχψω"},
> +
> +  {"en_GB.UTF-8",
> +   L",.aAāĀæÆǽǣǢbcCċdDðÐeEēĒfFgGġhHiIīĪlLmMnNoōpPqQrsSTuUūŪvVwxyȝzþÞƿǷ"},
> +
> +  {"fr_FR.UTF-8",
> +   L",.aAàâbcCçdDeEéèêëfFghHiIîïjlLmMnNoOôœpPqQrRsSTuUùûvVwxyz"},
> +
> +  {"he_IL.UTF-8",
> +   L"',.ִאבגדהוזחטיכךלמםנןסעפףצץקרשת"},
> +
> +  /* Devanagari, Japanese, 3-byte chars.  */
> +  {"hi_IN.UTF-8",
> +   L"(।ं०४५७अआइईउऎएओऔकखगघचछजञटडढणतथदधनपफ़बभमयरलवशषसहािीुूृेैोौ्"},
> +
> +  {"ja_JP.UTF-8",
> +   L".ー0123456789あアいイうウえエおオかカがきキぎくクぐけケげこコごさサざ"},
> +
> +  /* More mixtures of 1 and 2 byte chars.  */
> +  {"ru_RU.UTF-8",
> +   L",.аАбвВгдДеЕёЁжЖзЗийЙкКлЛмМнНоОпПрстТуУфФхХЦчшШщъыЫьэЭюЮя"},
> +
> +  {"sr_RS.UTF-8",
> +   L",.aAbcCćčdDđĐeEfgGhHiIlLmMnNoOpPqQrsSšŠTuUvVxyzZž"},
> +
> +  {"sv_SE.UTF-8",
> +   L",.aAåÅäÄæÆbBcCdDeEfFghHiIjlLmMnNoOöÖpPqQrsSTuUvVwxyz"},
> +
> +  /* Chinese, 3-byte chars */
> +  {"zh_CN.UTF-8",
> +   L"一七三下不与世両並中串主乱予事二五亡京人今仕付以任企伎会伸住佐体作使"},
> +
> +  /* 4-byte chars, because smileys are the universal language and we want to
> +     ensure optimal performance with them 😊.  */
> +  {"en_US.UTF-8",
> +   L"😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡"}
> +};

Could you use hexadecimal character escapes in the tests?  Although gcc
handles multiple -fexec-charset values, trying to build this with a
different compiler usually emits a lot of warnings.
> +
> +char buf[MB_LEN_MAX];
> +size_t ret;
> +
> +int
> +main (int argc, char **argv)
> +{
> +  const size_t inputs_len = sizeof (inputs) / sizeof (struct test_inputs);
> +
> +  json_ctx_t json_ctx;
> +  json_init (&json_ctx, 0, stdout);
> +  json_document_begin (&json_ctx);
> +
> +  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
> +  json_attr_object_begin (&json_ctx, "functions");
> +  json_attr_object_begin (&json_ctx, "wcrtomb");
> +
> +  for (size_t i = 0; i < inputs_len; i++)
> +    {
> +      json_attr_object_begin (&json_ctx, inputs[i].locale);
> +      setlocale (LC_ALL, inputs[i].locale);
> +
> +      timing_t min = 0x7fffffffffffffff, max = 0, total = 0;
> +      const wchar_t *inp = inputs[i].input_chars;
> +      const size_t len = wcslen (inp);
> +      mbstate_t s;
> +
> +      memset (&s, '\0', sizeof (s));
> +
> +      for (size_t n = 0; n < NITERS; n++)
> +        {
> +          timing_t start, end, elapsed;
> +
> +          TIMING_NOW (start);
> +          for (size_t j = 0; j < len; j++)
> +            ret = wcrtomb (buf, inp[j], &s);
> +          TIMING_NOW (end);
> +          TIMING_DIFF (elapsed, start, end);
> +          if (min > elapsed)
> +            min = elapsed;
> +          if (max < elapsed)
> +            max = elapsed;
> +          TIMING_ACCUM (total, elapsed);
> +        }
> +      json_attr_double (&json_ctx, "max", max);
> +      json_attr_double (&json_ctx, "min", min);
> +      json_attr_double (&json_ctx, "mean", total / NITERS);
> +      json_attr_object_end (&json_ctx);
> +    }
> +
> +  json_attr_object_end (&json_ctx);
> +  json_attr_object_end (&json_ctx);
> +  json_document_end (&json_ctx);
> +}