public inbox for libc-alpha@sourceware.org
From: Bruno Haible <bruno@clisp.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: Florian Weimer <fweimer@redhat.com>,
	bug-gnulib@gnu.org, libc-alpha@sourceware.org,
	Adhemerval Zanella <adhemerval.zanella@linaro.org>
Subject: Re: dealing with non-ASCII-safe encodings
Date: Sat, 06 Mar 2021 21:17:57 +0100
Message-ID: <403769359.ktsG0tK06S@omega>
In-Reply-To: <0f088c6a-3255-33b8-e177-b9ac91b86c84@cs.ucla.edu>

Paul Eggert wrote:
> However, my worry is that good support for non-ASCII-safe encodings like 
> Shift-JIS is hard to do, and that any such support we'd add to 
> Gnulib/coreutils/etc. would not only increase maintenance costs and 
> reduce runtime performance

Shift_JIS is not the only non-ASCII-safe encoding; GB18030, BIG5, BIG5-HKSCS,
and GBK are as well. Among these, GB18030 is used as a locale encoding in
China, so it is important for programs to support these locale encodings.

Gnulib already has support for them:

  - It has replacement functions that operate correctly with these locale
    encodings:
      strstr, c_strstr -> mbsstr
      strchr -> mbschr
      strrchr -> mbsrchr
      strspn -> mbsspn
      strcspn -> mbscspn
      strpbrk -> mbspbrk
      strsep -> mbssep
      strtok_r -> mbstok_r

  - It has warnings (through _GL_WARN_ON_USE) for uses of the functions
    that are not safe for non-ASCII-safe encodings.

  - It has modules mbchar, mbiter, mbfile for iterating through the
    multibyte characters of a string or file, that work for all locale
    encodings.
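
To illustrate why the byte-oriented functions are unsafe here, the following
is a minimal sketch, not Gnulib's implementation. It assumes the commonly
used Shift_JIS lead-byte ranges (0x81..0x9F and 0xE0..0xFC); the helper names
sjis_is_lead_byte and sjis_strchr are hypothetical and exist only for this
example. In Shift_JIS the katakana character encoded as 0x83 0x5C has a
trail byte equal to the ASCII backslash, so strchr reports a match in the
middle of a character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if b starts a two-byte Shift_JIS character.
   Assumes the lead-byte ranges 0x81..0x9F and 0xE0..0xFC. */
int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical character-aware search for the ASCII character c in the
   Shift_JIS string s.  Trail bytes are skipped, so a trail byte that
   happens to equal c is never reported as a match. */
char *sjis_strchr(const char *s, char c)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *) s;

    while (*p != '\0') {
        if (sjis_is_lead_byte(*p)) {
            if (p[1] == '\0')
                break;              /* truncated character: stop */
            p += 2;                 /* skip lead byte and trail byte */
        } else {
            if (*p == (unsigned char) c)
                return (char *) p;  /* a real single-byte character */
            p++;
        }
    }
    return NULL;
}
```

For the byte sequence 0x83 0x5C 'x' '\' 'y', strchr with '\\' points at the
second byte (inside the katakana character), while the sketch above points
at the fourth byte, the real backslash. In a real program one would use
mbschr from Gnulib instead, which works for any locale encoding.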

Yes, using these safer functions does reduce performance. However, I have
shown in the past, through coreutils patches, how to accommodate both a
"fast path" and a "safe path" in the same binary.
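
The idea of combining both paths can be sketched roughly as follows. This is
not the code from the coreutils patches; it is a simplified illustration
using only ISO C and POSIX calls (MB_CUR_MAX, nl_langinfo, mbrlen). The
function names locale_is_ascii_safe, find_ascii_safe_path and find_ascii are
hypothetical, and the ASCII-safety test is deliberately conservative: it
treats only single-byte locales and UTF-8 as safe, so some ASCII-safe
multibyte encodings take the slower path.

```c
#include <assert.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical check: nonzero if a byte-wise search for an ASCII character
   is known to be correct in the current locale. */
int locale_is_ascii_safe(void)
{
    if (MB_CUR_MAX == 1)
        return 1;   /* single-byte encoding: every byte is a character */
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}

/* Safe path: advance one multibyte character at a time with mbrlen, so
   that trail bytes are never compared against c.
   c is assumed to be a non-NUL ASCII character. */
char *find_ascii_safe_path(const char *s, char c)
{
    mbstate_t state;
    size_t remaining = strlen(s);

    memset(&state, 0, sizeof state);
    while (remaining > 0) {
        size_t n = mbrlen(s, remaining, &state);
        if (n == (size_t) -1 || n == (size_t) -2 || n == 0) {
            /* Invalid or incomplete sequence: step over one byte. */
            n = 1;
            memset(&state, 0, sizeof state);
        }
        if (n == 1 && *s == c)
            return (char *) s;
        s += n;
        remaining -= n;
    }
    return NULL;
}

/* Dispatcher: use strchr when it is known to be correct, else the safe path. */
char *find_ascii(const char *s, char c)
{
    assert(s != NULL);
    return locale_is_ascii_safe() ? strchr(s, c)
                                  : find_ascii_safe_path(s, c);
}
```

A caller would invoke setlocale (LC_ALL, "") at startup and then call
find_ascii wherever it previously called strchr. The locale check could be
cached once per process, so that the common single-byte and UTF-8 cases pay
almost nothing for the extra safety.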

Bruno

