public inbox for cygwin-cvs@sourceware.org
From: Corinna Vinschen <corinna@sourceware.org>
To: cygwin-cvs@sourceware.org
Subject: [newlib-cygwin/main] Cygwin: mbrtowi: define replacement for mbrtowc, returning UTF-32 value
Date: Tue, 14 Feb 2023 12:09:53 +0000 (GMT)
Message-ID: <20230214120953.B55003858D1E@sourceware.org>

https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;h=60c25da90d015f27c5697c6db7ab0557585d09aa

commit 60c25da90d015f27c5697c6db7ab0557585d09aa
Author:     Corinna Vinschen <corinna@vinschen.de>
AuthorDate: Tue Feb 14 12:20:20 2023 +0100
Commit:     Corinna Vinschen <corinna@vinschen.de>
CommitDate: Tue Feb 14 12:20:20 2023 +0100

    Cygwin: mbrtowi: define replacement for mbrtowc, returning UTF-32 value
    
    Since UTF-16 can't hold every Unicode character in a single
    wchar_t, we need a function that returns a wint_t holding the
    UTF-32 value, for use in comparison functions.  Fortunately, the
    important wide-character functions like towupper/towlower,
    isw<class>, iswctype, etc., already take wint_t values, and newlib
    handles them as UTF-32.
    
    If only we had switched wchar_t to 32 bit way back when... sigh.
    
    Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
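To make the arithmetic concrete: a supplementary-plane character such as U+1F600 is stored in UTF-16 as the surrogate pair 0xd83d 0xde00. The sketch below applies the same formula the diff uses to recombine such a pair. `combine_surrogates` is a hypothetical helper name, not part of Cygwin, and it assumes the caller has already checked that `hi` lies in 0xd800-0xdbff and `lo` in 0xdc00-0xdfff.

```c
#include <stdint.h>

/* Hypothetical helper, not a Cygwin API: combine an already-validated
   UTF-16 surrogate pair into a UTF-32 code point, using the same formula
   as mbrtowi.  Each surrogate carries 10 payload bits in its low bits;
   adding 0x10000 restores the offset that UTF-16 subtracts before
   splitting a supplementary-plane character into two code units. */
static uint32_t
combine_surrogates (uint16_t hi, uint16_t lo)
{
  return ((((uint32_t) hi & 0x3ff) << 10) | ((uint32_t) lo & 0x3ff))
         + 0x10000;
}
```

With the pair above, the helper yields 0x1f600. That value needs more than the 16 bits of Cygwin's wchar_t, but it fits in the wider wint_t.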

Diff:
---
 winsup/cygwin/local_includes/wchar.h |  4 ++++
 winsup/cygwin/strfuncs.cc            | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/winsup/cygwin/local_includes/wchar.h b/winsup/cygwin/local_includes/wchar.h
index b2ddd457568f..3d746c29b9bf 100644
--- a/winsup/cygwin/local_includes/wchar.h
+++ b/winsup/cygwin/local_includes/wchar.h
@@ -39,6 +39,10 @@ extern wctomb_f __utf8_wctomb;
 
 #define __WCTOMB (__get_current_locale ()->wctomb)
 
+/* replacement function for mbrtowc, returning a wint_t representing
+   a UTF-32 value. Defined in strfuncs.cc */
+extern wint_t mbrtowi (wint_t *, const char *, size_t, mbstate_t *);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/winsup/cygwin/strfuncs.cc b/winsup/cygwin/strfuncs.cc
index 0ab2290539a8..0b9d8ac1f639 100644
--- a/winsup/cygwin/strfuncs.cc
+++ b/winsup/cygwin/strfuncs.cc
@@ -112,6 +112,38 @@ transform_chars_af_unix (PWCHAR out, const char *path, __socklen_t len)
   return out;
 }
 
+/* replacement function for mbrtowc, returning a wint_t representing
+   a UTF-32 value. */
+extern "C" wint_t
+mbrtowi (wint_t *pwi, const char *s, size_t n, mbstate_t *ps)
+{
+  size_t len, len2;
+  wchar_t w1, w2;
+
+  len = mbrtowc (&w1, s, n, ps);
+  if (len == (size_t) -1 || len == (size_t) -2)
+    return len;
+  *pwi = w1;
+  /* Convert surrogate pair to wint_t value */
+  if (len > 0 && w1 >= 0xd800 && w1 <= 0xdbff)
+    {
+      s += len;
+      n -= len;
+      len2 = mbrtowc (&w2, s, n, ps);
+      if (len2 > 0 && w2 >= 0xdc00 && w2 <= 0xdfff)
+	{
+	  len += len2;
+	  *pwi = (((w1 & 0x3ff) << 10) | (w2 & 0x3ff)) + 0x10000;
+	}
+      else
+	{
+	  len = (size_t) -1;
+	  errno = EILSEQ;
+	}
+    }
+  return len;
+}
+
 /* The SJIS, JIS and eucJP conversion in newlib does not use UTF as
    wchar_t character representation.  That's unfortunate for us since
    we require UTF for the OS.  What we do here is to have our own
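
For experimenting without Cygwin, here is a rough, portable approximation of the surrogate handling in `mbrtowi`. It is a sketch under stated assumptions. `utf16_next` is a hypothetical name. It reads from an array of already-decoded UTF-16 code units instead of calling `mbrtowc`, so it does not model multibyte conversion state. Like the diff, it returns `(size_t) -1` with `errno` set to `EILSEQ` when a high surrogate is not followed by a low one. As its own design choices, it returns `(size_t) -2` (the `mbrtowc` convention for incomplete input) when the buffer ends right after a high surrogate, and it rejects a lone low surrogate, which `mbrtowi` passes through unchanged.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, portable sketch of mbrtowi's surrogate handling, applied
   to a buffer of UTF-16 code units rather than to mbrtowc output.
   Returns the number of code units consumed (1 or 2) and stores the
   UTF-32 value in *out; returns (size_t) -2 if more input is needed,
   or (size_t) -1 with errno = EILSEQ for an invalid sequence. */
static size_t
utf16_next (const uint16_t *s, size_t n, uint32_t *out)
{
  if (n == 0)
    return (size_t) -2;            /* nothing to decode yet */

  uint16_t w1 = s[0];
  if (w1 < 0xd800 || w1 > 0xdfff)
    {
      *out = w1;                   /* BMP character: a single unit */
      return 1;
    }
  if (w1 >= 0xdc00)
    {
      errno = EILSEQ;              /* lone low surrogate (sketch's choice) */
      return (size_t) -1;
    }
  if (n < 2)
    return (size_t) -2;            /* high surrogate, pair incomplete */

  uint16_t w2 = s[1];
  if (w2 < 0xdc00 || w2 > 0xdfff)
    {
      errno = EILSEQ;              /* high surrogate without a low one */
      return (size_t) -1;
    }
  *out = ((((uint32_t) w1 & 0x3ff) << 10) | ((uint32_t) w2 & 0x3ff))
         + 0x10000;
  return 2;
}
```

A caller would advance through the buffer by the returned count and could pass each `*out` value to a classification function such as iswalpha, which is the use case the commit message describes.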
