public inbox for fortran@gcc.gnu.org
 help / color / mirror / Atom feed
From: Ben Boeckel <ben.boeckel@kitware.com>
To: gcc-patches@gcc.gnu.org
Cc: Ben Boeckel <ben.boeckel@kitware.com>,
	jason@redhat.com, nathan@acm.org, fortran@gcc.gnu.org,
	gcc@gcc.gnu.org, brad.king@kitware.com
Subject: [PATCH v5 2/5] libcpp: add a function to determine UTF-8 validity of a C string
Date: Wed, 25 Jan 2023 16:06:33 -0500	[thread overview]
Message-ID: <20230125210636.2960049-3-ben.boeckel@kitware.com> (raw)
In-Reply-To: <20230125210636.2960049-1-ben.boeckel@kitware.com>

This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

libcpp/

	* charset.cc: Add `_cpp_valid_utf8_str` which determines whether
	a C string is valid UTF-8 or not.
	* internal.h: Add prototype for `_cpp_valid_utf8_str`.

Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
---
 libcpp/charset.cc | 20 ++++++++++++++++++++
 libcpp/internal.h |  2 ++
 2 files changed, 22 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index f7ae12ea5a2..616be9d02ee 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1868,6 +1868,26 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+/*  Detect whether a C-string is a valid UTF-8-encoded set of bytes. Returns
+    `false` if any contained byte sequence encodes an invalid Unicode codepoint
+    or is not a valid UTF-8 sequence. Returns `true` otherwise. */
+
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen (name);
+  cppchar_t cp;
+
+  while (*in)
+    {
+      if (one_utf8_to_cppchar (&in, &len, &cp))
+	return false;
+    }
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
    in the execution character set of a numeric escape; write it into the
    string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index 9724676a8cd..48520901b2d 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 			     struct normalize_state *nst,
 			     cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
 					  unsigned char *, size_t, size_t,
-- 
2.39.0


  parent reply	other threads:[~2023-01-25 21:06 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-25 21:06 [PATCH v5 0/5] P1689R5 support Ben Boeckel
2023-01-25 21:06 ` [PATCH v5 1/5] libcpp: reject codepoints above 0x10FFFF Ben Boeckel
2023-02-13 15:53   ` Jason Merrill
2023-05-12 14:26     ` Ben Boeckel
2023-01-25 21:06 ` Ben Boeckel [this message]
2023-10-23 15:16   ` [PATCH v5 2/5] libcpp: add a function to determine UTF-8 validity of a C string David Malcolm
2023-10-23 15:24     ` Jason Merrill
2023-10-23 15:28       ` David Malcolm
2023-01-25 21:06 ` [PATCH v5 3/5] p1689r5: initial support Ben Boeckel
2023-02-14 21:50   ` Jason Merrill
2023-05-12 14:24     ` Ben Boeckel
2023-06-19 21:33       ` Jason Merrill
2023-06-20 16:51         ` Ben Boeckel
2023-06-20 19:46     ` Ben Boeckel
2023-06-23 18:31       ` Jason Merrill
2023-06-25 17:08         ` Ben Boeckel
2023-01-25 21:06 ` [PATCH v5 4/5] c++modules: report imported CMI files as dependencies Ben Boeckel
2023-02-13 18:33   ` Jason Merrill
2023-05-12 14:26     ` Ben Boeckel
2023-06-22 21:21   ` Jason Merrill
2023-06-23  2:45     ` Ben Boeckel
2023-06-23 12:12       ` Nathan Sidwell
2023-06-25 16:36         ` Ben Boeckel
2023-07-18 20:52           ` Jason Merrill
2023-07-18 21:12             ` Nathan Sidwell
2023-07-19  0:01             ` Ben Boeckel
2023-07-19 21:11               ` Nathan Sidwell
2023-07-20  0:47                 ` Ben Boeckel
2023-07-20 21:00                   ` Nathan Sidwell
2023-07-21 14:57                     ` Ben Boeckel
2023-07-21 20:23                       ` Nathan Sidwell
2023-07-24  0:26                         ` Ben Boeckel
2023-07-28  1:13                           ` Jason Merrill
2023-07-29 14:25                             ` Ben Boeckel
2023-01-25 21:06 ` [PATCH v5 5/5] c++modules: report module mapper files as a dependency Ben Boeckel
2023-06-23 14:44   ` Jason Merrill
2023-06-25 16:42     ` Ben Boeckel
2023-02-02 14:04 ` [PATCH v5 0/5] P1689R5 support Ben Boeckel
2023-02-02 20:24 ` Harald Anlauf
2023-02-03  4:00   ` Ben Boeckel
2023-02-03  4:07 ` Andrew Pinski
2023-02-03  8:58   ` Jonathan Wakely
2023-02-03  9:10     ` Jonathan Wakely
2023-02-03 14:52       ` Ben Boeckel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230125210636.2960049-3-ben.boeckel@kitware.com \
    --to=ben.boeckel@kitware.com \
    --cc=brad.king@kitware.com \
    --cc=fortran@gcc.gnu.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=gcc@gcc.gnu.org \
    --cc=jason@redhat.com \
    --cc=nathan@acm.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).