From: Martin Sebor <msebor@gmail.com>
To: Carlos O'Donell <carlos@redhat.com>,
libc-alpha <libc-alpha@sourceware.org>,
Martin Sebor <msebor@redhat.com>
Subject: Re: [PATCH] mbstowcs: Document, test, and fix null pointer dst semantics.
Date: Fri, 22 May 2020 09:56:14 -0600 [thread overview]
Message-ID: <47cf807c-6e34-e89c-d0bc-198e72d7efc2@gmail.com> (raw)
In-Reply-To: <63ea09de-e943-c26e-c821-093ba303d76b@redhat.com>
On 5/21/20 8:15 PM, Carlos O'Donell via Libc-alpha wrote:
> Martin,
>
> The following patch does 3 things:
>
> * Documents the mbstowcs behaviour of accepting a first argument as NULL.
> * Relaxes the stdlib.h markup to allow this documented behaviour.
> * Adds a regression test to the testsuite for mbstowcs specifically to
> cover the first argument NULL test case.
>
> The regression test fails without the stdlib.h fix, and passes after.
>
> No other regressions on x86_64.
>
> I'm looking for a "Reviewed-by: Martin Sebor <msebor@redhat.com>" from you
> to indicate that what I've done makes sense and that you've reviewed
> this patch and that I've answered all of your technical questions.
>
> Thanks for the engaging discussion on documentation.
Looks good.
Thanks for fixing up the mbstowcs attribute. We might put it back
if/when the attribute implementation changes to avoid assuming that
nonzero size necessarily implies nonnull pointer.
Martin
>
> 8< --- 8< --- 8<
> The function mbstowcs, by an XSI extension to POSIX, accepts a null
> pointer for the destination wchar_t array. This API behaviour allows
> you to use the function to compute the length of the required wchar_t
> array i.e. does the conversion without storing it and returns the
> number of wide characters required.
>
> We remove the __write_only__ markup for the first argument because it
> is not true since the destination may be a null pointer, and so the
> length argument may not apply. We remove the markup otherwise the new
> test case cannot be compiled with -Werror=nonnull.
>
> We add a new test case for mbstowcs which exercises the destination is
> a null pointer behaviour which we have now explicitly documented.
>
> The mbsrtowcs and mbsnrtowcs behave similarly, and mbsrtowcs is
> documented as doing this in C11, even if the standard doesn't come out
> and call out this specific use case. We add one note to each of
> mbsrtowcs and mbsnrtowcs to call out that they support a null pointer
> for the destination.
>
> The wcsrtombs function behaves similarly but in the other way around
> and allows you to use a null destination pointer to compute how many
> bytes you would need to convert the wide character input. We document
> this particular case also, but leave wcsnrtombs as a references to
> wcsrtombs, so the reader must still read the details of the semantics
> for wcsrtombs.
> ---
> manual/charset.texi | 24 +++++++++++++++++----
> stdlib/stdlib.h | 2 +-
> wcsmbs/Makefile | 2 +-
> wcsmbs/tst-mbstowcs.c | 50 +++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 72 insertions(+), 6 deletions(-)
> create mode 100644 wcsmbs/tst-mbstowcs.c
>
> diff --git a/manual/charset.texi b/manual/charset.texi
> index 9fd0166115..c909cb88b7 100644
> --- a/manual/charset.texi
> +++ b/manual/charset.texi
> @@ -1026,6 +1026,10 @@ stores in the pointer pointed to by @var{src} either a null pointer (if
> the NUL byte in the input string was reached) or the address of the byte
> following the last converted multibyte character.
>
> +Like @code{mbstowcs} the @var{dst} parameter may be a null pointer and
> +the function can be used to count the number of wide characters that
> +would be required.
> +
> @pindex wchar.h
> @code{mbsrtowcs} was introduced in @w{Amendment 1} to @w{ISO C90} and is
> declared in @file{wchar.h}.
> @@ -1101,10 +1105,11 @@ successfully converted.
>
> Except in the case of an encoding error the return value of the
> @code{wcsrtombs} function is the number of bytes in all the multibyte
> -character sequences stored in @var{dst}. Before returning, the state in
> -the object pointed to by @var{ps} (or the internal object in case
> -@var{ps} is a null pointer) is updated to reflect the state after the
> -last conversion. The state is the initial shift state in case the
> +character sequences which were or would have been (if @var{dst} was
> +not a null) stored in @var{dst}. Before returning, the state in the
> +object pointed to by @var{ps} (or the internal object in case @var{ps}
> +is a null pointer) is updated to reflect the state after the last
> +conversion. The state is the initial shift state in case the
> terminating NUL wide character was converted.
>
> @pindex wchar.h
> @@ -1131,6 +1136,10 @@ string @code{*@var{src}} need not be NUL-terminated. But if a NUL byte
> is found within the @var{nmc} first bytes of the string, the conversion
> stops there.
>
> +Like @code{mbstowcs} the @var{dst} parameter may be a null pointer and
> +the function can be used to count the number of wide characters that
> +would be required.
> +
> This function is a GNU extension. It is meant to work around the
> problems mentioned above. Now it is possible to convert a buffer with
> multibyte character text piece by piece without having to care about
> @@ -1465,6 +1474,13 @@ mbstowcs_alloc (const char *string)
> @}
> @end smallexample
>
> +If @var{wstring} is a null pointer then no output is written and the
> +conversion proceeds as above, and the result is returned. In practice
> +such behaviour is useful for calculating the exact number of wide
> +characters required to convert @var{string}. This behaviour of accepting
> +a null pointer for @var{wstring} is an @w{XSI} extension to the requirements
> +in @w{ISO C} and @w{POSIX}.
> +
> @end deftypefun
>
> @deftypefun size_t wcstombs (char *@var{string}, const wchar_t *@var{wstring}, size_t @var{size})
> diff --git a/stdlib/stdlib.h b/stdlib/stdlib.h
> index dd779bd740..f971df4247 100644
> --- a/stdlib/stdlib.h
> +++ b/stdlib/stdlib.h
> @@ -932,7 +932,7 @@ extern int wctomb (char *__s, wchar_t __wchar) __THROW;
> /* Convert a multibyte string to a wide char string. */
> extern size_t mbstowcs (wchar_t *__restrict __pwcs,
> const char *__restrict __s, size_t __n) __THROW
> - __attr_access ((__write_only__, 1, 3)) __attr_access ((__read_only__, 2));
> + __attr_access ((__read_only__, 2));
> /* Convert a wide char string to multibyte string. */
> extern size_t wcstombs (char *__restrict __s,
> const wchar_t *__restrict __pwcs, size_t __n)
> diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
> index f02167fa58..e638e45522 100644
> --- a/wcsmbs/Makefile
> +++ b/wcsmbs/Makefile
> @@ -52,7 +52,7 @@ tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \
> tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \
> tst-wcstod-round test-char-types tst-fgetwc-after-eof \
> tst-wcstod-nan-sign tst-c16-surrogate tst-c32-state \
> - $(addprefix test-,$(strop-tests))
> + $(addprefix test-,$(strop-tests)) tst-mbstowcs
>
> include ../Rules
>
> diff --git a/wcsmbs/tst-mbstowcs.c b/wcsmbs/tst-mbstowcs.c
> new file mode 100644
> index 0000000000..a803d826a2
> --- /dev/null
> +++ b/wcsmbs/tst-mbstowcs.c
> @@ -0,0 +1,50 @@
> +/* Test basic mbstowcs including wstring == NULL.
> + Copyright (C) 2020 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#include <stdlib.h>
> +#include <string.h>
> +#include <support/check.h>
> +
> +static int
> +do_test (void)
> +{
> + char string[] = { '1', '2', '3' , '4', '5', '\0' };
> + size_t len = strlen (string);
> + wchar_t wstring[] = { L'1', L'2', L'3', L'4', L'5', L'\0' };
> +#define NUM_WCHAR 6
> + wchar_t wout[NUM_WCHAR];
> + size_t result;
> +
> + /* The input ASCII string in the C/POSIX locale must convert
> + to the matching WSTRING. */
> + result = mbstowcs (wout, string, NUM_WCHAR);
> + TEST_VERIFY (result == (NUM_WCHAR - 1));
> + TEST_COMPARE_BLOB (wstring, sizeof(wchar_t) * (NUM_WCHAR - 1),
> + wout, sizeof(wchar_t) * result);
> +
> + /* The input ASCII string in the C/POSIX locale must be the
> + same length when using mbstowcs to compute the length of
> + the string required in the conversion. Using mbstowcs
> + in this way is an XSI extension to POSIX. */
> + result = mbstowcs (NULL, string, len);
> + TEST_VERIFY (result == len);
> +
> + return 0;
> +}
> +
> +#include <support/test-driver.c>
>
next prev parent reply other threads:[~2020-05-22 15:56 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-22 2:15 Carlos O'Donell
2020-05-22 11:25 ` Florian Weimer
2020-06-01 15:50 ` [PATCH v2] mbstowcs: Document, test, and fix null pointer dst semantics (Bug 25219) Carlos O'Donell
2020-06-01 16:17 ` Florian Weimer
2020-06-01 16:28 ` Carlos O'Donell
2020-06-01 16:35 ` Florian Weimer
2020-06-01 16:57 ` [PATCH v3] " Carlos O'Donell
2020-06-01 17:07 ` Florian Weimer
2020-06-01 17:11 ` Carlos O'Donell
2020-05-22 15:56 ` Martin Sebor [this message]
2020-05-22 16:15 ` [PATCH] mbstowcs: Document, test, and fix null pointer dst semantics Matheus Castanho
2020-05-22 16:35 ` Carlos O'Donell
2020-05-22 17:06 ` Florian Weimer
2020-05-25 14:10 ` Matheus Castanho
2020-06-01 13:40 ` Matheus Castanho
2020-06-01 15:11 ` Carlos O'Donell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47cf807c-6e34-e89c-d0bc-198e72d7efc2@gmail.com \
--to=msebor@gmail.com \
--cc=carlos@redhat.com \
--cc=libc-alpha@sourceware.org \
--cc=msebor@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).