public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Carlos O'Donell <carlos@redhat.com>,
	libc-alpha <libc-alpha@sourceware.org>,
	Martin Sebor <msebor@redhat.com>
Subject: Re: [PATCH] mbstowcs: Document, test, and fix null pointer dst semantics.
Date: Fri, 22 May 2020 09:56:14 -0600	[thread overview]
Message-ID: <47cf807c-6e34-e89c-d0bc-198e72d7efc2@gmail.com> (raw)
In-Reply-To: <63ea09de-e943-c26e-c821-093ba303d76b@redhat.com>

On 5/21/20 8:15 PM, Carlos O'Donell via Libc-alpha wrote:
> Martin,
> 
> The following patch does 3 things:
> 
> * Documents the mbstowcs behaviour of accepting a first argument as NULL.
> * Relaxes the stdlib.h markup to allow this documented behaviour.
> * Adds a regression test to the testsuite for mbstowcs specifically to
>    cover the first argument NULL test case.
> 
> The regression test fails without the stdlib.h fix, and passes after.
> 
> No other regressions on x86_64.
> 
> I'm looking for a "Reviewed-by: Martin Sebor <msebor@redhat.com>" from you
> to indicate that what I've done makes sense and that you've reviewed
> this patch and that I've answered all of your technical questions.
> 
> Thanks for the engaging discussion on documentation.

Looks good.

Thanks for fixing up the mbstowcs attribute.  We might put it back
if/when the attribute implementation changes to avoid assuming that
nonzero size necessarily implies nonnull pointer.

Martin


> 
> 8< --- 8< --- 8<
> The function mbstowcs, by an XSI extension to POSIX, accepts a null
> pointer for the destination wchar_t array.  This API behaviour allows
> you to use the function to compute the length of the required wchar_t
> array i.e. does the conversion without storing it and returns the
> number of wide characters required.
> 
> We remove the __write_only__ markup for the first argument because it
> is not true since the destination may be a null pointer, and so the
> length argument may not apply.  We remove the markup otherwise the new
> test case cannot be compiled with -Werror=nonnull.
> 
> We add a new test case for mbstowcs which exercises the destination is
> a null pointer behaviour which we have now explicitly documented.
> 
> The mbsrtowcs and mbsnrtowcs behave similarly, and mbsrtowcs is
> documented as doing this in C11, even if the standard doesn't come out
> and call out this specific use case.  We add one note to each of
> mbsrtowcs and mbsnrtowcs to call out that they support a null pointer
> for the destination.
> 
> The wcsrtombs function behaves similarly but in the other way around
> and allows you to use a null destination pointer to compute how many
> bytes you would need to convert the wide character input.  We document
> this particular case also, but leave wcsnrtombs as a references to
> wcsrtombs, so the reader must still read the details of the semantics
> for wcsrtombs.
> ---
>   manual/charset.texi   | 24 +++++++++++++++++----
>   stdlib/stdlib.h       |  2 +-
>   wcsmbs/Makefile       |  2 +-
>   wcsmbs/tst-mbstowcs.c | 50 +++++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 72 insertions(+), 6 deletions(-)
>   create mode 100644 wcsmbs/tst-mbstowcs.c
> 
> diff --git a/manual/charset.texi b/manual/charset.texi
> index 9fd0166115..c909cb88b7 100644
> --- a/manual/charset.texi
> +++ b/manual/charset.texi
> @@ -1026,6 +1026,10 @@ stores in the pointer pointed to by @var{src} either a null pointer (if
>   the NUL byte in the input string was reached) or the address of the byte
>   following the last converted multibyte character.
>   
> +Like @code{mbstowcs} the @var{dst} parameter may be a null pointer and
> +the function can be used to count the number of wide characters that
> +would be required.
> +
>   @pindex wchar.h
>   @code{mbsrtowcs} was introduced in @w{Amendment 1} to @w{ISO C90} and is
>   declared in @file{wchar.h}.
> @@ -1101,10 +1105,11 @@ successfully converted.
>   
>   Except in the case of an encoding error the return value of the
>   @code{wcsrtombs} function is the number of bytes in all the multibyte
> -character sequences stored in @var{dst}.  Before returning, the state in
> -the object pointed to by @var{ps} (or the internal object in case
> -@var{ps} is a null pointer) is updated to reflect the state after the
> -last conversion.  The state is the initial shift state in case the
> +character sequences which were or would have been (if @var{dst} was
> +not a null) stored in @var{dst}.  Before returning, the state in the
> +object pointed to by @var{ps} (or the internal object in case @var{ps}
> +is a null pointer) is updated to reflect the state after the last
> +conversion.  The state is the initial shift state in case the
>   terminating NUL wide character was converted.
>   
>   @pindex wchar.h
> @@ -1131,6 +1136,10 @@ string @code{*@var{src}} need not be NUL-terminated.  But if a NUL byte
>   is found within the @var{nmc} first bytes of the string, the conversion
>   stops there.
>   
> +Like @code{mbstowcs} the @var{dst} parameter may be a null pointer and
> +the function can be used to count the number of wide characters that
> +would be required.
> +
>   This function is a GNU extension.  It is meant to work around the
>   problems mentioned above.  Now it is possible to convert a buffer with
>   multibyte character text piece by piece without having to care about
> @@ -1465,6 +1474,13 @@ mbstowcs_alloc (const char *string)
>   @}
>   @end smallexample
>   
> +If @var{wstring} is a null pointer then no output is written and the
> +conversion proceeds as above, and the result is returned. In practice
> +such behaviour is useful for calculating the exact number of wide
> +characters required to convert @var{string}.  This behaviour of accepting
> +a null pointer for @var{wstring} is an @w{XSI} extension to the requirements
> +in @w{ISO C} and @w{POSIX}.
> +
>   @end deftypefun
>   
>   @deftypefun size_t wcstombs (char *@var{string}, const wchar_t *@var{wstring}, size_t @var{size})
> diff --git a/stdlib/stdlib.h b/stdlib/stdlib.h
> index dd779bd740..f971df4247 100644
> --- a/stdlib/stdlib.h
> +++ b/stdlib/stdlib.h
> @@ -932,7 +932,7 @@ extern int wctomb (char *__s, wchar_t __wchar) __THROW;
>   /* Convert a multibyte string to a wide char string.  */
>   extern size_t mbstowcs (wchar_t *__restrict  __pwcs,
>   			const char *__restrict __s, size_t __n) __THROW
> -    __attr_access ((__write_only__, 1, 3)) __attr_access ((__read_only__, 2));
> +    __attr_access ((__read_only__, 2));
>   /* Convert a wide char string to multibyte string.  */
>   extern size_t wcstombs (char *__restrict __s,
>   			const wchar_t *__restrict __pwcs, size_t __n)
> diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
> index f02167fa58..e638e45522 100644
> --- a/wcsmbs/Makefile
> +++ b/wcsmbs/Makefile
> @@ -52,7 +52,7 @@ tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \
>   	 tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \
>   	 tst-wcstod-round test-char-types tst-fgetwc-after-eof \
>   	 tst-wcstod-nan-sign tst-c16-surrogate tst-c32-state \
> -	 $(addprefix test-,$(strop-tests))
> +	 $(addprefix test-,$(strop-tests)) tst-mbstowcs
>   
>   include ../Rules
>   
> diff --git a/wcsmbs/tst-mbstowcs.c b/wcsmbs/tst-mbstowcs.c
> new file mode 100644
> index 0000000000..a803d826a2
> --- /dev/null
> +++ b/wcsmbs/tst-mbstowcs.c
> @@ -0,0 +1,50 @@
> +/* Test basic mbstowcs including wstring == NULL.
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <stdlib.h>
> +#include <string.h>
> +#include <support/check.h>
> +
> +static int
> +do_test (void)
> +{
> +  char string[] = { '1', '2', '3' , '4', '5', '\0' };
> +  size_t len = strlen (string);
> +  wchar_t wstring[] = { L'1', L'2', L'3', L'4', L'5', L'\0' };
> +#define NUM_WCHAR 6
> +  wchar_t wout[NUM_WCHAR];
> +  size_t result;
> +
> +  /* The input ASCII string in the C/POSIX locale must convert
> +     to the matching WSTRING.  */
> +  result = mbstowcs (wout, string, NUM_WCHAR);
> +  TEST_VERIFY (result == (NUM_WCHAR - 1));
> +  TEST_COMPARE_BLOB (wstring, sizeof(wchar_t) * (NUM_WCHAR - 1),
> +		     wout, sizeof(wchar_t) * result);
> +
> +  /* The input ASCII string in the C/POSIX locale must be the
> +     same length when using mbstowcs to compute the length of
> +     the string required in the conversion.  Using mbstowcs
> +     in this way is an XSI extension to POSIX.  */
> +  result = mbstowcs (NULL, string, len);
> +  TEST_VERIFY (result == len);
> +
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>
> 


  parent reply	other threads:[~2020-05-22 15:56 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-22  2:15 Carlos O'Donell
2020-05-22 11:25 ` Florian Weimer
2020-06-01 15:50   ` [PATCH v2] mbstowcs: Document, test, and fix null pointer dst semantics (Bug 25219) Carlos O'Donell
2020-06-01 16:17     ` Florian Weimer
2020-06-01 16:28       ` Carlos O'Donell
2020-06-01 16:35         ` Florian Weimer
2020-06-01 16:57           ` [PATCH v3] " Carlos O'Donell
2020-06-01 17:07             ` Florian Weimer
2020-06-01 17:11               ` Carlos O'Donell
2020-05-22 15:56 ` Martin Sebor [this message]
2020-05-22 16:15 ` [PATCH] mbstowcs: Document, test, and fix null pointer dst semantics Matheus Castanho
2020-05-22 16:35   ` Carlos O'Donell
2020-05-22 17:06     ` Florian Weimer
2020-05-25 14:10       ` Matheus Castanho
2020-06-01 13:40 ` Matheus Castanho
2020-06-01 15:11   ` Carlos O'Donell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47cf807c-6e34-e89c-d0bc-198e72d7efc2@gmail.com \
    --to=msebor@gmail.com \
    --cc=carlos@redhat.com \
    --cc=libc-alpha@sourceware.org \
    --cc=msebor@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).