public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Tom Honermann <tom@honermann.net>
To: Carlos O'Donell <carlos@redhat.com>,
	Andreas Schwab <schwab@suse.de>,
	libc-alpha <libc-alpha@sourceware.org>,
	Rafal Luzynski <digitalfreak@lingonborough.com>
Subject: Re: [PATCH] Reset converter state after second wchar_t output (Bug 25734)
Date: Mon, 30 Mar 2020 13:40:22 -0400	[thread overview]
Message-ID: <691be68d-2969-1076-75a9-4185c5de5f69@honermann.net> (raw)
In-Reply-To: <dfe8e44b-5d4c-b77a-846b-5e3a5f93e8c0@redhat.com>

On 3/28/20 4:37 PM, Carlos O'Donell wrote:
> On 3/27/20 5:18 PM, Carlos O'Donell wrote:
>> Andreas,
>>
>> Does this look good to you?
>>
>> I admit it is odd that the converter returns 0-bytes consumed on the
>> second conversion, it could be enhanced to correctly show that we're
>> really consume 1-byte at a time, but that's another bug.
>>
> Tom,
>
> Thanks for your review!
>
> If you want you can give a "Reviewed-by:"

Thanks, Carlos!

Reviewed-by: Tom Honermann<tom@honermann.net>

Tom.

>
> Andreas,
>
> I'd like to see a review from you given your experience with BIG5.
>
> v2
> - Fixed typo.
> - Added mbsinit test (thanks Tom).
> - Fixed logical flaw in test loop in do_test.
>
>> 8< --- 8< --- 8< ---
>  From 40353453e97c321b20e683939c919e6d997fffcb Mon Sep 17 00:00:00 2001
> From: Carlos O'Donell <carlos@redhat.com>
> Date: Fri, 27 Mar 2020 17:03:36 -0400
> Subject: [PATCH] Reset converter state after second wchar_t output (Bug 25734)
>
> An input BIG5-HKSCS character may be converted into at most 2 whcar_t
> characters. After outputting the second whcar_t character (which was
> saved in the converter state) we must reset the state. If we fail
> to reset the state we will be stuck continually copying that
> character to the output even if we have further input to consider.
>
> We add a new test case that covers the 4 BIG5-HKSCS characters
> that may become 2 wchar_t characters.
> ---
>   iconvdata/Makefile                        |   6 +-
>   iconvdata/big5hkscs.c                     |   3 +
>   iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c | 160 ++++++++++++++++++++++
>   3 files changed, 168 insertions(+), 1 deletion(-)
>   create mode 100644 iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
>
> diff --git a/iconvdata/Makefile b/iconvdata/Makefile
> index c83962f351..71ddef09db 100644
> --- a/iconvdata/Makefile
> +++ b/iconvdata/Makefile
> @@ -73,7 +73,7 @@ modules.so := $(addsuffix .so, $(modules))
>   ifeq (yes,$(build-shared))
>   tests = bug-iconv1 bug-iconv2 tst-loading tst-e2big tst-iconv4 bug-iconv4 \
>   	tst-iconv6 bug-iconv5 bug-iconv6 tst-iconv7 bug-iconv8 bug-iconv9 \
> -	bug-iconv10 bug-iconv11 bug-iconv12
> +	bug-iconv10 bug-iconv11 bug-iconv12 tst-iconv-big5-hkscs-to-2ucs4
>   ifeq ($(have-thread-library),yes)
>   tests += bug-iconv3
>   endif
> @@ -340,3 +340,7 @@ tst-tables-clean:
>   
>   $(objpfx)gconv-modules: gconv-modules
>   	cat $(sysdeps-gconv-modules) $^ > $@
> +
> +# Test requires BIG5HKSCS.
> +$(objpfx)tst-iconv-big5-hkscs-to-2ucs4.out: $(objpfx)gconv-modules \
> +			  $(addprefix $(objpfx),BIG5HKSCS.so)
> diff --git a/iconvdata/big5hkscs.c b/iconvdata/big5hkscs.c
> index 01fcfeba76..ef325119b1 100644
> --- a/iconvdata/big5hkscs.c
> +++ b/iconvdata/big5hkscs.c
> @@ -17895,6 +17895,9 @@ static struct
>   	else								      \
>   	  ++inptr;							      \
>         }									      \
> +    else								      \
> +      /* Clear the queue and proceed to output the saved character.  */	      \
> +      *statep = 0;							      \
>   									      \
>       put32 (outptr, ch);							      \
>       outptr += 4;							      \
> diff --git a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
> new file mode 100644
> index 0000000000..8389adebf2
> --- /dev/null
> +++ b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
> @@ -0,0 +1,160 @@
> +/* Verify the BIG5HKSCS outputs that generate 2 wchar_t's (Bug 25734).
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <stdio.h>
> +#include <string.h>
> +#include <locale.h>
> +#include <wchar.h>
> +#include <support/check.h>
> +#include <support/support.h>
> +
> +/* A few BIG5-HKSCS characters map in two unicode code points.
> +   They are:
> +   /x88/x62 => <U00CA><U0304>
> +   /x88/x64 => <U00CA><U030C>
> +   /x88/xa3 => <U00EA><U0304>
> +   /x88/xa5 => <U00EA><U030C>
> +   Each of these is special cased in iconvdata/big5hkscs.c.
> +   This test ensures that we correctly reset the shift state after
> +   outputting any of these characters.  We do this by converting
> +   each them followed by converting an ASCII character.  If we fail
> +   to reset the shift state (bug 25734) then we'll see the last
> +   character in the queue output again.  */
> +
> +/* Each test has  name, input bytes, and expected wide character
> +   output.  */
> +struct testdata {
> +  const char *name;
> +  const char input[3];
> +  wchar_t expected[3];
> +};
> +
> +/* In BIG5-HKSCS (2008) there are 4 characters that generate multiple
> +   wide characters.  */
> +struct testdata tests[4] = {
> +  /* <H-8862>X => <U+00CA><U+0304>X */
> +  { "<H-8862>", "\x88\x62\x58", { 0x00CA, 0x0304, 0x0058 } },
> +  /* <H-8864>X => <U+00CA><U+030C>X */
> +  { "<H-8864>", "\x88\x64\x58", { 0x00CA, 0x030C, 0x0058 } },
> +  /* <H-88A3>X => <U+00EA><U+0304>X */
> +  { "<H-88A3>", "\x88\xa3\x58", { 0x00EA, 0x0304, 0x0058 } },
> +  /* <H-88A5>X => <U+00EA><U+030C>X */
> +  { "<H-88A5>", "\x88\xa5\x58", { 0x00EA, 0x030C, 0x0058 } }
> +};
> +
> +/* Each test is of the form:
> +   - Translate first code sequence (two bytes)
> +   - Translate second (zero bytes)
> +   - Translate the third (one byte).  */
> +static int
> +check_conversion (struct testdata test)
> +{
> +  int err = 0;
> +  wchar_t wc;
> +  mbstate_t st;
> +  size_t ret;
> +  const char *mbs = test.input;
> +  int consumed = 0;
> +  /* Input is always 3 bytes long.  */
> +  int inlen = 3;
> +
> +  memset (&st, 0, sizeof (st));
> +  /* First conversion: Consumes first 2 bytes.  */
> +  ret = mbrtowc (&wc, mbs, inlen - consumed, &st);
> +  if (ret != 2)
> +    {
> +      printf ("error: First conversion consumed only %zd bytes.\n", ret);
> +      err++;
> +    }
> +  /* Advance the two consumed bytes.  */
> +  mbs += ret;
> +  consumed += ret;
> +  if (wc != test.expected[0])
> +    {
> +      printf ("error: Result of first conversion was wrong.\n");
> +      err++;
> +    }
> +  /* Second conversion: Consumes 0 bytes.  */
> +  ret = mbrtowc (&wc, mbs, inlen - consumed, &st);
> +  if (ret != 0)
> +    {
> +      printf ("error: Second conversion consumed only %zd bytes.\n", ret);
> +      err++;
> +    }
> +  /* Advance the zero consumed bytes.  */
> +  mbs += ret;
> +  consumed += ret;
> +  if (wc != test.expected[1])
> +    {
> +      printf ("error: Result of second conversion was wrong.\n");
> +      err++;
> +    }
> +  /* After the second conversion the state of the converter should be
> +     in the initial state.  It is in the initial state because the two
> +     input BIG5-HKSCS bytes have been consumed and the 2 wchar_t's have
> +     been output.  */
> +  if (mbsinit (&st) == 0)
> +    {
> +      printf ("error: Converter not in initial state.\n");
> +      err++;
> +    }
> +  /* Third conversion: Consumes 1 byte (it's an ASCII character).  */
> +  ret = mbrtowc (&wc, mbs, inlen - consumed, &st);
> +  if (ret != 1)
> +    {
> +      printf ("error: Third conversion consumed only %zd bytes.\n", ret);
> +      err++;
> +    }
> +  /* Advance the one byte.  */
> +  mbs += ret;
> +  consumed += ret;
> +  if (wc != test.expected[2])
> +    {
> +      printf ("error: Result of third conversion was wrong.\n");
> +      err++;
> +    }
> +  /* Return 0 if we saw no errors.  */
> +  return err;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  int err = 0;
> +  int ret;
> +  /* Testing BIG5-HKSCS.  */
> +  xsetlocale (LC_ALL, "zh_HK.BIG5-HKSCS");
> +
> +  /* Run all the special conversions.  */
> +  for (int i = 0; i < (sizeof (tests) / sizeof (struct testdata)); i++)
> +    {
> +      printf ("Running test for %s\n", tests[i].name);
> +      ret = check_conversion (tests[i]);
> +      if (ret > 0)
> +	printf ("Test %s failed.\n", tests[i].name);
> +      err += ret;
> +    }
> +
> +  /* Fail if any conversion had an error.  */
> +  if (err > 0)
> +    FAIL_EXIT1 ("One or more conversions failed.");
> +
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>



  parent reply	other threads:[~2020-03-30 17:40 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-27 21:18 Carlos O'Donell
2020-03-28 20:37 ` Carlos O'Donell
2020-03-28 20:51   ` Florian Weimer
2020-03-28 23:42     ` Carlos O'Donell
2020-03-30 17:40   ` Tom Honermann [this message]
2020-04-01 17:34     ` Tom Honermann
2020-04-04 18:41       ` [PATCH v2] " Carlos O'Donell
2020-04-07  4:20         ` Tom Honermann
2020-04-14 19:03         ` Tom Honermann
2020-04-14 19:56           ` Carlos O'Donell
2020-04-16  2:41             ` Carlos O'Donell
2020-04-16  3:08               ` Tom Honermann
2020-04-16  3:10                 ` Carlos O'Donell
2020-03-30 12:11 ` [PATCH] " Andreas Schwab
2020-03-30 14:19   ` Florian Weimer
2020-03-30 14:28     ` Andreas Schwab
2020-03-30 14:32       ` Florian Weimer
2020-03-30 14:36       ` Carlos O'Donell
2020-03-30 15:28         ` Andreas Schwab
2020-03-30 17:52           ` Carlos O'Donell
2020-03-30 14:34   ` Carlos O'Donell
2020-03-30 17:36     ` Tom Honermann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=691be68d-2969-1076-75a9-4185c5de5f69@honermann.net \
    --to=tom@honermann.net \
    --cc=carlos@redhat.com \
    --cc=digitalfreak@lingonborough.com \
    --cc=libc-alpha@sourceware.org \
    --cc=schwab@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).