public inbox for libc-alpha@sourceware.org
* [PATCH] Complete GB18030 charmap
@ 2012-05-09 14:33 Andreas Schwab
  2012-05-09 14:46 ` Carlos O'Donell
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Schwab @ 2012-05-09 14:33 UTC
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 249 bytes --]

	[BZ #11837]
	* iconvdata/gb18030.c: Update tables.
	(BODY for FROM_LOOP): Update.  Handle two-byte encoded non-BMP
	characters specially.
	(BODY for TO_LOOP): Add encoding of missing ranges.

	[BZ #11837]
	* charmaps/GB18030: Add missing entries.
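For readers new to GB18030: outside the BMP, the four-byte codes need no table at all. Every code point from U+10000 to U+10FFFF maps to a four-byte sequence by plain arithmetic, starting at 90 30 81 30. The sketch below shows that computation. It assumes the standard linear layout, and `gb18030_encode_supp` is a hypothetical name, not the code in iconvdata/gb18030.c. It also does not special-case the few non-BMP characters that have two-byte codes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not glibc's code: encode a supplementary-plane
   code point (U+10000..U+10FFFF) as a GB18030 four-byte sequence.
   Layout assumed: byte 1 runs 0x90..0xE3, bytes 2 and 4 run 0x30..0x39,
   byte 3 runs 0x81..0xFE, so one step of byte 1 spans 10*126*10 = 12600.
   Non-BMP characters that have two-byte codes are NOT special-cased.  */
static void gb18030_encode_supp (uint32_t cp, unsigned char out[4])
{
  uint32_t idx;

  assert (cp >= 0x10000 && cp <= 0x10FFFF);
  idx = cp - 0x10000;                 /* 0 .. 0xFFFFF */
  out[3] = 0x30 + idx % 10;  idx /= 10;
  out[2] = 0x81 + idx % 126; idx /= 126;
  out[1] = 0x30 + idx % 10;  idx /= 10;
  out[0] = 0x90 + idx;                /* at most 0x90 + 83 = 0xE3 */
}
```

With this, U+10000 maps to 90 30 81 30 and U+10FFFF to E3 32 9A 35. The four-byte codes for BMP characters do not form one linear block like this. They roughly enumerate the BMP characters that lack one- or two-byte codes, which is why the converter still needs tables for them.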


[-- Attachment #2: Type: application/x-bzip2, Size: 458428 bytes --]

[-- Attachment #3: Type: text/plain, Size: 162 bytes --]


-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: [PATCH] Complete GB18030 charmap
  2012-05-09 14:33 [PATCH] Complete GB18030 charmap Andreas Schwab
@ 2012-05-09 14:46 ` Carlos O'Donell
  2012-05-09 21:08   ` Andreas Schwab
  0 siblings, 1 reply; 13+ messages in thread
From: Carlos O'Donell @ 2012-05-09 14:46 UTC
  To: Andreas Schwab; +Cc: libc-alpha

On Wed, May 9, 2012 at 10:32 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>        [BZ #11837]
>        * iconvdata/gb18030.c: Update tables.
>        (BODY for FROM_LOOP): Update.  Handle two-byte encoded non-BMP
>        characters specially.
>        (BODY for TO_LOOP): Add encoding of missing ranges.
>
>        [BZ #11837]
>        * charmaps/GB18030: Add missing entries.

Thanks for the updated patch.

I'm doing a build and will be testing some conversions to ensure this works.

I'm very new to the iconv code.

How did you develop the patch?

What testing did you do with this patch?

Cheers,
Carlos.

* Re: [PATCH] Complete GB18030 charmap
  2012-05-09 14:46 ` Carlos O'Donell
@ 2012-05-09 21:08   ` Andreas Schwab
  2012-05-09 22:06     ` Carlos O'Donell
  2016-02-05 20:07     ` Florian Weimer
  0 siblings, 2 replies; 13+ messages in thread
From: Andreas Schwab @ 2012-05-09 21:08 UTC
  To: Carlos O'Donell; +Cc: libc-alpha

"Carlos O'Donell" <carlos@systemhalted.org> writes:

> How did you develop the patch?

From ICU
(http://source.icu-project.org/repos/icu/data/trunk/charset/source/gb18030
and
http://source.icu-project.org/repos/icu/icu/trunk/source/data/mappings/gb18030.ucm)
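ICU .ucm files list mappings one per line, in the form `<U20CD0> \x95\x34\xCE\x36 |0`. The trailing digit is a precision flag: 0 marks a roundtrip mapping, 1 a one-way fallback from Unicode to the codepage, and 3 a one-way reverse fallback from the codepage to Unicode. The thread does not say how the ICU data was turned into glibc tables. The sketch below only shows how one such line can be read, assuming that format. `ucm_parse_line` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parser for one ICU .ucm mapping line, for example
     <U20CD0> \x95\x34\xCE\x36 |0
   Stores the code point, up to four bytes and the precision flag.
   Returns the number of bytes, or -1 if the line is not a mapping.  */
static int ucm_parse_line (const char *line, unsigned int *cp,
                           unsigned char bytes[4], int *precision)
{
  unsigned int byte;
  int n = 0, consumed = 0;

  assert (line != NULL && cp != NULL && precision != NULL);
  /* "<Uxxxx>" followed by optional whitespace.  */
  if (sscanf (line, " <U%X> %n", cp, &consumed) != 1 || consumed == 0)
    return -1;
  line += consumed;
  /* Each byte is written as \xNN, with no separator.  */
  while (n < 4 && sscanf (line, "\\x%2X%n", &byte, &consumed) == 1)
    {
      bytes[n++] = (unsigned char) byte;
      line += consumed;
    }
  /* Precision flag such as "|0" or "|3".  */
  if (n == 0 || sscanf (line, " |%d", precision) != 1)
    return -1;
  return n;
}
```

Under this reading, a `|3` entry is the kind of multibyte-to-Unicode-only mapping that shows up as "irreversible" when the tables are compared.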

> What testing did you do with this patch?

tst-tables.sh tests for consistency.

Andreas.

* Re: [PATCH] Complete GB18030 charmap
  2012-05-09 21:08   ` Andreas Schwab
@ 2012-05-09 22:06     ` Carlos O'Donell
  2012-05-09 22:36       ` Andreas Schwab
  2016-02-05 20:07     ` Florian Weimer
  1 sibling, 1 reply; 13+ messages in thread
From: Carlos O'Donell @ 2012-05-09 22:06 UTC
  To: Andreas Schwab; +Cc: libc-alpha

On Wed, May 9, 2012 at 5:08 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> "Carlos O'Donell" <carlos@systemhalted.org> writes:
>
>> How did you develop the patch?
>
> From ICU
> (http://source.icu-project.org/repos/icu/data/trunk/charset/source/gb18030
> and
> http://source.icu-project.org/repos/icu/icu/trunk/source/data/mappings/gb18030.ucm)

Thanks for this pointer. I've added a reference to ICU in the wiki
section on locales.

>> What testing did you do with this patch?
>
> tst-tables.sh tests for consistency.

Does the truncation of the GB18030 table in iconvdata/tst-table.sh still
mean that all Unicode scalar values are tested for conversion, as they
should be for GB18030?
~~~
...
# When the charset is GB18030, truncate this table because for this encoding,
# the tst-table-from and tst-table-to programs scan the Unicode BMP only.
if test ${charset} = GB18030; then
  grep '0x....$' < ${objpfx}tst-${charset}.charmap.table \
    > ${objpfx}tst-${charset}.truncated.table
  mv ${objpfx}tst-${charset}.truncated.table \
     ${objpfx}tst-${charset}.charmap.table
fi
...
~~~
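The effect of `grep '0x....$'` can be stated precisely. A table line survives only if it ends in `0x` followed by exactly four characters, that is, a BMP code point written as 0xXXXX. Every line whose Unicode column is U+10000 or above is dropped. The sketch below gives the same predicate in C. It assumes table lines of the form `<bytes>\t<unicode>` without the trailing newline, and `keeps_line` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical C equivalent of grep '0x....$' applied to one table line
   (without its newline): true if the line ends in "0x" followed by
   exactly four characters, i.e. a BMP code point written as 0xXXXX.
   Lines for U+10000 and above end in five or six digits and fail.  */
static int keeps_line (const char *line)
{
  size_t len;

  assert (line != NULL);
  len = strlen (line);
  return len >= 6 && line[len - 6] == '0' && line[len - 5] == 'x';
}
```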

My worry is that our testing doesn't cover everything required to
verify that GB18030 is correct.

Given the grep above, I think we miss testing the whole upper range,
U+10000..U+10FFFF.

Removing the grep, I get:
~~~
This might take a while
Testing GB18030 *** FAILED ***
~~~

I'm not an expert *at all*, but I don't get a warm and fuzzy feeling
that we are testing everything for GB18030.
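The supplementary range is purely arithmetic, so an exhaustive pass over all of U+10000..U+10FFFF is cheap. The sketch below round-trips every code point through the four-byte encoding and back, and checks each byte against its allowed range. It assumes the linear layout starting at 90 30 81 30, and the helper names are hypothetical. It only checks the arithmetic, not glibc's converter. A real test would also compare these bytes with what iconv() produces for each code point, allowing for the few non-BMP characters that the patch encodes with two bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for the linear four-byte range
   90 30 81 30 (U+10000) .. E3 32 9A 35 (U+10FFFF).  */
static void supp_encode (uint32_t cp, unsigned char b[4])
{
  uint32_t idx;

  assert (cp >= 0x10000 && cp <= 0x10FFFF);
  idx = cp - 0x10000;
  b[3] = 0x30 + idx % 10;  idx /= 10;
  b[2] = 0x81 + idx % 126; idx /= 126;
  b[1] = 0x30 + idx % 10;  idx /= 10;
  b[0] = 0x90 + idx;
}

static uint32_t supp_decode (const unsigned char b[4])
{
  return 0x10000 + ((((uint32_t) (b[0] - 0x90) * 10 + (b[1] - 0x30)) * 126
                     + (b[2] - 0x81)) * 10 + (b[3] - 0x30));
}

/* Returns the number of code points that produce an out-of-range byte
   or do not decode back to themselves.  */
static unsigned long check_supplementary_range (void)
{
  unsigned long failures = 0;
  unsigned char b[4];
  uint32_t cp;

  for (cp = 0x10000; cp <= 0x10FFFF; cp++)
    {
      supp_encode (cp, b);
      if (b[0] < 0x90 || b[0] > 0xE3 || b[1] < 0x30 || b[1] > 0x39
          || b[2] < 0x81 || b[2] > 0xFE || b[3] < 0x30 || b[3] > 0x39
          || supp_decode (b) != cp)
        failures++;
    }
  return failures;
}
```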

Cheers,
Carlos.

* Re: [PATCH] Complete GB18030 charmap
  2012-05-09 22:06     ` Carlos O'Donell
@ 2012-05-09 22:36       ` Andreas Schwab
  2012-05-09 23:43         ` Carlos O'Donell
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Schwab @ 2012-05-09 22:36 UTC
  To: Carlos O'Donell; +Cc: libc-alpha

"Carlos O'Donell" <carlos@systemhalted.org> writes:

> # the tst-table-from and tst-table-to programs scan the Unicode BMP only.
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^

That's why.

Andreas.

* Re: [PATCH] Complete GB18030 charmap
  2012-05-09 22:36       ` Andreas Schwab
@ 2012-05-09 23:43         ` Carlos O'Donell
  2012-05-10  8:10           ` Andreas Schwab
  0 siblings, 1 reply; 13+ messages in thread
From: Carlos O'Donell @ 2012-05-09 23:43 UTC
  To: Andreas Schwab; +Cc: libc-alpha

On Wed, May 9, 2012 at 6:36 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> "Carlos O'Donell" <carlos@systemhalted.org> writes:
>
>> # the tst-table-from and tst-table-to programs scan the Unicode BMP only.
>                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^
>
> That's why.

There are two test cases reported in the issue.

The patch doesn't fix the first test case:

printf "\xf0\xa0\xb3\x90\n" > bz11837-t1.txt
GCONV_PATH=/home/carlos/build/glibc/iconvdata ./elf/ld-linux.so.2 \
  --library-path /home/carlos/build/glibc:/home/carlos/build/glibc/elf:/home/carlos/build/glibc/dlfcn \
  ./iconv/iconv_prog -t GB18030 ./bz11837-t1.txt
./iconv/iconv_prog: illegal input sequence at position 0

It does fix the second:

printf '\x00\x00\xc5\x0B' > bz11837-t2.txt
GCONV_PATH=/home/carlos/build/glibc/iconvdata ./elf/ld-linux.so.2 \
  --library-path /home/carlos/build/glibc:/home/carlos/build/glibc/elf:/home/carlos/build/glibc/dlfcn \
  ./iconv/iconv_prog -f UCS-4BE -t GB18030 ./bz11837-t2.txt > t2-out.txt
hexdump t2-out.txt
0000000 3283 36da
0000004
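When reading the hexdump output, keep in mind that hexdump with no options prints 16-bit words in host byte order. On a little-endian machine (assumed here), the word 3283 therefore stands for the bytes 83 32. The file contains 83 32 DA 36, which has the shape of a four-byte GB18030 sequence: first and third bytes in 0x81..0xFE, second and fourth in 0x30..0x39. The sketch below undoes the word swap and checks that shape. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: recover the byte stream from hexdump's default
   16-bit words as printed on a little-endian host (low byte first).  */
static void words_to_bytes (const unsigned short *words, size_t nwords,
                            unsigned char *bytes)
{
  size_t i;

  for (i = 0; i < nwords; i++)
    {
      bytes[2 * i] = words[i] & 0xFF;
      bytes[2 * i + 1] = words[i] >> 8;
    }
}

/* True if b[0..3] has the shape of a GB18030 four-byte sequence.  */
static int is_four_byte_gb18030 (const unsigned char b[4])
{
  return b[0] >= 0x81 && b[0] <= 0xFE && b[1] >= 0x30 && b[1] <= 0x39
         && b[2] >= 0x81 && b[2] <= 0xFE && b[3] >= 0x30 && b[3] <= 0x39;
}
```

`od -tx1` avoids the issue by printing single bytes, as in the later output in this thread.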

Do we know if these test cases are correct?

Cheers,
Carlos.

* Re: [PATCH] Complete GB18030 charmap
  2012-05-09 23:43         ` Carlos O'Donell
@ 2012-05-10  8:10           ` Andreas Schwab
  2012-05-10 12:40             ` Carlos O'Donell
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Schwab @ 2012-05-10  8:10 UTC
  To: Carlos O'Donell; +Cc: libc-alpha

"Carlos O'Donell" <carlos@systemhalted.org> writes:

> printf "\xf0\xa0\xb3\x90\n" > bz11837-t1.txt
> GCONV_PATH=/home/carlos/build/glibc/iconvdata ./elf/ld-linux.so.2
> --library-path /home/carlos/build/glibc:/home/carlos/build/glibc/elf:/home/carlos/build/glibc/dlfcn
> ./iconv/iconv_prog -t GB18030 ./bz11837-t1.txt
> ./iconv/iconv_prog: illegal input sequence at position 0

WORKSFORME.

$ printf "\xf0\xa0\xb3\x90\n" | ./testrun.sh iconv/iconv_prog -t GB18030 | od -tx1z
0000000 95 34 ce 36 0a                                   >.4.6.<
0000005
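This output agrees with plain arithmetic. F0 A0 B3 90 is the UTF-8 form of U+20CD0, and 95 34 CE 36 decodes to the same code point in GB18030's linear supplementary range: 5*12600 + 4*1260 + 77*10 + 6 = 0x10CD0, plus 0x10000. The sketch below performs both decodings. It assumes well-formed input and checks only the lead byte. The function names are hypothetical, not glibc APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a four-byte UTF-8 sequence; only the lead byte is checked.  */
static uint32_t utf8_decode4 (const unsigned char s[4])
{
  assert ((s[0] & 0xF8) == 0xF0);
  return ((uint32_t) (s[0] & 0x07) << 18) | ((uint32_t) (s[1] & 0x3F) << 12)
         | ((uint32_t) (s[2] & 0x3F) << 6) | (s[3] & 0x3F);
}

/* Decode a GB18030 four-byte sequence in the supplementary range
   (first byte 0x90..0xE3); the other bytes are not validated.  */
static uint32_t gb18030_decode_supp (const unsigned char b[4])
{
  assert (b[0] >= 0x90 && b[0] <= 0xE3);
  return 0x10000 + (uint32_t) (b[0] - 0x90) * 12600
         + (uint32_t) (b[1] - 0x30) * 1260
         + (uint32_t) (b[2] - 0x81) * 10 + (uint32_t) (b[3] - 0x30);
}
```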

Andreas.

* Re: [PATCH] Complete GB18030 charmap
  2012-05-10  8:10           ` Andreas Schwab
@ 2012-05-10 12:40             ` Carlos O'Donell
  0 siblings, 0 replies; 13+ messages in thread
From: Carlos O'Donell @ 2012-05-10 12:40 UTC
  To: Andreas Schwab; +Cc: libc-alpha

On Thu, May 10, 2012 at 4:09 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> "Carlos O'Donell" <carlos@systemhalted.org> writes:
>
>> printf "\xf0\xa0\xb3\x90\n" > bz11837-t1.txt
>> GCONV_PATH=/home/carlos/build/glibc/iconvdata ./elf/ld-linux.so.2
>> --library-path /home/carlos/build/glibc:/home/carlos/build/glibc/elf:/home/carlos/build/glibc/dlfcn
>> ./iconv/iconv_prog -t GB18030 ./bz11837-t1.txt
>> ./iconv/iconv_prog: illegal input sequence at position 0
>
> WORKSFORME.
>
> $ printf "\xf0\xa0\xb3\x90\n" | ./testrun.sh iconv/iconv_prog -t GB18030 | od -tx1z
> 0000000 95 34 ce 36 0a                                   >.4.6.<
> 0000005

Darn. I've got one regression in bug22, which looks like a compiler
issue in Ubuntu, and that might be affecting this case.

OK, I feel like we've done enough smoke testing of this patch.

Please check this in tomorrow if I don't find anything else wrong :-)

Cheers,
Carlos.

* Re: [PATCH] Complete GB18030 charmap
  2012-05-09 21:08   ` Andreas Schwab
  2012-05-09 22:06     ` Carlos O'Donell
@ 2016-02-05 20:07     ` Florian Weimer
  2016-02-08 19:59       ` Andreas Schwab
  1 sibling, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2016-02-05 20:07 UTC
  To: Andreas Schwab; +Cc: Carlos O'Donell, libc-alpha

* Andreas Schwab:

> "Carlos O'Donell" <carlos@systemhalted.org> writes:
>
>> How did you develop the patch?
>
> From ICU
> (http://source.icu-project.org/repos/icu/data/trunk/charset/source/gb18030
> and
> http://source.icu-project.org/repos/icu/icu/trunk/source/data/mappings/gb18030.ucm)

Andreas,

there are discrepancies between the ICU table at

<http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-2000.xml>

and localedata/charmaps/GB18030, see bug 19575.

Do you know which version is correct?

Thanks,
Florian

* Re: [PATCH] Complete GB18030 charmap
  2016-02-05 20:07     ` Florian Weimer
@ 2016-02-08 19:59       ` Andreas Schwab
  0 siblings, 0 replies; 13+ messages in thread
From: Andreas Schwab @ 2016-02-08 19:59 UTC
  To: Florian Weimer; +Cc: Carlos O'Donell, libc-alpha

Florian Weimer <fw@deneb.enyo.de> writes:

> there are discrepancies between the ICU table at
>
> <http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-2000.xml>
>
> and localedata/charmaps/GB18030, see bug 19575.
>
> Do you know which version is correct?

localedata/charmaps/GB18030 follows GB 18030-2005, not the 2000 edition
that the ICU gb-18030-2000.xml table describes.

Andreas.

* Re: [PATCH] Complete GB18030 charmap
@ 2012-05-13 20:00 Bruno Haible
  0 siblings, 0 replies; 13+ messages in thread
From: Bruno Haible @ 2012-05-13 20:00 UTC
  To: Andreas Schwab; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1584 bytes --]

Andreas Schwab wrote:
> > How did you develop the patch?
>
> From ICU
> (http://source.icu-project.org/repos/icu/data/trunk/charset/source/gb18030
> and
> http://source.icu-project.org/repos/icu/icu/trunk/source/data/mappings/gb18030.ucm)
>
> > What testing did you do with this patch?
>
> tst-tables.sh tests for consistency.

I have also tested this patch, checking the conversion table in both
directions (extracted through the attached programs) against the one that
will be used in the next release of libiconv.

The patch is perfect. It contains irreversible mappings
in the multibyte -> Unicode direction, for backward compatibility:

0x95329031	U+20087
0x95329033	U+20089
0x95329730	U+200CC
0x9536B937	U+215D7
0x9630BA35	U+2298F
0x9635B630	U+241FE
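Each of these six four-byte sequences is exactly what the linear supplementary-range formula gives for the listed code point. For example, 0x95329031 yields 5*12600 + 2*1260 + 15*10 + 1 = 65671 = 0x10087, and adding 0x10000 gives U+20087. Decoding them therefore needs no table. They are irreversible because converting these code points back to GB18030 does not produce these sequences. The sketch below checks the list. It assumes the byte sequence is packed big-endian into a 32-bit value, as written in the list, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode a GB18030 four-byte sequence in the
   supplementary range, packed big-endian as in 0x95329031.
   Assumes byte 1 is 0x90..0xE3 and the other bytes are well formed.  */
static uint32_t gb18030_supp_from_packed (uint32_t seq)
{
  unsigned int b1 = seq >> 24, b2 = (seq >> 16) & 0xFF;
  unsigned int b3 = (seq >> 8) & 0xFF, b4 = seq & 0xFF;

  assert (b1 >= 0x90 && b1 <= 0xE3);
  return 0x10000 + (b1 - 0x90) * 12600u + (b2 - 0x30) * 1260u
         + (b3 - 0x81) * 10u + (b4 - 0x30);
}

/* The six irreversible multibyte -> Unicode mappings listed above.  */
static const uint32_t irreversible[6][2] = {
  { 0x95329031, 0x20087 }, { 0x95329033, 0x20089 },
  { 0x95329730, 0x200CC }, { 0x9536B937, 0x215D7 },
  { 0x9630BA35, 0x2298F }, { 0x9635B630, 0x241FE },
};
```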

GNU libiconv also contains the following irreversible mappings
in the multibyte -> Unicode direction, also for backward compatibility:

0x82359037	U+9FB4
0x82359038	U+9FB5
0x82359039	U+9FB6
0x82359130	U+9FB7
0x82359131	U+9FB8
0x82359132	U+9FB9
0x82359133	U+9FBA
0x82359134	U+9FBB
0x84318236	U+FE10
0x84318237	U+FE11
0x84318238	U+FE12
0x84318239	U+FE13
0x84318330	U+FE14
0x84318331	U+FE15
0x84318332	U+FE16
0x84318333	U+FE17
0x84318334	U+FE18
0x84318335	U+FE19

These byte sequences may occur in text files that were created with
earlier versions of libiconv. But glibc never supported them (I tested
glibc 2.9 and 2.11), so the glibc converter does not really need to
contain them.

In other words, thanks Andreas for having cleaned up the long standing
issues with this converter!

Bruno


[-- Attachment #2: table-from.c --]
[-- Type: text/x-csrc, Size: 5534 bytes --]

/* Copyright (C) 2000-2002, 2004-2005 Free Software Foundation, Inc.
   This file is part of the GNU LIBICONV Library.

   The GNU LIBICONV Library is free software; you can redistribute it
   and/or modify it under the terms of the GNU Library General Public
   License as published by the Free Software Foundation; either version 2
   of the License, or (at your option) any later version.

   The GNU LIBICONV Library is distributed in the hope that it will be
   useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU LIBICONV Library; see the file COPYING.LIB.
   If not, see <http://www.gnu.org/licenses/>.  */

/* Create a table from CHARSET to Unicode. */

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iconv.h>
#include <errno.h>

/* If nonzero, ignore conversions outside Unicode plane 0. */
static int bmp_only;

static const char* hexbuf (unsigned char buf[], unsigned int buflen)
{
  static char msg[50];
  switch (buflen) {
    case 1: sprintf(msg,"0x%02X",buf[0]); break;
    case 2: sprintf(msg,"0x%02X%02X",buf[0],buf[1]); break;
    case 3: sprintf(msg,"0x%02X%02X%02X",buf[0],buf[1],buf[2]); break;
    case 4: sprintf(msg,"0x%02X%02X%02X%02X",buf[0],buf[1],buf[2],buf[3]); break;
    default: abort();
  }
  return msg;
}

static int try (iconv_t cd, unsigned char buf[], unsigned int buflen, unsigned int* out)
{
  const char* inbuf = (const char*) buf;
  size_t inbytesleft = buflen;
  char* outbuf = (char*) out;
  size_t outbytesleft = 3*sizeof(unsigned int);
  size_t result;
  iconv(cd,NULL,NULL,NULL,NULL);
  result = iconv(cd,(char**)&inbuf,&inbytesleft,&outbuf,&outbytesleft);
  if (result != (size_t)(-1))
    result = iconv(cd,NULL,NULL,&outbuf,&outbytesleft);
  if (result == (size_t)(-1)) {
    if (errno == EILSEQ) {
      return -1;
    } else if (errno == EINVAL) {
      return 0;
    } else {
      int saved_errno = errno;
      fprintf(stderr,"%s: iconv error: ",hexbuf(buf,buflen));
      errno = saved_errno;
      perror("");
      exit(1);
    }
  } else if (result > 0) /* ignore conversions with transliteration */ {
    return -1;
  } else {
    if (inbytesleft != 0) {
      fprintf(stderr,"%s: inbytes = %ld, outbytes = %ld\n",hexbuf(buf,buflen),(long)(buflen-inbytesleft),(long)(3*sizeof(unsigned int)-outbytesleft));
      exit(1);
    }
    return (3*sizeof(unsigned int)-outbytesleft)/sizeof(unsigned int);
  }
}

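/* Returns the out[] buffer (OUTLEN UCS-4LE values) as a space-separated
   list of Unicode values, each formatted as 0x%04X.  The bytes are
   assembled explicitly, so the result does not depend on the host's
   byte order.  */
static const char* ucs4_decode (const unsigned int* out, unsigned int outlen)
{
  /* Up to 3 values of at most 10 characters each, separators and NUL. */
  static char text[3*11+1];
  const unsigned char* bytes = (const unsigned char*) out;
  char* p = text;
  while (outlen > 0) {
    unsigned int uc = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16)
                      | ((unsigned int) bytes[3] << 24);
    if (p > text)
      *p++ = ' ';
    sprintf (p, "0x%04X", uc);
    bytes += 4; outlen -= 1;
    if (bmp_only && strlen(p) > 6)
      return NULL;
    p += strlen(p);
  }
  return text;
}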

int main (int argc, char* argv[])
{
  const char* charset;
  iconv_t cd;
  int search_depth;

  if (argc != 2) {
    fprintf(stderr,"Usage: table-from charset\n");
    exit(1);
  }
  charset = argv[1];

  cd = iconv_open("UCS-4LE",charset);
  if (cd == (iconv_t)(-1)) {
    perror("iconv_open");
    exit(1);
  }

  /* When testing UTF-8, stop at 0x10000, otherwise the output file gets too
     big. */
  bmp_only = (strcmp(charset,"UTF-8") == 0);
  search_depth = (strcmp(charset,"UTF-8") == 0 ? 3 : 4);

  {
    unsigned int out[3];
    unsigned char buf[4];
    unsigned int i0, i1, i2, i3;
    int result;
    for (i0 = 0; i0 < 0x100; i0++) {
      buf[0] = i0;
      result = try(cd,buf,1,out);
      if (result < 0) {
      } else if (result > 0) {
        const char* unicode = ucs4_decode(out,result);
        if (unicode != NULL)
          printf("0x%02X\t%s\n",i0,unicode);
      } else {
        for (i1 = 0; i1 < 0x100; i1++) {
          buf[1] = i1;
          result = try(cd,buf,2,out);
          if (result < 0) {
          } else if (result > 0) {
            const char* unicode = ucs4_decode(out,result);
            if (unicode != NULL)
              printf("0x%02X%02X\t%s\n",i0,i1,unicode);
          } else {
            for (i2 = 0; i2 < 0x100; i2++) {
              buf[2] = i2;
              result = try(cd,buf,3,out);
              if (result < 0) {
              } else if (result > 0) {
                const char* unicode = ucs4_decode(out,result);
                if (unicode != NULL)
                  printf("0x%02X%02X%02X\t%s\n",i0,i1,i2,unicode);
              } else if (search_depth > 3) {
                for (i3 = 0; i3 < 0x100; i3++) {
                  buf[3] = i3;
                  result = try(cd,buf,4,out);
                  if (result < 0) {
                  } else if (result > 0) {
                    const char* unicode = ucs4_decode(out,result);
                    if (unicode != NULL)
                      printf("0x%02X%02X%02X%02X\t%s\n",i0,i1,i2,i3,unicode);
                  } else {
                    fprintf(stderr,"%s: incomplete byte sequence\n",hexbuf(buf,4));
                    exit(1);
                  }
                }
              }
            }
          }
        }
      }
    }
  }

  if (iconv_close(cd) < 0) {
    perror("iconv_close");
    exit(1);
  }

  if (ferror(stdin) || ferror(stdout) || fclose(stdout)) {
    fprintf(stderr,"I/O error\n");
    exit(1);
  }

  exit(0);
}

[-- Attachment #3: table-to.c --]
[-- Type: text/x-csrc, Size: 3177 bytes --]

/* Copyright (C) 2000-2002, 2004-2005 Free Software Foundation, Inc.
   This file is part of the GNU LIBICONV Library.

   The GNU LIBICONV Library is free software; you can redistribute it
   and/or modify it under the terms of the GNU Library General Public
   License as published by the Free Software Foundation; either version 2
   of the License, or (at your option) any later version.

   The GNU LIBICONV Library is distributed in the hope that it will be
   useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU LIBICONV Library; see the file COPYING.LIB.
   If not, see <http://www.gnu.org/licenses/>.  */

/* Create a table from Unicode to CHARSET. */

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iconv.h>
#include <errno.h>

int main (int argc, char* argv[])
{
  const char* charset;
  iconv_t cd;
  int bmp_only;

  if (argc != 2) {
    fprintf(stderr,"Usage: table-to charset\n");
    exit(1);
  }
  charset = argv[1];

  cd = iconv_open(charset,"UCS-4LE");
  if (cd == (iconv_t)(-1)) {
    perror("iconv_open");
    exit(1);
  }

  /* When testing UTF-8, stop at 0x10000, otherwise the output file gets too
     big. */
  bmp_only = (strcmp(charset,"UTF-8") == 0);

  {
    unsigned int i;
    unsigned char buf[10];
    for (i = 0; i < (bmp_only ? 0x10000 : 0x110000); i++) {
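      /* Serialize the code point explicitly as UCS-4LE, so that the
         program also works on big-endian hosts. */
      unsigned char in[4];
      const char* inbuf = (const char*) in;
      size_t inbytesleft = sizeof(in);
      char* outbuf = (char*)buf;
      size_t outbytesleft = sizeof(buf);
      size_t result;
      size_t result2 = 0;
      in[0] = i & 0xFF; in[1] = (i >> 8) & 0xFF;
      in[2] = (i >> 16) & 0xFF; in[3] = (i >> 24) & 0xFF;
      iconv(cd,NULL,NULL,NULL,NULL);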
      result = iconv(cd,(char**)&inbuf,&inbytesleft,&outbuf,&outbytesleft);
      if (result != (size_t)(-1))
        result2 = iconv(cd,NULL,NULL,&outbuf,&outbytesleft);
      if (result == (size_t)(-1) || result2 == (size_t)(-1)) {
        if (errno != EILSEQ) {
          int saved_errno = errno;
          fprintf(stderr,"0x%02X: iconv error: ",i);
          errno = saved_errno;
          perror("");
          exit(1);
        }
      } else if (result == 0) /* ignore conversions with transliteration */ {
        if (inbytesleft == 0 && outbytesleft < sizeof(buf)) {
          unsigned int jmax = sizeof(buf) - outbytesleft;
          unsigned int j;
          printf("0x");
          for (j = 0; j < jmax; j++)
            printf("%02X",buf[j]);
          printf("\t0x%04X\n",i);
        } else if (inbytesleft == 0 && i >= 0xe0000 && i < 0xe0080) {
          /* Language tags may silently be dropped. */
        } else {
          fprintf(stderr,"0x%02X: inbytes = %ld, outbytes = %ld\n",i,(long)(sizeof(unsigned int)-inbytesleft),(long)(sizeof(buf)-outbytesleft));
          exit(1);
        }
      }
    }
  }

  if (iconv_close(cd) < 0) {
    perror("iconv_close");
    exit(1);
  }

  if (ferror(stdin) || ferror(stdout) || fclose(stdout)) {
    fprintf(stderr,"I/O error\n");
    exit(1);
  }

  exit(0);
}

* Re: [PATCH] Complete GB18030 charmap
  2012-02-04 18:03 Andreas Schwab
@ 2012-02-06  1:28 ` Carlos O'Donell
  0 siblings, 0 replies; 13+ messages in thread
From: Carlos O'Donell @ 2012-02-06  1:28 UTC
  To: Andreas Schwab; +Cc: libc-alpha

On Sat, Feb 4, 2012 at 11:59 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> 2012-02-04  Andreas Schwab  <schwab@linux-m68k.org>
>
>        [BZ #11837]
>        * iconvdata/gb18030.c: Update tables.
>        (BODY for FROM_LOOP): Update.  Handle two-byte encoded non-BMP
>        characters specially.
>        (BODY for TO_LOOP): Add encoding of missing ranges.
>
> localedata/:
>        [BZ #11837]
>        * charmaps/GB18030: Add missing characters.

I'm not a locale expert, and therefore I can't comment on the patch,
but if it fixes BZ#11837 then this is a good thing :-)

Is there any reason you haven't commented in BZ #11837 to indicate you
plan to fix the BZ?

I'd like to see us develop some nicer BZ etiquette which includes
updating issues if you're working on them.

Who has the knowledge to be able to review your patch?

Cheers,
Carlos.

* [PATCH] Complete GB18030 charmap
@ 2012-02-04 18:03 Andreas Schwab
  2012-02-06  1:28 ` Carlos O'Donell
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Schwab @ 2012-02-04 18:03 UTC
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 318 bytes --]

2012-02-04  Andreas Schwab  <schwab@linux-m68k.org>

	[BZ #11837]
	* iconvdata/gb18030.c: Update tables.
	(BODY for FROM_LOOP): Update.  Handle two-byte encoded non-BMP
	characters specially.
	(BODY for TO_LOOP): Add encoding of missing ranges.

localedata/:
	[BZ #11837]
	* charmaps/GB18030: Add missing characters.


[-- Attachment #2: x.bz2 --]
[-- Type: application/x-bzip, Size: 456917 bytes --]

[-- Attachment #3: Type: text/plain, Size: 162 bytes --]


-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Thread overview: 13+ messages
2012-05-09 14:33 [PATCH] Complete GB18030 charmap Andreas Schwab
2012-05-09 14:46 ` Carlos O'Donell
2012-05-09 21:08   ` Andreas Schwab
2012-05-09 22:06     ` Carlos O'Donell
2012-05-09 22:36       ` Andreas Schwab
2012-05-09 23:43         ` Carlos O'Donell
2012-05-10  8:10           ` Andreas Schwab
2012-05-10 12:40             ` Carlos O'Donell
2016-02-05 20:07     ` Florian Weimer
2016-02-08 19:59       ` Andreas Schwab
  -- strict thread matches above, loose matches on Subject: below --
2012-05-13 20:00 Bruno Haible
2012-02-04 18:03 Andreas Schwab
2012-02-06  1:28 ` Carlos O'Donell
