[Bug localedata/16067] New: int_curr_symbol processed incorrectly


* [Bug localedata/16067] New: int_curr_symbol processed incorrectly
@ 2013-10-20 21:51 van.de.bugger at gmail dot com
  2014-05-21 13:33 ` [Bug localedata/16067] " myllynen at redhat dot com
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: van.de.bugger at gmail dot com @ 2013-10-20 21:51 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16067

            Bug ID: 16067
           Summary: int_curr_symbol processed incorrectly
           Product: glibc
           Version: 2.17
            Status: NEW
          Severity: minor
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: van.de.bugger at gmail dot com
                CC: libc-locales at sourceware dot org

http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html:

> int_curr_symbol
>    The international currency symbol. The operand shall be a four-character
> string, with the first three characters containing the alphabetic
> international currency symbol. The international currency symbol should be
> chosen in accordance with those specified in the ISO 4217 standard. The fourth
> character shall be the character used to separate the international currency
> symbol from the monetary quantity.

Note: the standard requires a 4-CHARACTER string.

In my custom locale I have a line:

> LC_MONETARY
> int_curr_symbol     "RUB<U00A0>"

U00A0 is a non-breaking space. It is one character.

When building such a locale, localedef issues a warning:

> LC_MONETARY: value of field `int_curr_symbol' has wrong length

When the locale is used, it produces the wrong result:

> RUB�10 999.95

Note the wrong character between currency name and quantity. Looks like
localedef uses only 4 BYTES of int_curr_symbol, while it should use 4
CHARACTERS.
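
The byte/character distinction above can be illustrated with a minimal sketch.
It assumes the locale source is UTF-8, where U+00A0 is encoded as the two bytes
C2 A0. The helper utf8_codepoints and the array rub_nbsp are hypothetical names
used only for illustration; they are not part of glibc or localedef.

```c
#include <stddef.h>
#include <string.h>

/* Count UTF-8 code points by skipping continuation bytes (10xxxxxx).
   Assumes the input is valid UTF-8.  Hypothetical helper, not glibc code.  */
static size_t
utf8_codepoints (const char *s)
{
  size_t n = 0;
  for (; *s != '\0'; ++s)
    if (((unsigned char) *s & 0xC0) != 0x80)
      ++n;
  return n;
}

/* "RUB" followed by U+00A0 (bytes C2 A0 in UTF-8).  */
static const char rub_nbsp[] = "RUB\xC2\xA0";
```

With these definitions, strlen (rub_nbsp) counts 5 bytes while
utf8_codepoints (rub_nbsp) counts 4 characters, so a check written in terms of
bytes would reject a value that has the required four characters.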

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug localedata/16067] int_curr_symbol processed incorrectly
  2013-10-20 21:51 [Bug localedata/16067] New: int_curr_symbol processed incorrectly van.de.bugger at gmail dot com
@ 2014-05-21 13:33 ` myllynen at redhat dot com
  2014-06-13 12:41 ` fweimer at redhat dot com
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: myllynen at redhat dot com @ 2014-05-21 13:33 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16067

Marko Myllynen <myllynen at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |myllynen at redhat dot com

--- Comment #1 from Marko Myllynen <myllynen at redhat dot com> ---
Using Unicode code points instead of letters should solve your issue:

localhost:~> grep int_curr_symbol te_ST
int_curr_symbol      "<U0052><U0055><U0042><U0020>"
localhost:~> export I18NPATH=./locale-test/
localhost:~> export LOCPATH=./locale-test/ 
localhost:~> mkdir -p $LOCPATH
localhost:~> localedef --no-archive -f UTF-8 -i te_ST $I18NPATH/te_ST.UTF-8    
localhost:~> LC_ALL=te_ST.UTF-8 locale -k int_curr_symbol
int_curr_symbol="RUB "


* [Bug localedata/16067] int_curr_symbol processed incorrectly
  2013-10-20 21:51 [Bug localedata/16067] New: int_curr_symbol processed incorrectly van.de.bugger at gmail dot com
  2014-05-21 13:33 ` [Bug localedata/16067] " myllynen at redhat dot com
@ 2014-06-13 12:41 ` fweimer at redhat dot com
  2014-10-28 11:32 ` van.de.bugger at gmail dot com
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: fweimer at redhat dot com @ 2014-06-13 12:41 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16067

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-


* [Bug localedata/16067] int_curr_symbol processed incorrectly
  2013-10-20 21:51 [Bug localedata/16067] New: int_curr_symbol processed incorrectly van.de.bugger at gmail dot com
  2014-05-21 13:33 ` [Bug localedata/16067] " myllynen at redhat dot com
  2014-06-13 12:41 ` fweimer at redhat dot com
@ 2014-10-28 11:32 ` van.de.bugger at gmail dot com
  2014-10-28 11:32 ` myllynen at redhat dot com
  2016-04-19 19:33 ` [Bug localedata/16067] localedef: int_curr_symbol: non-ASCII data " vapier at gentoo dot org
  4 siblings, 0 replies; 6+ messages in thread
From: van.de.bugger at gmail dot com @ 2014-10-28 11:32 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16067

--- Comment #2 from van.de.bugger at gmail dot com ---
> Using Unicode code points instead of letters should solve your issue:
> 
> localhost:~> grep int_curr_symbol te_ST
> int_curr_symbol      "<U0052><U0055><U0042><U0020>"

You are wrong. You use regular space U+0020. Use no-break space U+00A0 to
reproduce the bug.


* [Bug localedata/16067] int_curr_symbol processed incorrectly
  2013-10-20 21:51 [Bug localedata/16067] New: int_curr_symbol processed incorrectly van.de.bugger at gmail dot com
                   ` (2 preceding siblings ...)
  2014-10-28 11:32 ` van.de.bugger at gmail dot com
@ 2014-10-28 11:32 ` myllynen at redhat dot com
  2016-04-19 19:33 ` [Bug localedata/16067] localedef: int_curr_symbol: non-ASCII data " vapier at gentoo dot org
  4 siblings, 0 replies; 6+ messages in thread
From: myllynen at redhat dot com @ 2014-10-28 11:32 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16067

--- Comment #3 from Marko Myllynen <myllynen at redhat dot com> ---
(In reply to van.de.bugger from comment #2)
> > Using Unicode code points instead of letters should solve your issue:
> > 
> > localhost:~> grep int_curr_symbol te_ST
> > int_curr_symbol      "<U0052><U0055><U0042><U0020>"
> 
> You use regular space U+0020. Use no-break space U+00A0 to reproduce the bug.

Ah, correct, then I indeed get the same result as you do.


* [Bug localedata/16067] localedef: int_curr_symbol: non-ASCII data processed incorrectly
  2013-10-20 21:51 [Bug localedata/16067] New: int_curr_symbol processed incorrectly van.de.bugger at gmail dot com
                   ` (3 preceding siblings ...)
  2014-10-28 11:32 ` myllynen at redhat dot com
@ 2016-04-19 19:33 ` vapier at gentoo dot org
  4 siblings, 0 replies; 6+ messages in thread
From: vapier at gentoo dot org @ 2016-04-19 19:33 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16067

Mike Frysinger <vapier at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|int_curr_symbol processed   |localedef: int_curr_symbol:
                   |incorrectly                 |non-ASCII data processed
                   |                            |incorrectly

--- Comment #4 from Mike Frysinger <vapier at gentoo dot org> ---
the code uses strlen, which counts bytes rather than characters.  when you use
chars higher than 0x7f, each one takes two or more bytes in UTF-8, which pushes
the length from 4 to >4.

locales/programs/ld-monetary.c:
      if (strlen (monetary->int_curr_symbol) != 4)

probably should be using mbstowcs, but then it'd mean we'd be dependent upon the
current locale in order to process the inputs correctly.  we need to sort that
out anyway, though, in order to convert the inputs to UTF-8.

i don't know why it's rendering incorrectly for you though ... that looks like
it's also chopping at 4 bytes, but i'm not seeing code that does that.  it
should be loaded as an arbitrary string of arbitrary length.

