public inbox for glibc-bugs@sourceware.org
* [Bug stdio/30883] New: with a field width, print/sprintf may output an additional space character in multibyte locales like ps_AF
@ 2023-09-25  0:10 vincent-srcware at vinc17 dot net
  2023-09-25 12:39 ` [Bug stdio/30883] " fweimer at redhat dot com
  2023-09-25 13:09 ` vincent-srcware at vinc17 dot net
  0 siblings, 2 replies; 3+ messages in thread
From: vincent-srcware at vinc17 dot net @ 2023-09-25  0:10 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30883

            Bug ID: 30883
           Summary: with a field width, print/sprintf may output an
                    additional space character in multibyte locales like
                    ps_AF
           Product: glibc
           Version: 2.37
            Status: UNCONFIRMED
          Severity: critical
          Priority: P2
         Component: stdio
          Assignee: unassigned at sourceware dot org
          Reporter: vincent-srcware at vinc17 dot net
  Target Milestone: ---

The ISO C standard says about the field width: "If the converted value has
fewer characters than the field width, it is padded with spaces (by default) on
the left (or right, if the left adjustment flag, described later, has been
given) to the field width."

Here "characters" means bytes, even in multibyte locales such as UTF-8 ones
(glibc normally follows this, as can be seen with %s). But with %g, a multibyte
decimal-point character causes (at least) one additional space to be output in
the padding. Since the field width can be used to fix or limit the size of the
output, this can trigger a buffer overflow.
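
Independently of how the field width is interpreted, a caller can avoid the
overflow by bounding the write explicitly. The following is only a minimal
sketch of that idea; the helper name format_g8 is hypothetical and not part of
the report or of glibc. It relies on the standard snprintf return value (the
number of bytes that would have been written, excluding the terminating null).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the report): format x with "%8g" into buf,
   never writing more than size bytes.  Returns 1 if the complete result
   fit, 0 if it was truncated or an output error occurred.  */
static int format_g8 (char *buf, size_t size, double x)
{
  int n = snprintf (buf, size, "%8g", x);
  return n >= 0 && (size_t) n < size;
}
```

A caller that sized its buffer for exactly 8 bytes plus the terminating null
would then see a 0 return instead of a write past the end of the buffer when
the output turns out to be 9 bytes long.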

Example:

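#include <stdio.h>
#include <string.h>
#include <locale.h>

/* Print 0.1 with "%8g" and report how many bytes sprintf produced. */
static void f (void)
{
  char s[256];
  double x = .1;
  printf ("[%8g]\n", x);
  sprintf (s, "%8g", x);
  printf ("%zu\n", strlen (s));
}

int main (void)
{
  f ();  /* "C" locale: the decimal point is '.' (1 byte) */
  if (setlocale (LC_ALL, "ps_AF") == NULL)
    {
      fprintf (stderr, "ps_AF locale not available\n");
      return 1;
    }
  f ();  /* ps_AF: the decimal point is U+066B (2 bytes in UTF-8) */
  return 0;
}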

With the ps_AF locale available, where the decimal-point character is U+066B
ARABIC DECIMAL SEPARATOR, encoded as D9 AB (UTF-8), this gives on my Debian
machine:

[     0.1]
8
[     0٫1]
9

With x being 0.1, the %g output should normally fit in 8 bytes, but in the
ps_AF locale, 9 bytes are output for %g!
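
To make the byte/character distinction in this output concrete, here is a
small sketch that counts code points in a UTF-8 string. The helper name
utf8_codepoints is hypothetical, and the code assumes its input is valid
UTF-8. Applied to the ps_AF output above (five spaces, "0", D9 AB, "1"), it
counts 8 code points although the string is 9 bytes long.

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a UTF-8 string by skipping
   continuation bytes (those of the form 10xxxxxx).  Assumes valid UTF-8.  */
static size_t utf8_codepoints (const char *s)
{
  size_t n = 0;
  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;
  return n;
}
```

For instance, utf8_codepoints ("     0\xD9\xAB" "1") counts 8 code points,
while strlen on the same string counts 9 bytes. The literal is split in two so
that the "1" is not absorbed into the \xAB escape.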

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug stdio/30883] with a field width, print/sprintf may output an additional space character in multibyte locales like ps_AF
From: fweimer at redhat dot com @ 2023-09-25 12:39 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30883

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://sourceware.org/bugzilla/show_bug.cgi?id=28943
                 CC|                            |fweimer at redhat dot com

--- Comment #1 from Florian Weimer <fweimer at redhat dot com> ---
See bug 28943.

Your output shows why the current behavior makes sense: It achieves column
alignment. Padding based on byte width makes little sense for multi-byte
locales.


* [Bug stdio/30883] with a field width, print/sprintf may output an additional space character in multibyte locales like ps_AF
From: vincent-srcware at vinc17 dot net @ 2023-09-25 13:09 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30883

--- Comment #2 from Vincent Lefèvre <vincent-srcware at vinc17 dot net> ---
(In reply to Florian Weimer from comment #1)
> Your output shows why the current behavior makes sense: It achieves column
> alignment. Padding based on byte width makes little sense for multi-byte
> locales.

That is one point of view, but it is not what ISO C specifies.

And note that glibc already behaves differently for %s, as I've said:

  printf ("[%8s]\n", "éèê");
  printf ("[%8s]\n", "eee");

gives

[  éèê]
[     eee]

(both 8 bytes). There is no alignment.
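
The byte counts for %s can be checked without printing anything. This is only
a sketch: the helper name width8_bytes is hypothetical, and the accented
string is written with explicit UTF-8 escapes so that the result does not
depend on the source file encoding.

```c
#include <stdio.h>

/* Hypothetical helper: number of bytes that "%8s" would produce for s,
   obtained by calling snprintf with a null buffer and zero size.  */
static int width8_bytes (const char *s)
{
  return snprintf (NULL, 0, "%8s", s);
}
```

Both width8_bytes ("\xC3\xA9\xC3\xA8\xC3\xAA") (the string "éèê", 6 bytes) and
width8_bytes ("eee") give 8, which matches the byte-based padding shown above.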

