From: "Richard W.M. Jones" <rjones@redhat.com>
To: binutils@sourceware.org
Cc: rjones@redhat.com
Subject: [PATCH v2] binutils/windmc: Parse input correctly on big endian hosts
Date: Wed, 24 Jan 2024 12:25:23 +0000
Message-ID: <20240124122523.384659-2-rjones@redhat.com>
In-Reply-To: <20240124122523.384659-1-rjones@redhat.com>

On big endian hosts (e.g. s390x) the windmc tool fails to parse even
trivial files:

  $ cat test.mc
  ;
  $ ./binutils/windmc ./test.mc
  In test.mc at line 1: parser: syntax error.
  In test.mc at line 1: fatal: syntax error.

The tool starts by reading the input as Windows CP1252 and converting
it internally to UTF-16LE, which it then processes as an array of
unsigned short (typedef unichar).

This is wrong in several ways, but the one that matters on big endian
machines is that each little endian byte pair is read back
byte-swapped.

For example, the ';' character in the input above is first converted
to the UTF-16LE byte sequence { 0x3b, 0x00 }, which is then cast to
unsigned short.  On a big endian machine the first unichar appears to
be 0x3b00.  The lexer is unable to recognize this as the comment
character ((unichar)';') and so parsing fails.
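
The misread can be reproduced outside windmc with a minimal sketch.
The helper names below (load_host_order, load_le16, load_be16) are
hypothetical and do not exist in binutils; the sketch also assumes
that unsigned short is 16 bits wide.

```c
#include <assert.h>
#include <string.h>

typedef unsigned short unichar;

/* Read two bytes the way the cast to unichar * does: in whatever byte
   order the host happens to use.  */
static unichar
load_host_order (const unsigned char *bytes)
{
  unichar u;

  assert (sizeof u == 2);	/* Assumption: 16-bit unsigned short.  */
  memcpy (&u, bytes, sizeof u);
  return u;
}

/* Read two bytes as little endian, regardless of the host.  */
static unichar
load_le16 (const unsigned char *bytes)
{
  return (unichar) (bytes[0] | (bytes[1] << 8));
}

/* Read two bytes as big endian, regardless of the host.  */
static unichar
load_be16 (const unsigned char *bytes)
{
  return (unichar) ((bytes[0] << 8) | bytes[1]);
}
```

For the bytes { 0x3b, 0x00 }, load_le16 yields 0x003b (that is, ';')
and load_be16 yields 0x3b00.  load_host_order agrees with load_le16 on
a little endian host and with load_be16 on a big endian one, which is
the value the lexer ends up seeing.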

The simple fix is to convert the input to UTF-16BE on big endian
machines (and do the reverse conversion when writing the output).
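
As a rough standalone illustration of the idea (not the code in this
patch), the sketch below picks the UTF-16 variant that matches the
host and converts into unichars with iconv.  host_utf16_name and
to_host_unichars are hypothetical names, and a runtime probe stands in
for WORDS_BIGENDIAN so that the sketch builds without config.h.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

typedef unsigned short unichar;

/* Hypothetical helper: name the UTF-16 variant whose byte order
   matches the host.  */
static const char *
host_utf16_name (void)
{
  const unichar probe = 1;

  /* On a little endian host the low-order byte is stored first.  */
  return *(const unsigned char *) &probe ? "UTF-16LE" : "UTF-16BE";
}

/* Convert SRC_LEN bytes of SRC, encoded as FROM_CODE, into host-order
   unichars in DST.  Returns the number of unichars written, or -1 if
   the conversion is unavailable or fails.  */
static long
to_host_unichars (const char *from_code, const char *src, size_t src_len,
		  unichar *dst, size_t dst_count)
{
  assert (sizeof (unichar) == 2);	/* Assumption: 16-bit unichar.  */

  iconv_t cd = iconv_open (host_utf16_name (), from_code);
  if (cd == (iconv_t) -1)
    return -1;

  /* iconv takes char ** even though it does not modify the input.  */
  char *in = (char *) src;
  size_t in_left = src_len;
  char *out = (char *) dst;
  size_t out_left = dst_count * sizeof (unichar);

  size_t rc = iconv (cd, &in, &in_left, &out, &out_left);
  iconv_close (cd);
  if (rc == (size_t) -1)
    return -1;

  return (long) (dst_count - out_left / sizeof (unichar));
}
```

With the host-matching encoding, converting ";" should leave
(unichar)';' in the first element on either kind of host, provided the
local iconv supports the requested encodings.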

Fixes: https://sourceware.org/bugzilla/show_bug.cgi?id=31283
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
---
 binutils/configure.ac |  2 ++
 binutils/winduni.c    | 16 ++++++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/binutils/configure.ac b/binutils/configure.ac
index b03e36c9e0e..dac72c1bdd4 100644
--- a/binutils/configure.ac
+++ b/binutils/configure.ac
@@ -31,6 +31,8 @@ AC_PROG_CC
 AC_GNU_SOURCE
 AC_USE_SYSTEM_EXTENSIONS
 
+AC_C_BIGENDIAN
+
 LT_INIT
 ACX_LARGEFILE
 
diff --git a/binutils/winduni.c b/binutils/winduni.c
index 5b659764948..f19de4f8cb3 100644
--- a/binutils/winduni.c
+++ b/binutils/winduni.c
@@ -771,7 +771,13 @@ wind_MultiByteToWideChar (rc_uint_type cp, const char *mb,
 
   if (!mb || !iconv_name)
     return 0;
-  iconv_t cd = iconv_open ("UTF-16LE", iconv_name);
+  iconv_t cd = iconv_open (
+#if WORDS_BIGENDIAN
+			   "UTF-16BE",
+#else
+			   "UTF-16LE",
+#endif
+			   iconv_name);
 
   while (1)
     {
@@ -844,7 +850,13 @@ wind_WideCharToMultiByte (rc_uint_type cp, const unichar *u, char *mb, rc_uint_t
 
   if (!u || !iconv_name)
     return 0;
-  iconv_t cd = iconv_open (iconv_name, "UTF-16LE");
+  iconv_t cd = iconv_open (iconv_name,
+#if WORDS_BIGENDIAN
+			   "UTF-16BE"
+#else
+			   "UTF-16LE"
+#endif
+			   );
 
   while (1)
     {
-- 
2.39.3
