public inbox for binutils@sourceware.org
* [PATCH] binutils/windmc: Parse input correctly on big endian hosts
@ 2024-01-24 11:20 Richard W.M. Jones
  2024-01-24 11:55 ` Richard W.M. Jones
  0 siblings, 1 reply; 2+ messages in thread
From: Richard W.M. Jones @ 2024-01-24 11:20 UTC
  To: binutils; +Cc: rjones

On big endian hosts (e.g. s390x) the windmc tool fails to parse even
trivial files:

  $ cat test.mc
  ;
  $ ./binutils/windmc ./test.mc
  In test.mc at line 1: parser: syntax error.
  In test.mc at line 1: fatal: syntax error.

The tool starts by reading the input as Windows CP1252 and then
converting it internally into an array of UTF-16LE, which it then
processes as an array of unsigned short (typedef unichar).

This is fragile in several ways, but the specific problem on big
endian machines is that each little endian byte pair is read
byte-swapped.

For example, the ';' character in the input above is first converted
to the UTF-16LE byte sequence { 0x3b, 0x00 }, which is then cast to
unsigned short.  On a big endian machine the first unichar appears to
be 0x3b00.  The lexer is unable to recognize this as the comment
character ((unichar)';' == 0x003b) and so parsing fails.

The simple fix is to convert the input to UTF-16BE on big endian
machines (and do the reverse conversion when writing the output).

Fixes: https://sourceware.org/bugzilla/show_bug.cgi?id=31283
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
---
 binutils/winduni.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/binutils/winduni.c b/binutils/winduni.c
index 5b659764948..9406ca577a9 100644
--- a/binutils/winduni.c
+++ b/binutils/winduni.c
@@ -45,6 +45,19 @@
 #include <iconv.h>
 #endif
 
+/* Best effort attempt to find out if we're running on a big endian
+ * system (only old ppc64 and s390x these days).  Assume little endian
+ * if the macros are not defined.  On big endian, unichar will be a
+ * big endian short.
+ */
+#if defined(__BYTE_ORDER__) && \
+    defined(__ORDER_BIG_ENDIAN__) && \
+    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define IS_LITTLE_ENDIAN 0
+#else
+#define IS_LITTLE_ENDIAN 1
+#endif
+
 static rc_uint_type wind_WideCharToMultiByte (rc_uint_type, const unichar *, char *, rc_uint_type);
 static rc_uint_type wind_MultiByteToWideChar (rc_uint_type, const char *, unichar *, rc_uint_type);
 static int unichar_isascii (const unichar *, rc_uint_type);
@@ -771,7 +784,8 @@ wind_MultiByteToWideChar (rc_uint_type cp, const char *mb,
 
   if (!mb || !iconv_name)
     return 0;
-  iconv_t cd = iconv_open ("UTF-16LE", iconv_name);
+  iconv_t cd = iconv_open (IS_LITTLE_ENDIAN ? "UTF-16LE" : "UTF-16BE",
+			   iconv_name);
 
   while (1)
     {
@@ -844,7 +858,8 @@ wind_WideCharToMultiByte (rc_uint_type cp, const unichar *u, char *mb, rc_uint_t
 
   if (!u || !iconv_name)
     return 0;
-  iconv_t cd = iconv_open (iconv_name, "UTF-16LE");
+  iconv_t cd = iconv_open (iconv_name,
+			   IS_LITTLE_ENDIAN ? "UTF-16LE" : "UTF-16BE");
 
   while (1)
     {
-- 
2.43.0



* Re: [PATCH] binutils/windmc: Parse input correctly on big endian hosts
  2024-01-24 11:20 [PATCH] binutils/windmc: Parse input correctly on big endian hosts Richard W.M. Jones
@ 2024-01-24 11:55 ` Richard W.M. Jones
  0 siblings, 0 replies; 2+ messages in thread
From: Richard W.M. Jones @ 2024-01-24 11:55 UTC
  To: binutils

On Wed, Jan 24, 2024 at 11:20:05AM +0000, Richard W.M. Jones wrote:
> +/* Best effort attempt to find out if we're running on a big endian
> + * system (only old ppc64 and s390x these days).  Assume little endian
> + * if the macros are not defined.  On big endian, unichar will be a
> + * big endian short.
> + */
> +#if defined(__BYTE_ORDER__) && \
> +    defined(__ORDER_BIG_ENDIAN__) && \
> +    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> +#define IS_LITTLE_ENDIAN 0
> +#else
> +#define IS_LITTLE_ENDIAN 1
> +#endif

I just found a thread about how to find host endianness:

https://sourceware.org/pipermail/binutils/2024-January/131506.html

so I'll rework this patch, expect v2 soon.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

