From: "Richard W.M. Jones"
To: binutils@sourceware.org
Cc: rjones@redhat.com
Subject: [PATCH] binutils/windmc: Parse input correctly on big endian hosts
Date: Wed, 24 Jan 2024 11:20:05 +0000
Message-ID: <20240124112014.2675193-1-rjones@redhat.com>

On big endian hosts (e.g. s390x) the windmc tool fails to parse even
trivial files:

  $ cat test.mc
  ;
  $ ./binutils/windmc ./test.mc
  In test.mc at line 1: parser: syntax error.
  In test.mc at line 1: fatal: syntax error.

The tool starts by reading the input as Windows CP1252 and then
converting it internally into an array of UTF-16LE, which it then
processes as an array of unsigned short (typedef unichar).

There are lots of ways this is wrong, but in the specific case of big
endian machines the little endian pairs of bytes are byte-swapped.

For example, the ';' character in the input above is first converted
to the UTF-16LE byte sequence { 0x3b, 0x00 }, which is then cast to
unsigned short.  On a big endian machine the first unichar appears to
be 0x3b00.  The lexer is unable to recognize this as the comment
character ((unichar)';') and so parsing fails.

The simple fix is to convert the input to UTF-16BE on big endian
machines (and do the reverse conversion when writing the output).

Fixes: https://sourceware.org/bugzilla/show_bug.cgi?id=31283
Signed-off-by: Richard W.M. Jones
---
 binutils/winduni.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/binutils/winduni.c b/binutils/winduni.c
index 5b659764948..9406ca577a9 100644
--- a/binutils/winduni.c
+++ b/binutils/winduni.c
@@ -45,6 +45,19 @@
 #include
 #endif
 
+/* Best effort attempt to find out if we're running on a big endian
+ * system (only old ppc64 and s390x these days).  Assume little endian
+ * if the macros are not defined.  On big endian, unichar will be a
+ * big endian short.
+ */
+#if defined(__BYTE_ORDER__) && \
+    defined(__ORDER_BIG_ENDIAN__) && \
+    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define IS_LITTLE_ENDIAN 0
+#else
+#define IS_LITTLE_ENDIAN 1
+#endif
+
 static rc_uint_type wind_WideCharToMultiByte (rc_uint_type, const unichar *, char *, rc_uint_type);
 static rc_uint_type wind_MultiByteToWideChar (rc_uint_type, const char *, unichar *, rc_uint_type);
 static int unichar_isascii (const unichar *, rc_uint_type);
@@ -771,7 +784,8 @@ wind_MultiByteToWideChar (rc_uint_type cp, const char *mb,
 
   if (!mb || !iconv_name)
     return 0;
-  iconv_t cd = iconv_open ("UTF-16LE", iconv_name);
+  iconv_t cd = iconv_open (IS_LITTLE_ENDIAN ? "UTF-16LE" : "UTF-16BE",
+                           iconv_name);
 
   while (1)
     {
@@ -844,7 +858,8 @@ wind_WideCharToMultiByte (rc_uint_type cp, const unichar *u, char *mb, rc_uint_t
 
   if (!u || !iconv_name)
     return 0;
-  iconv_t cd = iconv_open (iconv_name, "UTF-16LE");
+  iconv_t cd = iconv_open (iconv_name,
+                           IS_LITTLE_ENDIAN ? "UTF-16LE" : "UTF-16BE");
 
   while (1)
     {
-- 
2.43.0
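
For readers who want to see the byte-swap described in the commit message without building windmc, the following is a minimal standalone sketch. It is not binutils code: the helper names (host_is_little_endian, read_native_u16, encode_u16) are hypothetical, uint16_t stands in for unichar, and the bytes are built by hand rather than by iconv. It only illustrates why a fixed UTF-16LE buffer, read through a native 16-bit type, gives the wrong value on a big endian host, and why choosing the byte order to match the host avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of binutils): detect host byte order at
   run time by looking at the first byte of a known 16-bit value.  */
static int
host_is_little_endian (void)
{
  const uint16_t probe = 1;
  unsigned char first;

  memcpy (&first, &probe, 1);
  return first == 1;
}

/* Hypothetical helper: reinterpret two bytes as a native 16-bit unit,
   which is what happens when a buffer of UTF-16 bytes is accessed
   through an unsigned short (unichar) pointer.  */
static uint16_t
read_native_u16 (const unsigned char bytes[2])
{
  uint16_t unit;

  memcpy (&unit, bytes, sizeof unit);
  return unit;
}

/* Hypothetical helper: serialise one UTF-16 code unit in an explicit
   byte order (little endian if LITTLE_ENDIAN_ORDER is nonzero,
   big endian otherwise).  */
static void
encode_u16 (uint16_t unit, int little_endian_order, unsigned char out[2])
{
  assert (out != NULL);
  if (little_endian_order)
    {
      out[0] = (unsigned char) (unit & 0xff);
      out[1] = (unsigned char) (unit >> 8);
    }
  else
    {
      out[0] = (unsigned char) (unit >> 8);
      out[1] = (unsigned char) (unit & 0xff);
    }
}
```

Encoding ';' with little_endian_order set to 1 yields { 0x3b, 0x00 }, the byte sequence quoted in the commit message. Passing host_is_little_endian () instead plays the role that IS_LITTLE_ENDIAN plays in the patch, with the difference that the patch decides at compile time from the compiler's predefined macros.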