From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ibuclaw@gdcproject.org>
Received: from mout-p-201.mailbox.org (mout-p-201.mailbox.org [80.241.56.171])
 by sourceware.org (Postfix) with ESMTPS id E5AA8386181A
 for <gcc-patches@gcc.gnu.org>; Thu,  3 Sep 2020 21:02:59 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org E5AA8386181A
Received: from smtp2.mailbox.org (smtp2.mailbox.org [80.241.60.241])
 (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits))
 (No client certificate requested)
 by mout-p-201.mailbox.org (Postfix) with ESMTPS id 4BjCv6180LzQlKl;
 Thu,  3 Sep 2020 23:02:58 +0200 (CEST)
X-Virus-Scanned: amavisd-new at heinlein-support.de
Received: from smtp2.mailbox.org ([80.241.60.241])
 by spamfilter06.heinlein-hosting.de (spamfilter06.heinlein-hosting.de
 [80.241.56.125]) (amavisd-new, port 10030)
 with ESMTP id 84Wf_5k1LAme; Thu,  3 Sep 2020 23:02:54 +0200 (CEST)
Date: Thu, 03 Sep 2020 23:02:50 +0200
From: Iain Buclaw <ibuclaw@gdcproject.org>
Subject: Re: ubsan: d-demangle.c:214 signed integer overflow
To: Alan Modra <amodra@gmail.com>, gcc-patches@gcc.gnu.org
Cc: Ian Lance Taylor <ian@airs.com>
References: <20200903130116.GQ15695@bubble.grove.modra.org>
In-Reply-To: <20200903130116.GQ15695@bubble.grove.modra.org>
MIME-Version: 1.0
Message-Id: <1599163400.8r2ly1k30n.astroid@galago.none>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-MBO-SPAM-Probability: 
X-Rspamd-Score: -4.69 / 15.00 / 15.00
X-Rspamd-Queue-Id: A58A917AA
X-Rspamd-UID: 6951b2
X-Spam-Status: No, score=-15.7 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT,
 RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 03 Sep 2020 21:03:02 -0000

Excerpts from Alan Modra's message of September 3, 2020 3:01 pm:
> Running the libiberty testsuite
> ./test-demangle < libiberty/testsuite/d-demangle-expected
> libiberty/d-demangle.c:214:14: runtime error: signed integer overflow: 92=
2337203 * 10 cannot be represented in type 'long int'
>=20
> On looking at silencing ubsan, I found a real bug in dlang_number.
> For a 32-bit long, some overflows won't be detected.  For example,
> 21474836480.  Why?  Well 214748364 * 10 is 0x7FFFFFF8 (no overflow so
> far).  Adding 8 gives 0x80000000 (which does overflow but there is no
> test for that overflow in the code).  Then multiplying 0x80000000 * 10
> =3D 0x500000000 =3D 0 won't be caught by the multiplication overflow test=
.
> The same holds for a 64-bit long using similarly crafted digit
> sequences.
>=20
> This patch replaces the mod 10 test with a simpler limit test, and
> similarly the mod 26 test in dlang_decode_backref.
>=20
> About the limit test:
>   val * 10 + digit > ULONG_MAX is the condition for overflow
> ie.
>   val * 10 > ULONG_MAX - digit
> or
>   val > (ULONG_MAX - digit) / 10
> or assuming the largest digit
>   val > (ULONG_MAX - 9) / 10
>=20
> I resisted the aesthetic appeal of simplifying this further to
>   val > -10UL / 10
> since -1UL for ULONG_MAX is only correct for 2's complement numbers.
>=20
> Passes all the libiberty tests, on both 32-bit and 64-bit hosts.  OK
> to apply?
>=20

Thanks Alan, change seems reasonable, however on giving it a mull over,
I see that the largest number that dlang_number would need to be able to
handle is UINT_MAX.  These two tests which decode a wchar value are
representative of that (first is valid, second invalid).

#
--format=3Ddlang
_D4test21__T3funVwi4294967295Z3funFNaNbNiNfZv
test.fun!('\Uffffffff').fun()
#
--format=3Ddlang
_D4test21__T3funVwi4294967296Z3funFNaNbNiNfZv
_D4test21__T3funVwi4294967296Z3funFNaNbNiNfZv

I'm fine with creating a new PR and dealing with the above in a separate
change though, as it will require a few more replacements to adjust the
result parameter type to 'unsigned' or 'long long'. =20

Iain.


> 	* d-demangle.c: Include limits.h.
> 	(ULONG_MAX): Provide fall-back definition.
> 	(dlang_number): Simplify and correct overflow test.  Only
> 	write *ret on returning non-NULL.
> 	(dlang_decode_backref): Likewise.
>=20
> diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
> index f2d6946eca..59e6ae007a 100644
> --- a/libiberty/d-demangle.c
> +++ b/libiberty/d-demangle.c
> @@ -31,6 +31,9 @@ If not, see <http://www.gnu.org/licenses/>.  */
>  #ifdef HAVE_CONFIG_H
>  #include "config.h"
>  #endif
> +#ifdef HAVE_LIMITS_H
> +#include <limits.h>
> +#endif
> =20
>  #include "safe-ctype.h"
> =20
> @@ -45,6 +48,10 @@ If not, see <http://www.gnu.org/licenses/>.  */
>  #include <demangle.h>
>  #include "libiberty.h"
> =20
> +#ifndef ULONG_MAX
> +#define	ULONG_MAX	(~0UL)
> +#endif
> +
>  /* A mini string-handling package */
> =20
>  typedef struct string		/* Beware: these aren't required to be */
> @@ -207,24 +214,24 @@ dlang_number (const char *mangled, long *ret)
>    if (mangled =3D=3D NULL || !ISDIGIT (*mangled))
>      return NULL;
> =20
> -  (*ret) =3D 0;
> +  unsigned long val =3D 0;
> =20
>    while (ISDIGIT (*mangled))
>      {
> -      (*ret) *=3D 10;
> -
> -      /* If an overflow occured when multiplying by ten, the result
> -	 will not be a multiple of ten.  */
> -      if ((*ret % 10) !=3D 0)
> +      /* Check for overflow.  Yes, we return NULL here for some digits
> +	 that don't overflow "val * 10 + digit", but that doesn't
> +	 matter given the later "(long) val < 0" test.  */
> +      if (val > (ULONG_MAX - 9) / 10)
>  	return NULL;
> =20
> -      (*ret) +=3D mangled[0] - '0';
> +      val =3D val * 10 + mangled[0] - '0';
>        mangled++;
>      }
> =20
> -  if (*mangled =3D=3D '\0' || *ret < 0)
> +  if (*mangled =3D=3D '\0' || (long) val < 0)
>      return NULL;
> =20
> +  *ret =3D val;
>    return mangled;
>  }
> =20
> @@ -294,24 +301,24 @@ dlang_decode_backref (const char *mangled, long *re=
t)
>  	    [A-Z] NumberBackRef
>  	    ^
>     */
> -  (*ret) =3D 0;
> +  unsigned long val =3D 0;
> =20
>    while (ISALPHA (*mangled))
>      {
> -      (*ret) *=3D 26;
> +      /* Check for overflow.  */
> +      if (val > (ULONG_MAX - 25) / 26)
> +	break;
> =20
> -      /* If an overflow occured when multiplying by 26, the result
> -	 will not be a multiple of 26.  */
> -      if ((*ret % 26) !=3D 0)
> -	return NULL;
> +      val *=3D 26;
> =20
>        if (mangled[0] >=3D 'a' && mangled[0] <=3D 'z')
>  	{
> -	  (*ret) +=3D mangled[0] - 'a';
> +	  val +=3D mangled[0] - 'a';
> +	  *ret =3D val;
>  	  return mangled + 1;
>  	}
> =20
> -      (*ret) +=3D mangled[0] - 'A';
> +      val +=3D mangled[0] - 'A';
>        mangled++;
>      }
> =20
> --=20
> Alan Modra
> Australia Development Lab, IBM
>=20