From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ibuclaw@gdcproject.org>
Received: from mout-p-202.mailbox.org (mout-p-202.mailbox.org
 [IPv6:2001:67c:2050::465:202])
 by sourceware.org (Postfix) with ESMTPS id E3CD83857819
 for <gcc-patches@gcc.gnu.org>; Fri,  4 Sep 2020 11:22:33 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org E3CD83857819
Received: from smtp1.mailbox.org (smtp1.mailbox.org [80.241.60.240])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange ECDHE (P-384) server-signature RSA-PSS (4096 bits) server-digest
 SHA256) (No client certificate requested)
 by mout-p-202.mailbox.org (Postfix) with ESMTPS id 4BjZyw3ZffzQl8P;
 Fri,  4 Sep 2020 13:22:32 +0200 (CEST)
X-Virus-Scanned: amavisd-new at heinlein-support.de
Received: from smtp1.mailbox.org ([80.241.60.240])
 by spamfilter01.heinlein-hosting.de (spamfilter01.heinlein-hosting.de
 [80.241.56.115]) (amavisd-new, port 10030)
 with ESMTP id kAbSHl4Sptj5; Fri,  4 Sep 2020 13:22:28 +0200 (CEST)
Date: Fri, 04 Sep 2020 13:22:22 +0200
From: Iain Buclaw <ibuclaw@gdcproject.org>
Subject: Re: ubsan: d-demangle.c:214 signed integer overflow
To: Alan Modra <amodra@gmail.com>
Cc: gcc-patches@gcc.gnu.org, Ian Lance Taylor <ian@airs.com>
References: <20200903130116.GQ15695@bubble.grove.modra.org>
 <1599163400.8r2ly1k30n.astroid@galago.none>
 <20200904005939.GR15695@bubble.grove.modra.org>
In-Reply-To: <20200904005939.GR15695@bubble.grove.modra.org>
MIME-Version: 1.0
Message-Id: <1599207109.qqwjbgoeso.astroid@galago.none>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-MBO-SPAM-Probability: 
X-Rspamd-Score: -3.63 / 15.00 / 15.00
X-Rspamd-Queue-Id: 1D2B3272
X-Rspamd-UID: a7cd74
X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 04 Sep 2020 11:22:36 -0000

Excerpts from Alan Modra's message of September 4, 2020 2:59 am:
> On Thu, Sep 03, 2020 at 11:02:50PM +0200, Iain Buclaw wrote:
>> Excerpts from Alan Modra's message of September 3, 2020 3:01 pm:
>> > Running the libiberty testsuite
>> > ./test-demangle < libiberty/testsuite/d-demangle-expected
>> > libiberty/d-demangle.c:214:14: runtime error: signed integer overflow:=
 922337203 * 10 cannot be represented in type 'long int'
>> >=20
>> > On looking at silencing ubsan, I found a real bug in dlang_number.
>> > For a 32-bit long, some overflows won't be detected.  For example,
>> > 21474836480.  Why?  Well 214748364 * 10 is 0x7FFFFFF8 (no overflow so
>> > far).  Adding 8 gives 0x80000000 (which does overflow but there is no
>> > test for that overflow in the code).  Then multiplying 0x80000000 * 10
>> > =3D 0x500000000 =3D 0 won't be caught by the multiplication overflow t=
est.
>> > The same holds for a 64-bit long using similarly crafted digit
>> > sequences.
>> >=20
>> > This patch replaces the mod 10 test with a simpler limit test, and
>> > similarly the mod 26 test in dlang_decode_backref.
>> >=20
>> > About the limit test:
>> >   val * 10 + digit > ULONG_MAX is the condition for overflow
>> > ie.
>> >   val * 10 > ULONG_MAX - digit
>> > or
>> >   val > (ULONG_MAX - digit) / 10
>> > or assuming the largest digit
>> >   val > (ULONG_MAX - 9) / 10
>> >=20
>> > I resisted the aesthetic appeal of simplifying this further to
>> >   val > -10UL / 10
>> > since -1UL for ULONG_MAX is only correct for 2's complement numbers.
>> >=20
>> > Passes all the libiberty tests, on both 32-bit and 64-bit hosts.  OK
>> > to apply?
>> >=20
>>=20
>> Thanks Alan, change seems reasonable, however on giving it a mull over,
>> I see that the largest number that dlang_number would need to be able to
>> handle is UINT_MAX.  These two tests which decode a wchar value are
>> representative of that (first is valid, second invalid).
>>=20
>> #
>> --format=3Ddlang
>> _D4test21__T3funVwi4294967295Z3funFNaNbNiNfZv
>> test.fun!('\Uffffffff').fun()
>> #
>> --format=3Ddlang
>> _D4test21__T3funVwi4294967296Z3funFNaNbNiNfZv
>> _D4test21__T3funVwi4294967296Z3funFNaNbNiNfZv
>=20
> Hmm, OK, on a 32-bit host those both won't be handled due to the
> "(long) val < 0" test.
>=20
> Do you really want the last one to fail on a 64-bit host, ie. not
> produce "test.fun!('\U100000000').fun()"?  I'm guessing you do, but
> I'd like that confirmed before posting a followup patch.
>=20

Actually I meant dchar (char32_t in C++11), and yes, it should fail on a
64-bit host, as there is no such thing as a 64-bit character type.
Though arguably, that could be handled by the caller in
dlang_parse_integer.

Looking at the places where dlang_number is used, I see:

- Get length of next symbol (unlikely to be bigger than UCHAR_MAX).
- Get length of embedded template symbol (can get very long, but rarely
  to the point where USHRT_MAX is exceeded).
- Get number of elements in an encoded tuple, string, or array literal
  (anything bigger than INT_MAX not allowed at compile-time).
- Get an encoded boolean value (0 or 1).
- Get an encoded character value (max valid value is 0x0010FFFF, but as
  dchar's an unsigned 32-bit type, might as well allow up to UINT_MAX).

So I think UINT_MAX is a reasonable artificial limit.

>> I'm fine with creating a new PR and dealing with the above in a separate
>> change though, as it will require a few more replacements to adjust the
>> result parameter type to 'unsigned' or 'long long'. =20
>=20
> Ha!  When I wrote the original patch I changed most of the "long"
> parameters and variables to "unsigned long", because it was clear from
> the code (and my understanding of name mangling) that the values were
> in fact always unsigned.  I also changed "int" to "size_t" where
> appropriate, not that you need to handle strings larger than 2G in
> length but simply that size_t is the correct type to use with string
> functions, malloc, memcpy and so on.  I decided to remove all those
> changes before posting as they were just tidies, I thought.
>=20

Yeah, signedness is handled outside of dlang_number, so only need to
treat all numbers as unsigned for the purpose of decoding.

Apologies that you seem to have opened a can of worms here. :-)

Iain.