From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Hbo6=FA=gmail.com=ma.uecker@sourceware.org>
Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434])
	by sourceware.org (Postfix) with ESMTPS id A36E43858D33
	for <gcc@gcc.gnu.org>; Sat, 16 Sep 2023 08:36:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A36E43858D33
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-wr1-x434.google.com with SMTP id ffacd0b85a97d-317c3ac7339so2546575f8f.0
        for <gcc@gcc.gnu.org>; Sat, 16 Sep 2023 01:36:58 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20230601; t=1694853417; x=1695458217; darn=gcc.gnu.org;
        h=mime-version:user-agent:content-transfer-encoding:in-reply-to:date
         :cc:to:from:subject:message-id:from:to:cc:subject:date:message-id
         :reply-to;
        bh=UDjMuodBmDCxC2CRHe2Ok/RTO53Yvwk9DkqLTVtlt3M=;
        b=P8714rdTYq3sW8Adf4VattAIg8DUBsfCQWl1XOlKGaK5mkF8deRrZus1crynsBbg99
         YISckz/n+qcuV8YjKjWZ0UKaOnakrB7nLS7zWSR+Nl4MF/n6NWIjioSqt8hXdhnJCsco
         1/n9MHIKL8BsdckV+y+aR1jxz53QGrbTasfcmPC5vqabxpTYiWTiV7btWgeuryJ6wp8F
         Lkb8Xhc/bnSXfIAGX7fcho966IJjcmuhAVKxknTbQzqmUiy+IJor1pgHmY19fXZvmu5y
         yYJDVmhNuR/fa2bigqqV82JWd1J1oHNkE7A8b3PT7njVllYWjumS3xhWDnH2K4+ei0RA
         8Xdg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1694853417; x=1695458217;
        h=mime-version:user-agent:content-transfer-encoding:in-reply-to:date
         :cc:to:from:subject:message-id:x-gm-message-state:from:to:cc:subject
         :date:message-id:reply-to;
        bh=UDjMuodBmDCxC2CRHe2Ok/RTO53Yvwk9DkqLTVtlt3M=;
        b=XC+uNA9Ta8hrahD7ynddASzc/KmcgMowEag01f21YbY4TGKo/phBmKXClLBBKfO2H5
         XSp5s96KS7v7qlQ5+iG24qn2tox4xzvdoZN95Jzkcf71WkkHQ6MeLxN/RgHgu0tgqn4q
         /DyG4OW73i0xVoncxxj960ej4rqV/dG/2SGtvvAKfikToUpWEzmaiapFKRpgIEpDVkNJ
         AQR2pKHb6dzTO+yZMRjQ2Sjc9cBsRCEZJFTMgLOl6N63QyrpUKhAJYTeq3WUaQIWQzGh
         NzrTmYgJMLHihYwisIomyGOIc8zrLQ2RYff5epoOcjt3kHkX9ir2RYeasQtrd8WDdHUl
         CeJg==
X-Gm-Message-State: AOJu0Yw/1ypB3KH0/mu/ezPC/Mk74ErFtYRihiqcsulZ4NyzzW71XQWa
	vSwKv7AUYRDKVz+Uc3vaBgs=
X-Google-Smtp-Source: AGHT+IGzbERr9lA7zfllOFIYckix74EZzMaLX2T3iDiZExdaOR2lGyybU6uVze59hjNrAcJqp3m5oQ==
X-Received: by 2002:a5d:6b90:0:b0:31a:d4e1:ea30 with SMTP id n16-20020a5d6b90000000b0031ad4e1ea30mr3137654wrx.17.1694853416774;
        Sat, 16 Sep 2023 01:36:56 -0700 (PDT)
Received: from 2a02-8388-e203-9700-eddb-fb4f-5189-911d.cable.dynamic.v6.surfer.at (2a02-8388-e203-9700-eddb-fb4f-5189-911d.cable.dynamic.v6.surfer.at. [2a02:8388:e203:9700:eddb:fb4f:5189:911d])
        by smtp.gmail.com with ESMTPSA id k8-20020a5d4288000000b00317a29af4b2sm6545691wrq.68.2023.09.16.01.36.55
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Sat, 16 Sep 2023 01:36:56 -0700 (PDT)
Message-ID: <228be10b14dabe4e8216f9f93da99f629328b3e0.camel@gmail.com>
Subject: Re: Question on -fwrapv and -fwrapv-pointer
From: Martin Uecker <ma.uecker@gmail.com>
To: keescook@chromium.org
Cc: gcc@gcc.gnu.org
Date: Sat, 16 Sep 2023 10:36:52 +0200
In-Reply-To: <202309151107.C2F7D9A01@keescook>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
User-Agent: Evolution 3.46.4-2 
MIME-Version: 1.0
X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc.gcc.gnu.org>


(moved to gcc@)

> On Fri, Sep 15, 2023 at 08:18:28AM -0700, Andrew Pinski wrote:
> > On Fri, Sep 15, 2023 at 8:12=E2=80=AFAM Qing Zhao <qing.zhao@oracle.com=
> wrote:
> > >
> > >
> > >
> > > > On Sep 15, 2023, at 3:43 AM, Xi Ruoyao <xry111@xry111.site> wrote:
> > > >
> > > > On Thu, 2023-09-14 at 21:41 +0000, Qing Zhao wrote:
> > > >>>> CLANG already provided -fsanitize=3Dunsigned-integer-overflow. G=
CC
> > > >>>> might need to do the same.
> > > >>>
> > > >>> NO. There is no such thing as unsigned integer overflow. That opt=
ion
> > > >>> is badly designed and the GCC community has rejected a few times =
now
> > > >>> having that sanitizer before. It is bad form to have a sanitizer =
for
> > > >>> well defined code.
> > > >>
> > > >> Even though unsigned integer overflow is well defined, it might be
> > > >> unintentional, shall we warn user about this?
> > > >
> > > > *Everything* could be unintentional and should be warned then.  GCC=
 is a
> > > > compiler, not an advanced AI educating the programmers.
> > >
> > > Well, you are right in some sense. -:)
> > >
> > > However, overflow is one important source for security flaws, it=E2=
=80=99s important  for compilers to detect
> > > overflows in the programs in general.
> >=20
> > Except it is NOT an overflow. Rather it is wrapping. That is a big
> > point here. unsigned wraps and does NOT overflow. Yes there is a major
> > difference.
>=20
> Right, yes. I will try to pick my language very carefully. :)
>=20
> The practical problem I am trying to solve in the 30 million lines of
> Linux kernel code is that of catching arithmetic wrap-around. The
> problem is one of evolving the code -- I can't just drop -fwrapv and
> -fwrapv-pointer because it's not possible to fix all the cases at once.
> (And we really don't want to reintroduce undefined behavior.)
>=20
> So, for signed, pointer, and unsigned types, we need:
>=20
> a) No arithmetic UB -- everything needs to have deterministic behavior.
>    The current solution here is "-fno-strict-overflow", which eliminates
>    the UB and makes sure everything wraps.
>=20
> b) A way to run-time warn/trap on overflow/underflow/wrap-around. This
>    would work with -fsanitize=3D[signed-integer|pointer]-overflow except
>    due to "a)" we always wrap. And there isn't currently coverage like
>    this for unsigned (in GCC).
>=20
> Our problem is that the kernel is filled with a mix of places where there
> is intended wrap-around and unintended wrap-around. We can chip away at
> fixing the intended wrap-around that we can find with static analyzers,
> etc, but at the end of the day there is a long tail of finding the places
> where intended wrap-around is hiding. But when the refactoring is
> sufficiently completely, we can move the wrap-around warning to a trap,
> and the kernel will not longer have this class of security flaw.
>=20
> As a real-world example, here is a bug where a u8 wraps around causing
> an under-allocation that allowed for a heap overwrite:
>=20
> https://git.kernel.org/linus/6311071a0562
> https://elixir.bootlin.com/linux/v6.5/source/net/wireless/nl80211.c#L5422
>=20
> If there were more than 255 elements in a linked list, the allocation
> would be too small, and the second loop would write past the end of the
> allocation. This is a pretty classic allocation underflow and linear
> heap write overflow security flaw. (And it would be trivially stopped by
> trapping on the u8 wrap around.)
>=20
> So, I want to be able to catch that at run-time. But we also have code
> doing things like "if (ulong + offset < ulong) { ... }":
>=20
> https://elixir.bootlin.com/linux/v6.5/source/drivers/crypto/axis/artpec6_=
crypto.c#L1187
>=20
> This is easy for a static analyzer to find and we can replace it with a
> non-wrapping test (e.g. __builtin_add_overflow()), but we'll not find
> them all immediately, especially for the signed and pointer cases.
>=20
> So, I need to retain the "everything wraps" behavior while still being
> able to detect when it happens.


Hi Kees,

I have a couple of questions:

Currently, my thinking was that you would use signed integers
if you want the usual integer arithmetic rules we know from
elementary school and if you overflow this is clearly a bug=C2=A0
you can diagnose with UBsan.

There are people who think that signed overflow should be
defined to wrap, but I think this would be a severe
mistake because then code would start to rely on it, which
makes it then difficult to differentiate between bugs and
intended uses (e.g. the unfortunate situation you have=C2=A0
with the kernel).

I assume you want to combine UBSan plus wrapping for
production use?  Or only for testing?   Or in other words:
why would testing UBSan and production with wrapping
not be sufficient to find and fix all bugs?=C2=A0

Wrapping would not be correct because it may lead to
logic errors or use-after-free etc.  I assume it is still
preferred because it more deterministic than whatever comes
out of the optimizer assuming that overflow has UB. Is this
the reasoning applied here?


For unsigned the intended use case is modulo arithmetic
where wrapping is the correct behavior. At least, this
is what I thought so far.. This seems also to be the=C2=A0
position of the overall GCC community rejecting=C2=A0
-fsanitize=3Dunsigned-integer-overflow.

But then there are also people like Richard Seacord that
have the position that one should use=C2=A0 "unsigned" for=C2=A0
every quantity which can not be negative, which implies
that then the modulo use case becomes the exception.

Related to this I have the following question: In the=C2=A0
bug you refer to above, an unsigned variable was used
for something that is not meant for modulo arithmetic.
Is this done in the kernel to save space?  I.e. because
127 would not be enough as maximum but going to i16 takes
to much space? or is this for compatibility with some
on-the-wire protocol?=20

Would an attribute for variables help that tells the
compiler that if stores to the variable wrap around
then this is not intended and this is an error?=20

u8 x [[nowrap]];

x =3D 256; // can be diagnosed ?


Martin=20