From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by sourceware.org (Postfix) with ESMTPS id 3E1283858C60 for ; Mon, 18 Sep 2023 08:17:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3E1283858C60 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-wr1-x433.google.com with SMTP id ffacd0b85a97d-31c7912416bso3984953f8f.1 for ; Mon, 18 Sep 2023 01:17:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1695025053; x=1695629853; darn=gcc.gnu.org; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:from:to:cc:subject :date:message-id:reply-to; bh=lEgNV5h5mGcnFDGyDoDvUH8onx2Pr3QqBFrRWyn2DSU=; b=G8EwRxDOl/jwtE2sI1YHGq584iRXhH8pqlRn+ilDx63BfV9SBDG8s6YqwYjG0FXkBw 2qalkoRhyIJPxCZT4z/vzLAkFDG6T/e+fTIJLvfhPu26HofN7O1wOknBQ7WWynZ3/c/V aYx2QkP+ySLS2K16lmfDwVmdunvBgV+XHF8FmqwaUJqsn9W4+cQziJOoLBNnf+zc4aiw Yqolw92+bJe12sJydyd4kT/IuiXEEeNJI3YzKDNTS0TPaa0fjF9eFky9l96PaRp3BJhX Mn9UYXgyRRKzoDnO6hvJ2EaV5TwTaHnl8vf/9Swvn2SUZzv+8Njo7vokRC5SNc/3MPjJ p2ew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695025053; x=1695629853; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=lEgNV5h5mGcnFDGyDoDvUH8onx2Pr3QqBFrRWyn2DSU=; b=t+ndh3xA381gdps2Qc7+mtRbebJlCnPwek6119HAFAwo99iypQvpjwGYNeLwn1YBra 9Fualx5VxvSYxeTUicsB5G5ed6LrKeWIFO+qxd4sWrJRztNcm0Rx9lYGY9vmc70mCvNT p4MCExjXJxRPzx3q11REcwEqrncvxVbrL1I0bZBitl5cq3AoT1/vZPUTXIYcN1ZSqe64 IHH+wUW6jJAMLv3F8ycPLqmwDWETmC1yf6v4pRzPAcPvDr2RaldUFBKsiejGwCjyA28g OIuxdST6yW/TLksXrjoTAy549HBA863BH3F8hq25mKzS1fb4SxbUl1IFOPdtBB0CH8+j Ecmw== X-Gm-Message-State: AOJu0YycE71dXKxnSB+pijGH7qO4VaIaePXg6iTUtawWblJc7utQttH2 fnQd2pW6uulAtq28T7kebhk= X-Google-Smtp-Source: AGHT+IFjiMaZLwoGL7o1ZdMKnpw8AuJ/Ia8iQZB97P8peV/WyL9g2qgF7aiebh9TabV7yQrjNtN17g== X-Received: by 2002:a5d:69c7:0:b0:317:5ddd:837b with SMTP id s7-20020a5d69c7000000b003175ddd837bmr7730002wrw.7.1695025052369; Mon, 18 Sep 2023 01:17:32 -0700 (PDT) Received: from vra-169-207.tugraz.at (vra-169-207.tugraz.at. [129.27.169.207]) by smtp.gmail.com with ESMTPSA id d6-20020adfef86000000b0031f82743e25sm11841405wro.67.2023.09.18.01.17.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Sep 2023 01:17:31 -0700 (PDT) Message-ID: <9959df868a3aadee6034e564cf575083e76d5833.camel@gmail.com> Subject: Re: Question on -fwrapv and -fwrapv-pointer From: Martin Uecker To: Richard Biener Cc: keescook@chromium.org, gcc@gcc.gnu.org Date: Mon, 18 Sep 2023 10:17:30 +0200 In-Reply-To: References: <202309151107.C2F7D9A01@keescook> <228be10b14dabe4e8216f9f93da99f629328b3e0.camel@gmail.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.46.4-2 MIME-Version: 1.0 X-Spam-Status: No, score=-1.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Am Montag, dem 18.09.2023 um 09:31 +0200 schrieb Richard Biener via Gcc: > On Sat, Sep 16, 2023 at 10:38=E2=80=AFAM Martin Uecker via Gcc wrote: > >=20 > >=20 > >=20 > > (moved to gcc@) > >=20 > > > On Fri, Sep 15, 2023 at 08:18:28AM -0700, Andrew Pinski wrote: > > > > On Fri, Sep 15, 2023 at 8:12=E2=80=AFAM Qing Zhao wrote: > > > > >=20 > > > > >=20 > > > > >=20 > > > > > > On Sep 15, 2023, at 3:43 AM, Xi Ruoyao wro= te: > > > > > >=20 > > > > > > On Thu, 2023-09-14 at 21:41 +0000, Qing Zhao wrote: > > > > > > > > > CLANG already provided -fsanitize=3Dunsigned-integer-over= flow. GCC > > > > > > > > > might need to do the same. > > > > > > > >=20 > > > > > > > > NO. There is no such thing as unsigned integer overflow. Th= at option > > > > > > > > is badly designed and the GCC community has rejected a few = times now > > > > > > > > having that sanitizer before. It is bad form to have a sani= tizer for > > > > > > > > well defined code. > > > > > > >=20 > > > > > > > Even though unsigned integer overflow is well defined, it mig= ht be > > > > > > > unintentional, shall we warn user about this? > > > > > >=20 > > > > > > *Everything* could be unintentional and should be warned then. = GCC is a > > > > > > compiler, not an advanced AI educating the programmers. > > > > >=20 > > > > > Well, you are right in some sense. -:) > > > > >=20 > > > > > However, overflow is one important source for security flaws, it= =E2=80=99s important for compilers to detect > > > > > overflows in the programs in general. > > > >=20 > > > > Except it is NOT an overflow. Rather it is wrapping. That is a big > > > > point here. unsigned wraps and does NOT overflow. Yes there is a ma= jor > > > > difference. > > >=20 > > > Right, yes. I will try to pick my language very carefully. :) > > >=20 > > > The practical problem I am trying to solve in the 30 million lines of > > > Linux kernel code is that of catching arithmetic wrap-around. The > > > problem is one of evolving the code -- I can't just drop -fwrapv and > > > -fwrapv-pointer because it's not possible to fix all the cases at onc= e. > > > (And we really don't want to reintroduce undefined behavior.) > > >=20 > > > So, for signed, pointer, and unsigned types, we need: > > >=20 > > > a) No arithmetic UB -- everything needs to have deterministic behavio= r. > > > The current solution here is "-fno-strict-overflow", which elimina= tes > > > the UB and makes sure everything wraps. > > >=20 > > > b) A way to run-time warn/trap on overflow/underflow/wrap-around. Thi= s > > > would work with -fsanitize=3D[signed-integer|pointer]-overflow exc= ept > > > due to "a)" we always wrap. And there isn't currently coverage lik= e > > > this for unsigned (in GCC). > > >=20 > > > Our problem is that the kernel is filled with a mix of places where t= here > > > is intended wrap-around and unintended wrap-around. We can chip away = at > > > fixing the intended wrap-around that we can find with static analyzer= s, > > > etc, but at the end of the day there is a long tail of finding the pl= aces > > > where intended wrap-around is hiding. But when the refactoring is > > > sufficiently completely, we can move the wrap-around warning to a tra= p, > > > and the kernel will not longer have this class of security flaw. > > >=20 > > > As a real-world example, here is a bug where a u8 wraps around causin= g > > > an under-allocation that allowed for a heap overwrite: > > >=20 > > > https://git.kernel.org/linus/6311071a0562 > > > https://elixir.bootlin.com/linux/v6.5/source/net/wireless/nl80211.c#L= 5422 > > >=20 > > > If there were more than 255 elements in a linked list, the allocation > > > would be too small, and the second loop would write past the end of t= he > > > allocation. This is a pretty classic allocation underflow and linear > > > heap write overflow security flaw. (And it would be trivially stopped= by > > > trapping on the u8 wrap around.) > > >=20 > > > So, I want to be able to catch that at run-time. But we also have cod= e > > > doing things like "if (ulong + offset < ulong) { ... }": > > >=20 > > > https://elixir.bootlin.com/linux/v6.5/source/drivers/crypto/axis/artp= ec6_crypto.c#L1187 > > >=20 > > > This is easy for a static analyzer to find and we can replace it with= a > > > non-wrapping test (e.g. __builtin_add_overflow()), but we'll not find > > > them all immediately, especially for the signed and pointer cases. > > >=20 > > > So, I need to retain the "everything wraps" behavior while still bein= g > > > able to detect when it happens. > >=20 > >=20 > > Hi Kees, > >=20 > > I have a couple of questions: > >=20 > > Currently, my thinking was that you would use signed integers > > if you want the usual integer arithmetic rules we know from > > elementary school and if you overflow this is clearly a bug > > you can diagnose with UBsan. > >=20 > > There are people who think that signed overflow should be > > defined to wrap, but I think this would be a severe > > mistake because then code would start to rely on it, which > > makes it then difficult to differentiate between bugs and > > intended uses (e.g. the unfortunate situation you have > > with the kernel). > >=20 > > I assume you want to combine UBSan plus wrapping for > > production use? Or only for testing? Or in other words: > > why would testing UBSan and production with wrapping > > not be sufficient to find and fix all bugs? > >=20 > > Wrapping would not be correct because it may lead to > > logic errors or use-after-free etc. I assume it is still > > preferred because it more deterministic than whatever comes > > out of the optimizer assuming that overflow has UB. Is this > > the reasoning applied here? > >=20 > >=20 > > For unsigned the intended use case is modulo arithmetic > > where wrapping is the correct behavior. At least, this > > is what I thought so far.. This seems also to be the > > position of the overall GCC community rejecting > > -fsanitize=3Dunsigned-integer-overflow. > >=20 > > But then there are also people like Richard Seacord that > > have the position that one should use "unsigned" for > > every quantity which can not be negative, which implies > > that then the modulo use case becomes the exception. > >=20 > > Related to this I have the following question: In the > > bug you refer to above, an unsigned variable was used > > for something that is not meant for modulo arithmetic. > > Is this done in the kernel to save space? I.e. because > > 127 would not be enough as maximum but going to i16 takes > > to much space? or is this for compatibility with some > > on-the-wire protocol? > >=20 > > Would an attribute for variables help that tells the > > compiler that if stores to the variable wrap around > > then this is not intended and this is an error? > >=20 > > u8 x [[nowrap]]; > >=20 > > x =3D 256; // can be diagnosed ? >=20 > There's a problem with representing this, it's a long-standing thing stan= ding > in the way of handling -fwrapv in the IL. Maybe C wants to have > 'nonnegative int' in addition to 'unsigned int', or 'bit int' for the > "just some two-complement bits". >=20 > For GCC we think of -fwrapv to perform signed integer arithmetic as if > promoted to a large enough type first and then truncated to the original > type in the implementation defined manner (modulo reducing). But we > end up doing what the actual hardware does (I'm for example unsure > how that handles INT_MIN / -1 - x86 documents it as raising #DE > which it indeed does, even with -fwrapv). >=20 > volatile int a, b; > int main () > { > a =3D -__INT_MAX__ - 1; > b =3D -1; > int z =3D a / b; > __builtin_printf ("%d\n", z); > } >=20 > raises SIGFPE with and without -fwrapv. I'll note that we document > -fwrapv to only affect addition, subtraction and multiplication > (it also affects negation), matching the libgcc *v functions we have > for -ftrapv. -fsanitize=3Dundefined instruments division but > -fsanitize=3Drecover doesn't help. Maybe there's a bug about this > recorded. The documentation of -fwrapv should likely be clarified. >=20 > For recent work I failed to spot the C standards language on > singed overflow behavior for negation, addition, subtraction > and multiplication. I can find it for division though. Can someone > point me to where that is (moved?)? I don't think it moved. It should follow from the general rule 6.5p5: "If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined." Martin