From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com [IPv6:2a00:1450:4864:20::62f]) by sourceware.org (Postfix) with ESMTPS id 214FF3853558 for ; Thu, 1 Sep 2022 06:55:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 214FF3853558 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-ej1-x62f.google.com with SMTP id lx1so32644481ejb.12 for ; Wed, 31 Aug 2022 23:55:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc; bh=dB6rWvQIqTidQti2HCm4EN6ANQhI5u6SOVEVGgiAr0M=; b=DMznIuGdDoMZiuqFPTfl+/O/ivYx5D4VRU9e8yDw5Oq8dLz9iKYWKFky5iXPxixdb9 T0iAe7tzTrp7zC1puqqy8EI4L/jL4qyzAtyt2CNjcKejvp+6o8BuSu6xGF4d7UXxbxv8 EMVQIcp9xgn4IrDOMiCZ/L00VeeakKXS9hrIWhkDW3QCWNZwigZSm8VMKovfF8Td8Jr8 oR6OamwGlSxF4yOVr2He8jEM9rKU7dDfT+VTjoKD+D9fg2e8V0P7/l6I00+J6cLMV0lw xbFHxiZjdaReVpsXmUw0LrL4UbNpiKlM7yQYOgOXYzk7WtAA3u7ZM9PUMZoQNQXEcQsf Fwuw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc; bh=dB6rWvQIqTidQti2HCm4EN6ANQhI5u6SOVEVGgiAr0M=; b=dn/o5XzuMmcyEhZNGzpj9Vk9CzyV9E7+cU1FA5+x8HJutjIM3jx+tQFpEWYa4FYFRx UFv1JKTNm4p9a5DDU2zv10gDa7HU7/yyaf91NcoVPerNod8IOknaN3ZhKmQxnoIHbxKh pJcJF6yGoQjxgX7E2qR8aIMskhKEYin39HRDur46Fyx7Ir8Jz5OKyvEOvoVM+Iv6GJhY cQwma3xqs1+MAudgfzQ0t8BSZ7KGQkjcsp1VDm/Nv55tOq99sJIACj+QaUQbvCIaiEFK /lfQ1P6jUnbFywuXhFCxwpZaa7W7eL7boRnA1QzeINkkHCDJITJE1y9vnWl2jADJfcuQ nm/A== X-Gm-Message-State: ACgBeo1A4c+u/rP+ZAJl/CWBtkjWU1bY/VAD21s3FMMCSFcpHRjSuJ1b wyNGUlFS7sFiDvywUQ5eOlatseYgKd1ICp6T3ZY= X-Google-Smtp-Source: AA6agR7u/56ExyjWNNP15/VDjyo+4DIErhIhRoEZf+2QbS8aqqyArvRCorVfdlKY3uxYnnzbYD/4FbRvrhsElGAyz8g= X-Received: by 2002:a17:907:9712:b0:731:67db:1b48 with SMTP id jg18-20020a170907971200b0073167db1b48mr22617913ejc.754.1662015306727; Wed, 31 Aug 2022 23:55:06 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Thu, 1 Sep 2022 08:54:54 +0200 Message-ID: Subject: Re: GIMPLE undefined behavior To: Krister Walfridsson , Andrew MacLeod Cc: GCC Development Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Sep 1, 2022 at 1:57 AM Krister Walfridsson via Gcc wrote: > > I'm implementing a tool for translation validation (similar to Alive2 for > LLVM). The tool uses an SMT solver to verify for each GIMPLE pass that the > output IR is a refinement of the input IR: > * That each compiled function returns an identical result before/after > the pass (for input that does not invoke UB) > * That each function does not have additional UB after the pass > * That values in global memory are identical after the two versions of > the function are run Nice. > I have reported three bugs (106523, 106613, 106744) where the tool has > found differences in the result, but it is a bit unclear to me what is > considered undefined behavior in GIMPLE, so I have not reported any such > cases yet... > > For example, ifcombine optimizes > > int foo (int x, int a, int b) > { > int c; > int _1; > int _2; > int _3; > int _4; > > : > c_6 = 1 << a_5(D); > _1 = c_6 & x_7(D); > if (_1 != 0) > goto ; > else > goto ; > > : > _2 = x_7(D) >> b_8(D); > _3 = _2 & 1; > if (_3 != 0) > goto ; > else > goto ; > > : > > : > # _4 = PHI <2(4), 0(2), 0(3)> > return _4; > } > > to > > int foo (int x, int a, int b) > { > int c; > int _4; > int _10; > int _11; > int _12; > int _13; > > : > _10 = 1 << b_8(D); > _11 = 1 << a_5(D); > _12 = _10 | _11; > _13 = x_7(D) & _12; > if (_12 == _13) > goto ; > else > goto ; > > : > > : > # _4 = PHI <2(3), 0(2)> > return _4; > } > > Both return the same value for foo(8, 1, 34), but the optimized version > shifts more than 31 bits when calculating _10. tree.def says for > LSHIFT_EXPR that "the result is undefined" if the number of bits to shift > by is larger than the type size, which I interpret as it just should be > considered to return an arbitrary value. I.e., this does not invoke > undefined behavior, so the optimization is allowed. Is my understanding > correct? It's generally poorly documented what is considered 'undefined behavior'. We desparately need a section in the internals manual for this. For the {L,R}SHIFT_EXPR case we assume the shift operand is in range of [0, precision - 1], so in theory value-range propagation could infer that b_8(D) < 32 after it "executed". But it seems that range-on-exit doesn't do that yet. int x; int foo (int i, int n) { x = i << n; if (n > 32) return 7; return 0; } isn't optimized. The problem with shifts is that there's not a "do it anway, but without undefined behavior" operation to substitute. As I said, we lack clear documentation here :/ > What about signed integer wrapping for PLUS_EXPR? This happens for > > int f (int c, int s) > { > int y2; > int y1; > int x2; > int x1; > int _7; > > : > y1_2 = c_1(D) + 2; > x1_4 = y1_2 * s_3(D); > y2_5 = c_1(D) + 4; > x2_6 = s_3(D) * y2_5; > _7 = x1_4 + x2_6; > return _7; > } > > where slsr optimizes this to > > int f (int c, int s) > { > int y1; > int x2; > int x1; > int _7; > int slsr_9; > > : > y1_2 = c_1(D) + 2; > x1_4 = y1_2 * s_3(D); > slsr_9 = s_3(D) * 2; > x2_6 = x1_4 + slsr_9; > _7 = x1_4 + x2_6; > return _7; > > Calling f(-3, 0x75181005) makes slsr_9 overflow in the optimized code, > even though the original did not overflow. My understanding is that signed > overflow invokes undefined behavior in GIMPLE, so this is a bug in > ifcombine. Is my understanding correct? Yes, the above would be a bug - again value-range propagation might be leveraged to produce a wrong-code testcase. > I would appreciate some comments on which non-memory-related operations I > should treat as invoking undefined behavior (memory operations are more > complicated, and I'll be back with more concrete questions later...). The more "interesting" cases are uninitialized values (registers or memory). In general what we should worry about most is introducing undefined behavior that, when a later pass can assume it doesn't happen, causes wrong code to be generated. Likewise when we have late instrumentation that would flag such undefined behavior as a user error. I think only -ftrapv would cause "late" instrumentation (but please don't test -ftrapv, it's known to be broken in ways). Thanks, Richard. > /Krister