From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io1-xd33.google.com (mail-io1-xd33.google.com [IPv6:2607:f8b0:4864:20::d33]) by sourceware.org (Postfix) with ESMTPS id DAEBC3858437 for ; Tue, 1 Feb 2022 23:54:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DAEBC3858437 Received: by mail-io1-xd33.google.com with SMTP id r144so23325803iod.9 for ; Tue, 01 Feb 2022 15:54:19 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:references:from:in-reply-to :content-transfer-encoding; bh=1HpV/db4D5jppVHy5jUP2T1V1lVwZTUUHcjMPmxLGDo=; b=UahfMjWa69bcIDRqgxTcEoHD9AfaS2QtROmAlflKc0AX+aSjPworgvKmXSDh1aIRQx IcLaJRCJjAXgiwQqH24q7LRAiYcrmcFNz269ePqAV9tOAtrHC+epR1V7i9DvrkTO39Qt 1wOSztNxoTNTFe3U9l7ENIGA3fQCh7nZk1jw8dCuQuVj8O2Vj4R7R7KOmuX8uSR+ogfS RCHfSZcPy7QOFGQoucNicB9OuU9vlB77IGZxMQAfVHwy5Hc3RXmimK2dpMAChDq7Z9Yh 4LHQjK0O1K74H4a/ed0xGJT3xXFSjrBUEJo+u6SO6nuYZKC8zXUiFicxIPV0hlgK3bQI baRQ== X-Gm-Message-State: AOAM531lGFSMkzZ+UbHbujJDU5GI2uB/Lm2SMXPJQ4Jmck9if8MO10PQ 0D0xZjOEs2k61c9hYaOyl+kW5WxKI68= X-Google-Smtp-Source: ABdhPJyZH6daZa1Br/NRh7e1SFyn/KOT6PpU0z4gULy3dfnMwrte2IrrA5n30UkpmsIErj1ZZNoAhA== X-Received: by 2002:a02:b112:: with SMTP id r18mr2362778jah.151.1643759658989; Tue, 01 Feb 2022 15:54:18 -0800 (PST) Received: from [192.168.0.41] (97-118-100-142.hlrn.qwest.net. [97.118.100.142]) by smtp.gmail.com with ESMTPSA id w19sm22185441iov.16.2022.02.01.15.54.18 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 01 Feb 2022 15:54:18 -0800 (PST) Message-ID: <478a5194-c8aa-2be7-e4aa-f0d441cbfac0@gmail.com> Date: Tue, 1 Feb 2022 16:54:17 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.0 Subject: Re: Potentially false-positive -Wstringop-overflow= warning with gcc >= 11.1 Content-Language: en-US To: Martin Sebor via Gcc-help , Dumitru Ceara , Segher Boessenkool , richard.sandiford@arm.com References: <8d42151f-7be2-4d67-5b46-a83ba55d5c1c@redhat.com> <20220128152748.GD614@gate.crashing.org> From: Martin Sebor In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_NUMSUBJECT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-help@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-help mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Feb 2022 23:54:21 -0000 On 1/31/22 22:18, Richard Sandiford wrote: > Martin Sebor via Gcc-help writes: >> On 1/28/22 09:16, Dumitru Ceara via Gcc-help wrote: >>> On 1/28/22 16:27, Segher Boessenkool wrote: >>>> On Fri, Jan 28, 2022 at 04:01:36PM +0100, Dumitru Ceara via Gcc-help wrote: >>>>> void *l4data = p.l4_ofs != UINT16_MAX ? (char *) p.base_ + p.l4_ofs : NULL; >>>>> struct hdr2 *h2 = l4data; >>>>> >>>>> memcpy(h2 + 1, &somedata[0], 6); >>>> >>>> l4data can be 0, and everything false apart from there on. >>>> >>> >>> In general, yes, l4data can be 0, if p.l4_ofs == UINT16_MAX, which can >>> only happen if pkt_bar() changed p.base_. >>> >>> But the compiler can't know that for sure and the warning makes it sound >>> like it's a sure thing: >>> >>> "warning: ‘memcpy’ writing 6 bytes into a region of size 0 overflows the >>> destination" >> >> The reason why it makes it sound like it's a sure thing is because >> a) the warning sees an invalid statement in the IL, and b) because >> GCC warnings work on one IL statement at a time and without regard >> to the conditions under which they might be executed. >> >> An IL statement is not always the same as a source code statement. >> Some statements are also isolated from the source code even when they >> don't appear there, or appear with different arguments. That's also >> what happens in this case: GCC replaces the memcpy call and >> the subsequent store with two: one pair is valid and another pair >> of invalid statements. Here's the problem in the IL the warning >> sees and points out: >> >> [count: 0]: >> memcpy (4B, &somedata[0], 6); <<< -Warray-bounds >> MEM[(struct hdr2 *)0B].h21 ={v} 0; <<< causes path isolation >> __builtin_trap (); and trap to be added >> >> Before the warning is issued GCC notices the branch of the code where >> l4data is null is invalid. It splits it into two branches: valid and >> invalid and adds a trap after the invalid sequence for good measure. >> The warning (which runs much later) then discovers the isolated code >> is invalid and points it out. The path isolation can be disabled by >> compiling with -fno-isolate-erroneous-paths-dereference. That also >> avoids the warning. >> >> This is an example where the invalid code isolation seems to work at >> cross purposes with the warning (or at least without coordination >> with it). It's also one that illustrates an inconsistency in >> the handling of invalid code (undefined behavior) in GCC. Some >> instances are eliminated (optimized away), others are replaced with >> a trap, others left in place but followed by a trap (the store at >> address zero), and others still are just left in place (the invalid >> memcpy at address 4). >> >> More to your point, it's also an example where warnings issued for >> code that's reachable only under some conditions make it seem like >> they point out unconditional bugs. >> >> All these are known problems and all of them should be tackled >> somehow. The last one (warning for conditional code) especially >> has been a sore point because it's in everyone's face. A number >> of ideas have been thrown out over the years, starting with simply >> rephrasing the warning to be less affirmative. E.g., instead of >> "memcpy writing..." print: >> >> ‘memcpy’ may write 6 bytes into a region of size 0 >> >> The problem with this suggestion is that because most instances of >> these warnings occur in some conditionally executed code, and because >> all GCC warnings are meant to be taken as possible (as opposed to >> definitive) indication of bugs, changing their phrasing wouldn't >> actually solve anything. All it would do is replace one with >> the other, without helping you understand why or when the bug might >> happen. > > I agree that changing the wording doesn't solve the whole problem, > but I think it does solve something. At the moment, we're effectively > asking each individual user to be aware of the context above, to know what > meaning is being attached to the present tense. Making the message more > equivocal would at least suggest that the compiler doesn't “know”. The compiler can never "know" for certain if any statement will be executed. Every warning message that involves control or data flow must be interpreted in the context of the surrounding code, whether it's an expression, statement, or the whole function. Every warning message must necessarily also be understood as only suggesting that "there may be a problem" in the program rather than "there definitely is a bug." There's plenty of literature out there that explains this, including the GCC manual, so I'd expect most C/C++ programmers to understand that. I don't think that rewording every warning message just to drill that message home and without addressing the bigger problem would make enough of a difference to justify the effort. (Users don't feel any better about -Wmaybe-uninitialized false positives than about -Wuninitalized, and they've periodically argued to remove the former from -Wall despite its equivocal phrasing.) (That said, with my tongue in cheek, the original phrasing of these warnings (up to GCC 6) was: call to 'memcpy' will always overflow destination buffer so some progress has been made on this front ;-) > > It's not the pass's “fault” that it can't tell how closely the IL > it sees matches the original code. But it is GCC's “fault” as a whole, > not the user's, and I think that's what matters here. I agree. I think there are at least two underlying problems: users expecting every warning message to point out an actual bug in their code, and GCC failing to somehow indicate the conditional nature of the problems in the messages. >> What I think is needed here is to mention the conditions >> under which the invalid statement is executed. With that, we should >> be able to print the same context as what the static analyzer prints: >> >> warning: dereference of NULL ‘0’ [CWE-476] [-Wanalyzer-null-dereference] >> 43 | h2->h21 = 0; >> | ~~~~~~~~^~~ >> ‘foo’: events 1-3 >> | >> | 39 | *l4data = p.l4_ofs != UINT16_MAX ? (char *) p.base_ + >> p.l4_ofs : NULL; >> | | >> ^ >> | | >> | >> | | >> (1) following ‘false’ branch... >> >> |...... >> | 42 | py(h2 + 1, &somedata[0], 6); >> | | ~~~~~~ >> >> | | | >> | | (2) ...to here >> >> | 43 | h21 = 0; >> | | ~~~~~~~ >> >> | | | >> | | (3) dereference of NULL ‘’ > > I agree this would be better, but I don't think it should hold back > tweaks to the error messages in the meantime. (And if the tracing > was an optional feature, we'd still want wording that makes sense > when the feature is turned off.) > > I realise this is rehashing an old discussion, sorry. But it seems > like it's a discussion that gets rehashed precisely because it's > about an issue that users keep hitting. It's an old issue that users are running into increasingly often as the middle end warnings keep getting "smarter." Although that's not the case here, the switch to Ranger in GCC 12 in particular has made the data flow analysis used by warnings like -Wstringop-overflow much more accurate. That's exactly what we wanted; what we underestimated is the extent to which the warnings expose unreachable statements, or those that are reachable under complex conditions, or that are synthesized by GCC seemingly out of thin air. All these are problematic, especially when the conditions involve layers of libstdc++ (or other third party) template code. From a user's point of view, there's no difference between the two. Indicating the flow like in the above isn't too difficult. I put together a prototype that does that for some warnings, including -Wmaybe-uninitialized and a subset of -Wstringop-overflow. What it doesn't capture (and what I think is essential) is conditions involved in determining ranges of values like pointer offsets, array bounds, or memcpy sizes. To capture those we either need to extend Ranger or develop a separate range analyzer just for these warnings. Martin > > Thanks, > Richard