From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <msebor@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.133.124])
 by sourceware.org (Postfix) with ESMTPS id 54E6F3858C3A
 for <gcc-patches@gcc.gnu.org>; Tue, 15 Mar 2022 20:36:14 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 54E6F3858C3A
Received: from mail-io1-f69.google.com (mail-io1-f69.google.com
 [209.85.166.69]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-250-0YHOqnIMOCyFo4C4kPHAeg-1; Tue, 15 Mar 2022 16:36:12 -0400
X-MC-Unique: 0YHOqnIMOCyFo4C4kPHAeg-1
Received: by mail-io1-f69.google.com with SMTP id
 h14-20020a056602008e00b00645739bbd4dso60505iob.9
 for <gcc-patches@gcc.gnu.org>; Tue, 15 Mar 2022 13:36:12 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:message-id:date:mime-version:user-agent:subject
 :content-language:to:cc:references:from:in-reply-to
 :content-transfer-encoding;
 bh=vQ/XiP3NNmyjOGiMpiCTFnlpYhYS0HYfRuhhza4uyWQ=;
 b=NiZbwfYcovoXI2VXz4+ErjHvuc1irNyzTa3nRn4EjXcg/aiquU6xXxkVpX0hfpMIUQ
 FeDXR26L4EXrHKjXfMimV2XgLAZ1bn3tHHajckPAl66R6IKPdBI9AeGd65EnX1+tVGy/
 ZdXGyt/UUyf+t+0VpA7H0I7ASxqofQOpHunZfG+acxM7FHU0lGAjIG1Z5ej52IBD2/lG
 ReqjrLD6fPRfQTe2SAA9WCyMRoamNEeTvPfzwz0Sv+XEaAdLxLKGezAqGLwzf+qft4II
 aNZ+DoMN9FvuyE/oRhD04YSLATj0ewdlv3fh0sQIrreRnV/rkyVxX4nPaT9O8se31rDY
 /7nw==
X-Gm-Message-State: AOAM530oXY3GSvvYzwn7ORkSXYFiPQaY3DeK8IlM08VDcepaLPpWjzzf
 Tsorp4/IowSABVqlL6G3TxqGvClQsLtBSZS7Njovy6YJROlLJf7DDS6T5OalYEQE/9fzGepoHB2
 k1AmDy+D2KWCj2momKg==
X-Received: by 2002:a6b:d603:0:b0:641:2e6f:fa25 with SMTP id
 w3-20020a6bd603000000b006412e6ffa25mr22122287ioa.103.1647376571940; 
 Tue, 15 Mar 2022 13:36:11 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJxtCeC9rvgleZX56vQ5fpf5LPJyNJ0tiIlUYT/yISFX7CU0+i8mKaUzpoiMSq61H8ayr2dt4Q==
X-Received: by 2002:a6b:d603:0:b0:641:2e6f:fa25 with SMTP id
 w3-20020a6bd603000000b006412e6ffa25mr22122279ioa.103.1647376571657; 
 Tue, 15 Mar 2022 13:36:11 -0700 (PDT)
Received: from [192.168.0.41] (174-16-121-59.hlrn.qwest.net. [174.16.121.59])
 by smtp.gmail.com with ESMTPSA id
 s12-20020a056e02216c00b002c79500ea83sm72552ilv.41.2022.03.15.13.36.10
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Tue, 15 Mar 2022 13:36:11 -0700 (PDT)
Message-ID: <4104ab01-c36c-12e9-5601-6d99019f6bef@redhat.com>
Date: Tue, 15 Mar 2022 14:36:10 -0600
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.4.0
Subject: Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings
To: Siddhesh Poyarekar <siddhesh@gotplt.org>, gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com
References: <20220311071236.2349381-1-siddhesh@gotplt.org>
 <20220315053158.1486555-1-siddhesh@gotplt.org>
 <f18b9ff2-5b12-7155-ad2e-0eedbe3706f7@redhat.com>
 <01c49ecb-cd91-3bcc-c5a0-97e81281e1dd@gotplt.org>
From: Martin Sebor <msebor@redhat.com>
In-Reply-To: <01c49ecb-cd91-3bcc-c5a0-97e81281e1dd@gotplt.org>
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00, BODY_8BITS,
 DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF,
 NICE_REPLY_A, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL,
 SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 15 Mar 2022 20:36:18 -0000

On 3/15/22 10:40, Siddhesh Poyarekar wrote:
> On 15/03/2022 21:09, Martin Sebor wrote:
>> The strncmp function takes arrays as arguments (not necessarily
>> strings).  The main purpose of the -Wstringop-overread warning
>> for calls to it is to detect calls where one of the arrays is
>> not a nul-terminated string and the bound is larger than the size
>> of the array.  For example:
>>
>>    char a[4], b[4];
>>
>>    int f (void)
>>    {
>>      return strncmp (a, b, 8);   // -Wstringop-overread
>>    }
>>
>> Such a call is suspect: if one of the arrays isn't nul-terminated
>> the call is undefined.  Otherwise, if both are nul-terminated there
> 
> Isn't "suspect" too harsh a description though?  The bound does not 
> specify the size of a or b, it specifies the maximum extent to which to 
> compare a and b, the extent being any application-specific limit.  In 
> fact the limit could be the size of some arbitrary third buffer that the 
> contents of a or b must be copied to, truncating to the bound.

The intended use of the strncmp bound is to limit the comparison to
at most the size of the arrays or (in a subset of cases) the length
of an initial substring. Providing an arbitrary bound that's not
related to the sizes as you describe sounds very much like a misuse.

As a historical note, strncmp was first introduced in UNIX v7 where
its purpose, alongside strncpy, was to manipulate (potentially)
unterminated character arrays like file names stored in fixed size
arrays (typically 14 bytes).  Strncpy would fill the buffers with
ASCII data up to their size and pad the rest with nuls only if there
was room.

Strncmp was then used to compare these potentially unterminated
character arrays (e.g., archive headers in ld and ranlib).  The bound
was the size of the fixed size array.  Its other use case was to compare
leading portions of strings (e.g, when looking for an environment
variable or when stripping "./" from path names).

Since the early UNIX days, both strncpy and to a lesser extent strncmp
have been widely misused and, along with many other functions in
<string.h>, a frequent source of bugs due to common misunderstanding
of their intended purpose.  The aim of these warnings is to detect
the common (and sometimes less common) misuses and bugs.

> I agree the call is undefined if one of the arrays is not nul-terminated 
> and that's the thing; nothing about the bound is undefined in this 
> context, it's the NUL termination that is key.
> 
>> is no point in calling strncmp with a bound greater than their sizes.
> 
> There is, when the bound describes something else, e.g. the size of a 
> third destination buffer into which one of the input buffers may get 
> copied into.  Or when the bound describes the maximum length of a set of 
> strings where only a subset of the strings are reachable in the current 
> function and ranger sees it, allowing us to reduce our input string size 
> estimate.  The bounds being the maximum of the lengths of two input 
> strings is just one of many possibilities.
> 
>> With no evidence that this warning is ever harmful I'd consider
> 
> There is, the false positives were seen in Fedora/RHEL builds.

I haven't seen these so I can't very well comment on them.  But I can
assure you that warning for the code above is intentional.  Whether
or not the arrays are nul-terminated, the expected way to call
the function is with a bound no greater than their size (some coding
guidelines are explicit about this; see for example the CERT C Secure
Coding standard rule ARR38-C).

(Granted, the manual makes it sound like -Wstringop-overread only
detects provable past-the-end reads.  That's a mistake in
the documentation that should be fixed.  The warning was never quite
so limited, nor was it intended to be.)

Martin

> 
>> suppressing it a regression.  Since the warning is a deliberate
>> feature in a released compiler and GCC is now in a regression
>> fixing stage, this patch is out of scope even if a case where
>> the warning wasn't helpful did turn up (none has been reported
>> so far).
> 
> Wait, I just reported an issue and it's across multiple packages in 
> Fedora/RHEL :)
> 
> I think this is a regression since gcc 11 due to misunderstanding the 
> specification and assuming too strong a relationship between the size 
> argument of strncmp (and indeed strnlen and strndup) and the size of 
> objects being passed to it.  Compliant code relies on the compiler to do 
> the right thing here, i.e. optimize the strncmp call to strcmp and not 
> panic about the size argument being larger than the input buffer size. 
> If at all such a diagnostic needs to stay, it ought to go into the 
> analyzer, where such looser heuristic suggestions are more acceptable 
> and sometimes even appreciated.
> 
> FWIW, I'm open to splitting the warning levels as you suggested if 
> that's the consensus since it at least provides a way to make these 
> warnings saner. However I still haven't found the rationale presented so 
> far compelling enough to justify these false positives; I just don't see 
> a proportional enough reward.  Hopefully more people can chime in with 
> their perspective on this.
> 
> Thanks,
> Siddhesh
>