From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102d.google.com (mail-pj1-x102d.google.com [IPv6:2607:f8b0:4864:20::102d]) by sourceware.org (Postfix) with ESMTPS id 737FE3858401 for ; Thu, 17 Mar 2022 16:46:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 737FE3858401 Received: by mail-pj1-x102d.google.com with SMTP id rm8-20020a17090b3ec800b001c55791fdb1so6010113pjb.1 for ; Thu, 17 Mar 2022 09:46:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ySDyGO29IarMI3VbYo4PwdjMph2F/uKOG5Crj8iivC8=; b=7rXzq+k+1LTgDx3/3gqDaoN8KAgXBZa5CCpx9iWAItnLYmlfqf5Z7bzazqIpCAS3wW lLqXqwiStW45mheqNDci6eH9UjWsBHGBCn6JHSukg7c7nJvXBJSiKDN8zdpfOl+Is5mE 9klQBpaTX87EhmqegnR7UPYX5ZAF761YQL2eCQA+xksH8121ZEq0xGNbnTnoEWPDoEaD uG7p47GsUCfov/BkyqMqYHeKEnrUKk5NT+R84NeY4EhDeLfkvrAUr0o3HodH7bzAfSrk 9A87Utzz/YWHLCU65ZUypawmtJiPBF/tXpXONkCxVHx7hMGuLpAzdUH8XsJfURdTWzKC 6d+g== X-Gm-Message-State: AOAM530hKl/x4D6NSHiuGi7vbVPndeMqIu1J7Q4aRdfvDfiJaA1L3GKn fqu/IiHxGRZopChFjwrm1sZiFQjr3zSNNaO5iXg= X-Google-Smtp-Source: ABdhPJyPcDWPiqhdYXruMEMDWBxP9UMpGsNc8G+i2b88S05YMpChy/7J6tP3gfuv1obktnPbiCjq/f3NO9wDzs9dX84= X-Received: by 2002:a17:90b:4c08:b0:1c6:40e4:776c with SMTP id na8-20020a17090b4c0800b001c640e4776cmr6384237pjb.237.1647535607135; Thu, 17 Mar 2022 09:46:47 -0700 (PDT) MIME-Version: 1.0 References: <20220311071236.2349381-1-siddhesh@gotplt.org> <20220315053158.1486555-1-siddhesh@gotplt.org> <01c49ecb-cd91-3bcc-c5a0-97e81281e1dd@gotplt.org> <4104ab01-c36c-12e9-5601-6d99019f6bef@redhat.com> In-Reply-To: From: Jeff Law Date: Thu, 17 Mar 2022 10:46:33 -0600 Message-ID: Subject: Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings To: Marek Polacek Cc: Siddhesh Poyarekar , Jakub Jelinek , Martin Sebor , GCC Patches X-Spam-Status: No, score=-1.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, HTML_MESSAGE, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2022 16:46:50 -0000 On Thu, Mar 17, 2022 at 9:32 AM Marek Polacek via Gcc-patches < gcc-patches@gcc.gnu.org> wrote: > On Wed, Mar 16, 2022 at 06:54:51AM +0530, Siddhesh Poyarekar wrote: > > On 16/03/2022 02:06, Martin Sebor wrote: > > > The intended use of the strncmp bound is to limit the comparison to > > > at most the size of the arrays or (in a subset of cases) the length > > > of an initial substring. Providing an arbitrary bound that's not > > > related to the sizes as you describe sounds very much like a misuse. > > > > Nothing in the standard says that the bound is related to the sizes of > input > > buffers. I don't think deducing that intent makes sense either, nor > > concluding that any other use case is misuse. > > > > > As a historical note, strncmp was first introduced in UNIX v7 where > > > its purpose, alongside strncpy, was to manipulate (potentially) > > > unterminated character arrays like file names stored in fixed size > > > arrays (typically 14 bytes). Strncpy would fill the buffers with > > > ASCII data up to their size and pad the rest with nuls only if there > > > was room. > > > > > > Strncmp was then used to compare these potentially unterminated > > > character arrays (e.g., archive headers in ld and ranlib). The bound > > > was the size of the fixed size array. Its other use case was to > compare > > > leading portions of strings (e.g, when looking for an environment > > > variable or when stripping "./" from path names). > > > > Thanks for sharing the historical perspective. > > > > > Since the early UNIX days, both strncpy and to a lesser extent strncm= p > > > have been widely misused and, along with many other functions in > > > , a frequent source of bugs due to common misunderstanding > > > of their intended purpose. The aim of these warnings is to detect > > > the common (and sometimes less common) misuses and bugs. > > > > They're all valid uses however since they do not violate the standard. > If we > > find at compile time that the strings don't terminate at the bounds, > > emitting the warning is OK but the more pessimistic check seems like > > overkill. > > I think I agree, I've tried > > #include > char a[] =3D "abc"; > char b[] =3D "abcd"; > > int f (void) > { > return strncmp (a, b, 8); > } > > where I get > > t.c:7:10: warning: =E2=80=98strncmp=E2=80=99 specified bound 8 exceeds so= urce size 5 > [-Wstringop-overread] > 7 | return strncmp (a, b, 8); // -Wstringop-overread > | ^~~~~~~~~~~~~~~~~ > > even without -Wall. strncmp sees that a[3] is '\0' so it stops comparing > and there's no UB. > This one is a clear case where warning is bad. Both arguments are constant and we can determine they are NUL terminated and an overread will never occur. No deep analysis really needed here. THe far more interesting case in my mind is when one or both arguments have an unknown NUL termination state. I could argue either side of that case. I lean towards warning but I understand that opinions differ and my priorities have moved away from distro-level issues, so identifying code that needs a careful review for correctness, particularly old or security sensitive code, has become a much lower priority for me. Combine that with the fact that we're really just dealing with over-reads here, I can support whatever the broadest consensus is. jeff