From: Martin Sebor
To: Jakub Jelinek
Cc: Bernd Edlinger, Jeff Law, GCC Patches, Richard Biener
Subject: Re: [PATCH] Make strlen range computations more conservative
Date: Tue, 31 Jul 2018 23:20:00 -0000
Message-ID: <933a1c4a-8cd0-a538-1e7e-d481b7d6ce80@gmail.com>
In-Reply-To: <20180731154812.GF17988@tucnak>
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm

On 07/31/2018 09:48 AM, Jakub Jelinek wrote:
> On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote:
>> On 07/31/2018 12:38 AM, Jakub Jelinek wrote:
>>> On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote:
>>>> Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past
>>>> the end of subobjects by string functions. With _FORTIFY_SOURCE=2
>>>> it calls abort. This is the default on popular distributions,
>>>
>>> Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the standard
>>> requires, imposes extra requirements. So from what this mode accepts or
>>> rejects we shouldn't determine what is or isn't considered valid.
>>
>> I'm not sure what the additional requirements are but the ones
>> I am referring to are the enforcing of struct member boundaries.
>> This is in line with the standard requirements of not accessing
>> [sub]objects via pointers derived from other [sub]objects.
>
> In the middle-end the distinction between what was originally a reference
> to subobjects and what was a reference to objects is quickly lost
> (whether through SCCVN or other optimizations).
> We've run into this many times with the __builtin_object_size already.
> So, if e.g.
> struct S { char a[3]; char b[5]; } s = { "abc", "defg" };
> ...
> strlen ((char *) &s) is well defined but
> strlen (s.a) is not in C, for the middle-end you might not figure out which
> one is which.

Yes, I'm aware of the middle-end transformation to MEM_REF -- it's
one of the reasons why detecting invalid accesses by the middle
end warnings, including -Warray-bounds, -Wformat-overflow,
-Wsprintf-overflow, and even -Wrestrict, is less than perfect.

But is strlen(s.a) also meant to be well-defined in the middle end
(with the semantics of computing the length of "abcdefg")? And if
so, what makes it well defined?

Certainly not every "strlen" has these semantics. For example,
this open-coded one doesn't:

   int len = 0;
   for (int i = 0; s.a[i]; ++i)
     ++len;

It computes 2 (with no warning for the out-of-bounds access).

So if the standard doesn't guarantee it and different kinds
of accesses behave differently, how do we explain what "works"
and what doesn't without relying on GCC implementation details?

If we can't then the only language we have in common with users
is the standard. (This, by the way, is what the C memory model
group is trying to address -- the language or feature that's
missing from the standard that says when, if ever, these things
might be valid.)

Martin
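
For reference, the object versus subobject distinction discussed above can
be sketched in C. This is only an illustration under stated assumptions:
the helper names whole_object_len and member_len are hypothetical (they
are not part of GCC or of the quoted example), and the static assertion
records the assumption that struct S has no padding between its members.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* size_t, offsetof */
#include <string.h>   /* strlen, memchr */

/* The struct from the quoted example.  When initialized with
   { "abc", "defg" }, a[] has no room for a terminating NUL while
   b[] holds "defg" followed by its NUL.  */
struct S { char a[3]; char b[5]; };

/* Assumption: b[] starts immediately after a[].  This holds on common
   ABIs, but it is checked here rather than taken for granted.  */
static_assert (offsetof (struct S, b) == 3, "unexpected padding in struct S");

/* Hypothetical helper: the pointer passed to strlen is derived from the
   whole object, so the scan may continue from a[] into b[].  */
size_t
whole_object_len (const struct S *p)
{
  return strlen ((const char *) p);
}

/* Hypothetical helper: the scan is bounded by the size of the member,
   so it never reads past the end of a[].  */
size_t
member_len (const struct S *p)
{
  const char *nul = memchr (p->a, '\0', sizeof p->a);
  return nul ? (size_t) (nul - p->a) : sizeof p->a;
}
```

With s initialized as { "abc", "defg" }, whole_object_len (&s) yields 7
and member_len (&s) yields 3. Neither helper says anything about what
strlen (s.a) itself should mean; that is the open question in this
thread.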