public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: Martin Sebor <msebor@gmail.com>,Jeff Law <law@redhat.com>,Bernd
	Edlinger <bernd.edlinger@hotmail.de>,GCC Patches
	<gcc-patches@gcc.gnu.org>
Cc: Jakub Jelinek <jakub@redhat.com>
Subject: Re: [PATCH] Make strlen range computations more conservative
Date: Mon, 06 Aug 2018 17:44:00 -0000	[thread overview]
Message-ID: <14925CCD-628E-4FED-A0C3-2407FF6D88D1@suse.de> (raw)
In-Reply-To: <82cbca79-e73e-c7c9-0244-8416f1fed1f3@gmail.com>

On August 6, 2018 6:32:41 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote:
>On 08/06/2018 09:12 AM, Jeff Law wrote:
>> On 08/04/2018 03:56 PM, Martin Sebor wrote:
>>> On 08/03/2018 01:00 AM, Jeff Law wrote:
>>>> On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
>>>>> On 07/24/18 23:46, Jeff Law wrote:
>>>>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> This patch makes strlen range computations more conservative.
>>>>>>>
>>>>>>> Firstly if there is a visible type cast from type A to B before
>>>>>>> passing
>>>>>>> then value to strlen, don't expect the type layout of B to
>restrict
>>>>>>> the
>>>>>>> possible return value range of strlen.
>>>>>> Why do you think this is the right thing to do?  ie, is there
>language
>>>>>> in the standards that makes you think the code as it stands today
>is
>>>>>> incorrect from a conformance standpoint?  Is there a significant
>>>>>> body of
>>>>>> code that is affected in an adverse way by the current code?  If
>so,
>>>>>> what code?
>>>>>>
>>>>>>
>>>>>
>>>>> I think if you have an object, of an effective type A say
>char[100],
>>>>> then
>>>>> you can cast the address of A to B, say typedef char (*B)[2] for
>>>>> instance
>>>>> and then to const char *, say for use in strlen.  I may be wrong,
>but
>>>>> I think
>>>>> that we should at least try not to pick up char[2] from B, but
>instead
>>>>> use A for strlen ranges, or leave this range open.  Currently the
>range
>>>>> info for strlen is [0..1] in this case, even if we see the type
>cast
>>>>> in the generic tree.
>>>> ISTM that you're essentially saying that the cast to const char *
>>>> destroys any type information we can exploit here.  But if that's
>the
>>>> case, then I don't think we can even derive a range of [0,99]. 
>What's
>>>> to say that "A" didn't result from a similar cast of some object
>that
>>>> was char[200] that happened out of the scope of what we could see
>during
>>>> the strlen range computation?
>>>>
>>>> If that is what you're arguing, then I think there's a
>re-evaluation
>>>> that needs to happen WRT strlen range computation/
>>>>
>>>> And just to be clear, I do see this as a significant correctness
>>>> question.
>>>>
>>>> Martin, thoughts?
>>>
>>> The argument is that given:
>>>
>>>   struct S { char a[4], b; };
>>>
>>>   char a[8] = "1234567";
>>>
>>> this is valid and should pass:
>>>
>>>   __attribute__ ((noipa))
>>>   void f (struct S *p)
>>>   {
>>>     assert (7 == strlen (p->a));
>>>   }
>>>
>>>   int main (void)
>>>   {
>>>     f ((struct S*)a);
>>>   }
>>>
>>> (This is the basic premise behind pr86259.)
>>>
>>> This argument is wrong and the code is invalid.  For the access
>>> to p->a to be valid p must point to an object of struct S (it
>>> doesn't) and the p->a array must hold a nul-terminated string
>>> (it also doesn't).
>> I agree with you for C/C++, but I think it's been shown elsewhere in
>> this thread that GIMPLE semantics to not respect the subobject
>> boundaries.  That's a sad reality.
>>
>> [ ... ]
>>
>>>
>>> I care less about the optimization than I do about the basic
>>> premise that it's essential to respect subobject boundaries(*).
>> I understand, but the semantics of GIMPLE do not respect them.  We
>can
>> argue about whether or not those should change and what it would take
>to
>> fix that. But right now the existing semantics do not respect those
>> boundaries.
>
>They don't respect them in all cases (i.e., when MEM_REF loses
>information about the structure of an access) but in a good
>number of them GCC can still derive useful information from
>the access.  It's relied on to great a effect by _FORTIFTY_SOURCE.
>I think it would be a welcome enhancement if besides out-of-
>bounds writes _FORTIFTY_SOURCE also prevented out-of-bounds
>reads.
>
>>> It would make little sense to undo the strlen optimization
>>> without also undoing the optimization for the direct array
>>> access case.  Undoing either would raise the question about
>>> the validity of the _FORRTIFY_SOURCE=2 behavior.  That would
>>> be a huge step backwards in terms of code security.  If we
>>> did some of these but not others, it would make the behavior
>>> inconsistent and surprising, all to accommodate one instance
>>> of invalid code.
>> In the direct array access case I think (and I'm sure Jakub, Richi
>and
>> others will correct me if I'm wrong), we can use the object's type
>> because the dereferences are actually using the array's type.
>
>Subscripting and pointer access are identical both in C/C++
>and in GCC's _FORTIFY_SOURCE.  The absence of a distinction
>between the two is essential for preventing writes past
>the end by string functions like strcpy (_FORTIFY_SOURCE).
>
>>> If we had a valid test case where the strlen optimization
>>> leads to invalid code, or even if there were a significant
>>> number of bug reports showing that it breaks an invalid
>>> but common idiom, I would certainly feel compelled to
>>> make it right somehow.  But there has been just one bug
>>> report with clearly invalid code that should be easily
>>> corrected.
>> Again, I think you're too narrowly focused on C/C++ semantics here.
>> What matters are the semantics in GIMPLE.
>
>I don't get that.  GCC is a C/C++ compiler (besides other
>languages), but not a GIMPLE compiler.   The only reason this
>came up at all is a bug report with an invalid C test case that
>reads past the end.  The only reason in my mind to consider
>relaxing an assumption/restriction would be a valid test case
>(in any supported language) that the optimization invalidates.
>
>But as I said, far more essential than the optimization is
>the ability to detect these invalid access (both reads and
>writes), such as in:

The essential thing is to not introduce latent wrong code issues because you exploit C language constraints that are not preserved by GIMPLE transforms because they are not constraints in the GIMPLE IL _WHICH_ _IS_ _NOT_ _C_. 

Richard. 

>
>   struct S { char a[4], b[2], c; };
>
>   void f (struct S *p)
>   {
>     strcpy (p->a, "1234");        // certain buffer overflow
>
>     sprintf (p->b, "%s", p->a);   // potential buffer overflow
>
>     // ...but, to avoid false positives:
>     sprintf (p->a, "%s", p->b);   // no buffer overflow here
>   }
>
>You've recently made comment elsewhere that you wish GCC would
>be more aggressive in detecting preventing undefined behavior
>by inserting traps.  I agree but I don't see how we can aim
>for both looser and stricter UB detection at the same time.
>
>In any event, it seems clear to me that I've lost the argument.
>If you undo the optimization then please retain the functionality
>for the warnings.  Otherwise you might as well remove those too.
>
>Martin

  reply	other threads:[~2018-08-06 17:44 UTC|newest]

Thread overview: 121+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-24  7:59 Bernd Edlinger
2018-07-24 14:50 ` Richard Biener
2018-07-25 13:03   ` Bernd Edlinger
2018-07-24 16:14 ` Martin Sebor
2018-07-24 21:46 ` Jeff Law
2018-07-24 23:18   ` Bernd Edlinger
2018-07-25  4:52     ` Jeff Law
2018-07-25  7:23     ` Richard Biener
2018-07-25 19:37       ` Martin Sebor
2018-07-26  8:55         ` Richard Biener
2018-08-07  2:24           ` Martin Sebor
2018-08-07  8:51             ` Richard Biener
2018-08-07 14:37               ` Martin Sebor
2018-08-07 17:44                 ` Richard Biener
2018-08-08  2:33                   ` Martin Sebor
2018-08-17 10:31                     ` Richard Biener
2018-08-17 15:49                       ` Martin Sebor
2018-08-19 15:55                         ` Bernd Edlinger
2018-08-20 10:24                           ` Richard Biener
2018-08-20 17:23                             ` Bernd Edlinger
2018-08-21  8:46                               ` Richard Biener
2018-08-21 22:25                                 ` Jeff Law
2018-08-22  4:05                                   ` Bernd Edlinger
2018-08-22 16:05                                     ` Martin Sebor
2018-08-22 17:22                                       ` Bernd Edlinger
2018-08-22 22:34                                         ` Jeff Law
2018-08-22 22:57                                           ` Bernd Edlinger
2018-08-22 22:57                                           ` Martin Sebor
2018-08-22 23:08                                             ` Bernd Edlinger
2018-08-21 22:43                           ` Jeff Law
2018-08-22  4:16                             ` Bernd Edlinger
2018-08-22 23:41                               ` Jeff Law
2018-08-26  9:58                                 ` Bernd Edlinger
2018-09-15  9:22                                   ` Bernd Edlinger
2018-10-10 23:12                                     ` Jeff Law
2018-10-12 15:03                                     ` Jeff Law
2018-10-13  9:07                                       ` Bernd Edlinger
2018-10-17 23:59                                         ` Jeff Law
2018-10-20 11:16                                           ` Bernd Edlinger
2018-11-16 17:26                                             ` Bernd Edlinger
2018-08-22 13:10                             ` Bernd Edlinger
2018-10-24  9:14                             ` Maxim Kuvyrkov
2018-10-24 13:38                               ` Bernd Edlinger
2018-10-24 14:26                                 ` Maxim Kuvyrkov
2018-08-03  7:29         ` Jeff Law
2018-08-03  7:19       ` Jeff Law
2018-08-03  7:48         ` Jakub Jelinek
2018-08-06 14:58           ` Jeff Law
2018-08-20 10:06         ` Richard Biener
2018-07-25 17:31     ` Martin Sebor
2018-07-27  6:49       ` Bernd Edlinger
2018-07-31  3:45         ` Martin Sebor
2018-07-31  6:38           ` Jakub Jelinek
2018-07-31 15:17             ` Martin Sebor
2018-07-31 15:48               ` Jakub Jelinek
2018-07-31 23:20                 ` Martin Sebor
2018-08-01  6:55                   ` Bernd Edlinger
2018-08-03  4:19                     ` Martin Sebor
2018-08-06 15:39                       ` Jeff Law
2018-08-01  7:19                   ` Richard Biener
2018-08-01  8:40                     ` Jakub Jelinek
2018-08-03  3:59                       ` Martin Sebor
2018-08-03  7:43                         ` Jakub Jelinek
2018-08-04 20:52                           ` Martin Sebor
2018-08-05  6:51                             ` Bernd Edlinger
2018-08-05 15:49                               ` Jeff Law
2018-08-06 17:15                                 ` Martin Sebor
2018-08-06 17:40                                   ` Jeff Law
2018-08-07  3:39                                     ` Martin Sebor
2018-08-07  5:45                                       ` Richard Biener
2018-08-07 15:02                                         ` Martin Sebor
2018-08-07 15:33                                           ` Bernd Edlinger
2018-08-07 16:31                                             ` Martin Sebor
2018-08-07 17:46                                               ` Richard Biener
2018-08-08 15:51                                                 ` Martin Sebor
2018-08-08 16:12                                                   ` Bernd Edlinger
2018-08-08 17:19                                                   ` Richard Biener
2018-08-07 15:32                                       ` Jeff Law
2018-08-06 22:39                                   ` Jeff Law
2018-08-05 17:00                             ` Jeff Law
2018-08-05 17:27                             ` Richard Biener
2018-08-06 15:36                               ` Martin Sebor
2018-08-02  3:13                     ` Martin Sebor
2018-08-02 10:22                       ` Bernd Edlinger
2018-08-02 15:42                         ` Martin Sebor
2018-08-02 17:00                           ` Martin Sebor
2018-08-02 18:15                             ` Bernd Edlinger
2018-08-03  3:06                               ` Martin Sebor
2018-08-02 18:20                             ` Jakub Jelinek
2018-08-03  3:24                               ` Martin Sebor
2018-08-09  5:36                           ` Jeff Law
2018-08-10 16:56                             ` Martin Sebor
2018-08-15  4:39                               ` Jeff Law
2018-08-20 10:12                                 ` Richard Biener
2018-08-20 10:23                                   ` Bernd Edlinger
2018-08-20 14:26                                     ` Jeff Law
2018-08-20 15:16                                       ` Bernd Edlinger
2018-08-20 20:42                                         ` Martin Sebor
2018-08-20 21:31                                           ` Bernd Edlinger
2018-08-21  2:43                                             ` Martin Sebor
2018-08-21  5:38                                               ` Bernd Edlinger
2018-08-21 21:58                                         ` Jeff Law
2018-08-03  7:47                       ` Jeff Law
2018-08-03  7:38                     ` Jeff Law
2018-08-06 15:34         ` Jeff Law
2018-08-03  7:00     ` Jeff Law
2018-08-04 21:56       ` Martin Sebor
2018-08-05  6:08         ` Bernd Edlinger
2018-08-05 15:58           ` Jeff Law
2018-08-06 11:57             ` Bernd Edlinger
2018-08-06 15:12         ` Jeff Law
2018-08-06 16:32           ` Martin Sebor
2018-08-06 17:44             ` Richard Biener [this message]
2018-08-06 23:59               ` Martin Sebor
2018-08-07 15:54                 ` Jeff Law
2018-08-06 22:48             ` Jeff Law
2018-08-09  5:26     ` Jeff Law
2018-08-09  6:27       ` Richard Biener
2018-08-17  5:09         ` Jeff Law
2018-07-25  7:08   ` Richard Biener
2018-08-02 16:45     ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=14925CCD-628E-4FED-A0C3-2407FF6D88D1@suse.de \
    --to=rguenther@suse.de \
    --cc=bernd.edlinger@hotmail.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=law@redhat.com \
    --cc=msebor@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).