public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Jeff Law <law@redhat.com>, Jakub Jelinek <jakub@redhat.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] fold more string comparison with known result (PR 90879)
Date: Wed, 21 Aug 2019 07:40:00 -0000	[thread overview]
Message-ID: <7012f14f-174a-0cae-ef35-93dd32417f07@gmail.com> (raw)
In-Reply-To: <985cb0b2-00b5-ec46-8054-4fa6036b2488@gmail.com>

Jeff,

Please let me know if you agree/disagree and what I need to
do to advance this work:

   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html

Thanks
Martin

On 8/14/19 1:59 PM, Martin Sebor wrote:
> On 8/13/19 4:46 PM, Jeff Law wrote:
>> On 8/13/19 3:43 PM, Martin Sebor wrote:
>>> On 8/13/19 2:07 PM, Jeff Law wrote:
>>>> On 8/9/19 10:51 AM, Martin Sebor wrote:
>>>>>
>>>>> PR tree-optimization/90879 - fold zero-equality of strcmp between a
>>>>> longer string and a smaller array
>>>>>
>>>>> gcc/c-family/ChangeLog:
>>>>>
>>>>>      PR tree-optimization/90879
>>>>>      * c.opt (-Wstring-compare): New option.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>      PR tree-optimization/90879
>>>>>      * gcc.dg/Wstring-compare-2.c: New test.
>>>>>      * gcc.dg/Wstring-compare.c: New test.
>>>>>      * gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
>>>>>      * gcc.dg/strcmpopt_6.c: New test.
>>>>>      * gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
>>>>>      test cases.
>>>>>      * gcc.dg/strlenopt-66.c: Run it.
>>>>>      * gcc.dg/strlenopt-68.c: New test.
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>      PR tree-optimization/90879
>>>>>      * builtins.c (check_access): Avoid using maxbound when null.
>>>>>      * calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen
>>>>> change.
>>>>>      * doc/invoke.texi (-Wstring-compare): Document new warning 
>>>>> option.
>>>>>      * gimple-fold.c (get_range_strlen_tree): Make setting maxbound
>>>>>      conditional.
>>>>>      (get_range_strlen): Overwrite initial maxbound when non-null.
>>>>>      * gimple-ssa-sprintf.c (get_string_length): Adjust to
>>>>> get_range_strlen
>>>>>      change.
>>>>>      * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
>>>>>      (used_only_for_zero_equality): New function.
>>>>>      (handle_builtin_memcmp): Call it.
>>>>>      (determine_min_objsize): Return an integer instead of tree.
>>>>>      (get_len_or_size, strxcmp_eqz_result): New functions.
>>>>>      (maybe_warn_pointless_strcmp): New function.
>>>>>      (handle_builtin_string_cmp): Call it.  Fold zero-equality of 
>>>>> strcmp
>>>>>      between a longer string and a smaller array.
>>>>>
>>>>
>>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>>> index 4af47855e7c..31e012b741b 100644
>>>>> --- a/gcc/tree-ssa-strlen.c
>>>>> +++ b/gcc/tree-ssa-strlen.c
>>>>
>>>>> @@ -3079,196 +3042,388 @@ determine_min_objsize (tree dest)
>>>>>        type = TYPE_MAIN_VARIANT (type);
>>>>>    -  /* We cannot determine the size of the array if it's a flexible
>>>>> array,
>>>>> -     which is declared at the end of a structure.  */
>>>>> -  if (TREE_CODE (type) == ARRAY_TYPE
>>>>> -      && !array_at_struct_end_p (dest))
>>>>> +  /* The size of a flexible array cannot be determined.  Otherwise,
>>>>> +     for arrays with more than one element, return the size of its
>>>>> +     type.  GCC itself misuses arrays of both zero and one elements
>>>>> +     as flexible array members so they are excluded as well.  */
>>>>> +  if (TREE_CODE (type) != ARRAY_TYPE
>>>>> +      || !array_at_struct_end_p (dest))
>>>>>        {
>>>>> -      tree size_t = TYPE_SIZE_UNIT (type);
>>>>> -      if (size_t && TREE_CODE (size_t) == INTEGER_CST
>>>>> -      && !integer_zerop (size_t))
>>>>> -        return size_t;
>>>>> +      tree type_size = TYPE_SIZE_UNIT (type);
>>>>> +      if (type_size && TREE_CODE (type_size) == INTEGER_CST
>>>>> +      && !integer_onep (type_size)
>>>>> +      && !integer_zerop (type_size))
>>>>> +        return tree_to_uhwi (type_size);
>>>> So I nearly commented on this when looking at the original patch.  Can
>>>> we really depend on the size when we've got an array at the end of a
>>>> struct with a declared size other than 0/1?   While 0/1 are by far the
>>>> most common way to declare them, couldn't someone have used other 
>>>> sizes?
>>>>    I think we pondered doing that at one time to cut down on the noise
>>>> from Coverity for RTL and TREE operand accessors.
>>>>
>>>> Your code makes us safer, so I'm not saying you've done anything wrong,
>>>> just trying to decide if we need to tighten this up even further.
>>>
>>> This patch issues a warning in these cases, i.e., when it sees
>>> a call like, say, strcmp("foobar", A) with an A that's smaller
>>> than the string, because it seems they are likely (rare) bugs.
>>> I haven't seen the warning in any of the projects I tested it
>>> with (Binutils/GDB, GCC, Glibc, the Linux kernel, and LLVM).
>>>
>>> The warning uses strcmp to detect these mistakes (or misuses)
>>> but I'd like to add similar warnings for other string functions
>>> as well and have code out there that does this on purpose use
>>> true flexible array members (or the zero-length extension)
>>> instead.  That makes the intent clear.
>>>
>>> It's a judgment call whether to also fold (or do something else
>>> like insert a trap) in addition to issuing a warning.  In this
>>> case (reading) I don't think it matters as much as it does for
>>> writes.  Either way, it would be nice to set a policy and
>>> document it in the manual so users know what to expect and
>>> so we don't have to revisit this question for each patch that
>>> touches on this subject.
>> The GCC manual documents zero length arrays at the end of an aggregate
>> as a GNU extension for variable length objects.  The manual also
>> documents that it could be done with single element arrays, but that
>> doing so does contribute to the base size of the aggregate, but
>> otherwise it's handled like a zero length array.
>>
>> So both zero and one element arrays are documented as supported for this
>> use case.  However, I could easily see someone making the case that any
>> size should work here and I could easily think of cases where that would
>> be a reasonable thing to do.  We do not handle these cases in a
>> consistent way -- we'll treat sizes other than 0/1 as being a variable
>> length object in some cases, but not in others.
>>
>> I'm tempted to bring consistency here.  We're likely not losing
>> significant diagnostic opportunities or optimizations if we treat all
>> trailing arrays as creating potentially variable sized objects.
> 
> It's not terribly important in this case but it is very much so
> in general.
> 
> I would tend to agree that treating all trailing arrays as flexible
> array members might not have a dramatic an impact om efficiency.
> But when it comes to detecting and diagnosing buffer overflow, it
> most certainly does.  For example:
> 
>    struct S
>    {
>      void (*pf)(void);
>      char a[8];
>    };
> 
>    void f (struct S *p)
>    {
>      strcpy (p->a, "0123456789");
>    }
> 
> GCC doesn't diagnose the overflow even with _FORTIFY_SOURCE=2,
> despite the high likelihood of memory corruption.  How common
> are structs with trailing arrays?
> 
> In a GCC build on x86_64, there are 726 distinct structs with
> an array as the last member.  Of those, 268 have more than one
> element.  In Binutils/GDB, it's 638 and 537, respectively.  In
> the kernel it's 8283 and 7584 (and 3795 have 10 or more).
> 
> Some of these serve the purpose of flexible array members but
> it seems highly unlikely it's more than a small subset.  By
> assuming the opposite we are leaving all the code that accesses
> those arrays with no protection against buffer overflow.
> 
> I believe object and subobject boundaries must be respected.
> There is room for exemptions but those need to be made explicit
> in the code.  In this case, the mechanism is the flexible array
> member syntax (or the zero-length array extension(*)).  The GCC
> manual explicitly discourages the one-element array case and
> so I'd say it would be appropriate to warn about it if we can
> determine it's being misused (otherwise, why discourage it if
> we don't really mean it?)  The manual doesn't mention support
> for larger arrays and writing past the end of those is almost
> certainly a bug.  Those should all be diagnosed.  Clang and ICC
> already do diagnose these(**), and both, especially Clang, are
> being used to compile increasing proportion of the Linux
> ecosystem, so there is less and less reason for GCC to be
> as permissive as it has been and not issue comparable
> diagnostics.
> 
> Martin
> 
> [*] To minimize the transition effort we could introduce a new
> "flexarray" attribute to annotate larger trailing arrays with
> to indicate they are intended to be used as flexible array members.
> But I don't have the impression the patter is wide- spread enough
> to justify adding yet another extension.
> 
> [**] Both Clang and ICC have issued a warning for the out-of-
> bounds access below for many releases:
> 
> struct S
> {
>    void (*pf)(void);
>    char a[8];
> };
> 
> void f (struct S *p)
> {
>    p->a[8] = 0;
> }

  reply	other threads:[~2019-08-21  2:11 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-09 16:42 Martin Sebor
2019-08-09 16:51 ` Jakub Jelinek
2019-08-09 17:07   ` Martin Sebor
2019-08-09 17:07     ` Jakub Jelinek
2019-08-09 22:45       ` Martin Sebor
2019-08-12 13:56         ` Michael Matz
2019-08-14 16:30           ` Martin Sebor
2019-08-12 20:15         ` Jeff Law
2019-08-12 22:32           ` Martin Sebor
2019-08-13  2:22             ` Jeff Law
2019-08-13 20:08     ` Jeff Law
2019-08-13 23:26       ` Martin Sebor
2019-08-14  0:39         ` Jeff Law
2019-08-14 20:57           ` Martin Sebor
2019-08-21  7:40             ` Martin Sebor [this message]
2019-08-22 22:23               ` Jeff Law
2019-08-28 21:36                 ` Martin Sebor
2019-09-03 20:01                   ` Jeff Law
2019-09-23 22:14                     ` Martin Sebor
2019-10-04 21:15                       ` Jeff Law
2019-08-12 22:22 ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7012f14f-174a-0cae-ef35-93dd32417f07@gmail.com \
    --to=msebor@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=law@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).