public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Jeff Law <law@redhat.com>, Jakub Jelinek <jakub@redhat.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] fold more string comparison with known result (PR 90879)
Date: Wed, 14 Aug 2019 20:57:00 -0000	[thread overview]
Message-ID: <985cb0b2-00b5-ec46-8054-4fa6036b2488@gmail.com> (raw)
In-Reply-To: <3097bc55-3102-1545-1dda-5ebe87f480ce@redhat.com>

On 8/13/19 4:46 PM, Jeff Law wrote:
> On 8/13/19 3:43 PM, Martin Sebor wrote:
>> On 8/13/19 2:07 PM, Jeff Law wrote:
>>> On 8/9/19 10:51 AM, Martin Sebor wrote:
>>>>
>>>> PR tree-optimization/90879 - fold zero-equality of strcmp between a
>>>> longer string and a smaller array
>>>>
>>>> gcc/c-family/ChangeLog:
>>>>
>>>>      PR tree-optimization/90879
>>>>      * c.opt (-Wstring-compare): New option.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>      PR tree-optimization/90879
>>>>      * gcc.dg/Wstring-compare-2.c: New test.
>>>>      * gcc.dg/Wstring-compare.c: New test.
>>>>      * gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
>>>>      * gcc.dg/strcmpopt_6.c: New test.
>>>>      * gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
>>>>      test cases.
>>>>      * gcc.dg/strlenopt-66.c: Run it.
>>>>      * gcc.dg/strlenopt-68.c: New test.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>      PR tree-optimization/90879
>>>>      * builtins.c (check_access): Avoid using maxbound when null.
>>>>      * calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen
>>>> change.
>>>>      * doc/invoke.texi (-Wstring-compare): Document new warning option.
>>>>      * gimple-fold.c (get_range_strlen_tree): Make setting maxbound
>>>>      conditional.
>>>>      (get_range_strlen): Overwrite initial maxbound when non-null.
>>>>      * gimple-ssa-sprintf.c (get_string_length): Adjust to
>>>> get_range_strlen
>>>>      change.
>>>>      * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
>>>>      (used_only_for_zero_equality): New function.
>>>>      (handle_builtin_memcmp): Call it.
>>>>      (determine_min_objsize): Return an integer instead of tree.
>>>>      (get_len_or_size, strxcmp_eqz_result): New functions.
>>>>      (maybe_warn_pointless_strcmp): New function.
>>>>      (handle_builtin_string_cmp): Call it.  Fold zero-equality of strcmp
>>>>      between a longer string and a smaller array.
>>>>
>>>
>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>> index 4af47855e7c..31e012b741b 100644
>>>> --- a/gcc/tree-ssa-strlen.c
>>>> +++ b/gcc/tree-ssa-strlen.c
>>>
>>>> @@ -3079,196 +3042,388 @@ determine_min_objsize (tree dest)
>>>>        type = TYPE_MAIN_VARIANT (type);
>>>>    -  /* We cannot determine the size of the array if it's a flexible
>>>> array,
>>>> -     which is declared at the end of a structure.  */
>>>> -  if (TREE_CODE (type) == ARRAY_TYPE
>>>> -      && !array_at_struct_end_p (dest))
>>>> +  /* The size of a flexible array cannot be determined.  Otherwise,
>>>> +     for arrays with more than one element, return the size of its
>>>> +     type.  GCC itself misuses arrays of both zero and one elements
>>>> +     as flexible array members so they are excluded as well.  */
>>>> +  if (TREE_CODE (type) != ARRAY_TYPE
>>>> +      || !array_at_struct_end_p (dest))
>>>>        {
>>>> -      tree size_t = TYPE_SIZE_UNIT (type);
>>>> -      if (size_t && TREE_CODE (size_t) == INTEGER_CST
>>>> -      && !integer_zerop (size_t))
>>>> -        return size_t;
>>>> +      tree type_size = TYPE_SIZE_UNIT (type);
>>>> +      if (type_size && TREE_CODE (type_size) == INTEGER_CST
>>>> +      && !integer_onep (type_size)
>>>> +      && !integer_zerop (type_size))
>>>> +        return tree_to_uhwi (type_size);
>>> So I nearly commented on this when looking at the original patch.  Can
>>> we really depend on the size when we've got an array at the end of a
>>> struct with a declared size other than 0/1?   While 0/1 are by far the
>>> most common way to declare them, couldn't someone have used other sizes?
>>>    I think we pondered doing that at one time to cut down on the noise
>>> from Coverity for RTL and TREE operand accessors.
>>>
>>> Your code makes us safer, so I'm not saying you've done anything wrong,
>>> just trying to decide if we need to tighten this up even further.
>>
>> This patch issues a warning in these cases, i.e., when it sees
>> a call like, say, strcmp("foobar", A) with an A that's smaller
>> than the string, because it seems they are likely (rare) bugs.
>> I haven't seen the warning in any of the projects I tested it
>> with (Binutils/GDB, GCC, Glibc, the Linux kernel, and LLVM).
>>
>> The warning uses strcmp to detect these mistakes (or misuses)
>> but I'd like to add similar warnings for other string functions
>> as well and have code out there that does this on purpose use
>> true flexible array members (or the zero-length extension)
>> instead.  That makes the intent clear.
>>
>> It's a judgment call whether to also fold (or do something else
>> like insert a trap) in addition to issuing a warning.  In this
>> case (reading) I don't think it matters as much as it does for
>> writes.  Either way, it would be nice to set a policy and
>> document it in the manual so users know what to expect and
>> so we don't have to revisit this question for each patch that
>> touches on this subject.
> The GCC manual documents zero length arrays at the end of an aggregate
> as a GNU extension for variable length objects.  The manual also
> documents that it could be done with single element arrays, but that
> doing so does contribute to the base size of the aggregate, but
> otherwise it's handled like a zero length array.
> 
> So both zero and one element arrays are documented as supported for this
> use case.  However, I could easily see someone making the case that any
> size should work here and I could easily think of cases where that would
> be a reasonable thing to do.  We do not handle these cases in a
> consistent way -- we'll treat sizes other than 0/1 as being a variable
> length object in some cases, but not in others.
> 
> I'm tempted to bring consistency here.  We're likely not losing
> significant diagnostic opportunities or optimizations if we treat all
> trailing arrays as creating potentially variable sized objects.

It's not terribly important in this case but it is very much so
in general.

I would tend to agree that treating all trailing arrays as flexible
array members might not have a dramatic an impact om efficiency.
But when it comes to detecting and diagnosing buffer overflow, it
most certainly does.  For example:

   struct S
   {
     void (*pf)(void);
     char a[8];
   };

   void f (struct S *p)
   {
     strcpy (p->a, "0123456789");
   }

GCC doesn't diagnose the overflow even with _FORTIFY_SOURCE=2,
despite the high likelihood of memory corruption.  How common
are structs with trailing arrays?

In a GCC build on x86_64, there are 726 distinct structs with
an array as the last member.  Of those, 268 have more than one
element.  In Binutils/GDB, it's 638 and 537, respectively.  In
the kernel it's 8283 and 7584 (and 3795 have 10 or more).

Some of these serve the purpose of flexible array members but
it seems highly unlikely it's more than a small subset.  By
assuming the opposite we are leaving all the code that accesses
those arrays with no protection against buffer overflow.

I believe object and subobject boundaries must be respected.
There is room for exemptions but those need to be made explicit
in the code.  In this case, the mechanism is the flexible array
member syntax (or the zero-length array extension(*)).  The GCC
manual explicitly discourages the one-element array case and
so I'd say it would be appropriate to warn about it if we can
determine it's being misused (otherwise, why discourage it if
we don't really mean it?)  The manual doesn't mention support
for larger arrays and writing past the end of those is almost
certainly a bug.  Those should all be diagnosed.  Clang and ICC
already do diagnose these(**), and both, especially Clang, are
being used to compile increasing proportion of the Linux
ecosystem, so there is less and less reason for GCC to be
as permissive as it has been and not issue comparable
diagnostics.

Martin

[*] To minimize the transition effort we could introduce a new
"flexarray" attribute to annotate larger trailing arrays with
to indicate they are intended to be used as flexible array members.
But I don't have the impression the patter is wide- spread enough
to justify adding yet another extension.

[**] Both Clang and ICC have issued a warning for the out-of-
bounds access below for many releases:

struct S
{
   void (*pf)(void);
   char a[8];
};

void f (struct S *p)
{
   p->a[8] = 0;
}

  reply	other threads:[~2019-08-14 20:00 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-09 16:42 Martin Sebor
2019-08-09 16:51 ` Jakub Jelinek
2019-08-09 17:07   ` Martin Sebor
2019-08-09 17:07     ` Jakub Jelinek
2019-08-09 22:45       ` Martin Sebor
2019-08-12 13:56         ` Michael Matz
2019-08-14 16:30           ` Martin Sebor
2019-08-12 20:15         ` Jeff Law
2019-08-12 22:32           ` Martin Sebor
2019-08-13  2:22             ` Jeff Law
2019-08-13 20:08     ` Jeff Law
2019-08-13 23:26       ` Martin Sebor
2019-08-14  0:39         ` Jeff Law
2019-08-14 20:57           ` Martin Sebor [this message]
2019-08-21  7:40             ` Martin Sebor
2019-08-22 22:23               ` Jeff Law
2019-08-28 21:36                 ` Martin Sebor
2019-09-03 20:01                   ` Jeff Law
2019-09-23 22:14                     ` Martin Sebor
2019-10-04 21:15                       ` Jeff Law
2019-08-12 22:22 ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=985cb0b2-00b5-ec46-8054-4fa6036b2488@gmail.com \
    --to=msebor@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=law@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).