From: Martin Sebor <msebor@gmail.com>
To: Jeff Law <law@redhat.com>, Jakub Jelinek <jakub@redhat.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] fold more string comparison with known result (PR 90879)
Date: Wed, 21 Aug 2019 07:40:00 -0000 [thread overview]
Message-ID: <7012f14f-174a-0cae-ef35-93dd32417f07@gmail.com> (raw)
In-Reply-To: <985cb0b2-00b5-ec46-8054-4fa6036b2488@gmail.com>
Jeff,
Please let me know if you agree/disagree and what I need to
do to advance this work:
https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html
Thanks
Martin
On 8/14/19 1:59 PM, Martin Sebor wrote:
> On 8/13/19 4:46 PM, Jeff Law wrote:
>> On 8/13/19 3:43 PM, Martin Sebor wrote:
>>> On 8/13/19 2:07 PM, Jeff Law wrote:
>>>> On 8/9/19 10:51 AM, Martin Sebor wrote:
>>>>>
>>>>> PR tree-optimization/90879 - fold zero-equality of strcmp between a
>>>>> longer string and a smaller array
>>>>>
>>>>> gcc/c-family/ChangeLog:
>>>>>
>>>>> Â Â Â Â Â PR tree-optimization/90879
>>>>> Â Â Â Â Â * c.opt (-Wstring-compare): New option.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> Â Â Â Â Â PR tree-optimization/90879
>>>>> Â Â Â Â Â * gcc.dg/Wstring-compare-2.c: New test.
>>>>> Â Â Â Â Â * gcc.dg/Wstring-compare.c: New test.
>>>>> Â Â Â Â Â * gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
>>>>> Â Â Â Â Â * gcc.dg/strcmpopt_6.c: New test.
>>>>> Â Â Â Â Â * gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
>>>>> Â Â Â Â Â test cases.
>>>>> Â Â Â Â Â * gcc.dg/strlenopt-66.c: Run it.
>>>>> Â Â Â Â Â * gcc.dg/strlenopt-68.c: New test.
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> Â Â Â Â Â PR tree-optimization/90879
>>>>> Â Â Â Â Â * builtins.c (check_access): Avoid using maxbound when null.
>>>>> Â Â Â Â Â * calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen
>>>>> change.
>>>>> Â Â Â Â Â * doc/invoke.texi (-Wstring-compare): Document new warning
>>>>> option.
>>>>> Â Â Â Â Â * gimple-fold.c (get_range_strlen_tree): Make setting maxbound
>>>>> Â Â Â Â Â conditional.
>>>>> Â Â Â Â Â (get_range_strlen): Overwrite initial maxbound when non-null.
>>>>> Â Â Â Â Â * gimple-ssa-sprintf.c (get_string_length): Adjust to
>>>>> get_range_strlen
>>>>> Â Â Â Â Â change.
>>>>> Â Â Â Â Â * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
>>>>> Â Â Â Â Â (used_only_for_zero_equality): New function.
>>>>> Â Â Â Â Â (handle_builtin_memcmp): Call it.
>>>>> Â Â Â Â Â (determine_min_objsize): Return an integer instead of tree.
>>>>> Â Â Â Â Â (get_len_or_size, strxcmp_eqz_result): New functions.
>>>>> Â Â Â Â Â (maybe_warn_pointless_strcmp): New function.
>>>>>      (handle_builtin_string_cmp): Call it. Fold zero-equality of
>>>>> strcmp
>>>>> Â Â Â Â Â between a longer string and a smaller array.
>>>>>
>>>>
>>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>>> index 4af47855e7c..31e012b741b 100644
>>>>> --- a/gcc/tree-ssa-strlen.c
>>>>> +++ b/gcc/tree-ssa-strlen.c
>>>>
>>>>> @@ -3079,196 +3042,388 @@ determine_min_objsize (tree dest)
>>>>> Â Â Â Â Â type = TYPE_MAIN_VARIANT (type);
>>>>> Â Â -Â /* We cannot determine the size of the array if it's a flexible
>>>>> array,
>>>>> -    which is declared at the end of a structure. */
>>>>> -Â if (TREE_CODE (type) == ARRAY_TYPE
>>>>> -Â Â Â Â Â && !array_at_struct_end_p (dest))
>>>>> + /* The size of a flexible array cannot be determined. Otherwise,
>>>>> +Â Â Â Â for arrays with more than one element, return the size of its
>>>>> +    type. GCC itself misuses arrays of both zero and one elements
>>>>> +    as flexible array members so they are excluded as well. */
>>>>> +Â if (TREE_CODE (type) != ARRAY_TYPE
>>>>> +Â Â Â Â Â || !array_at_struct_end_p (dest))
>>>>> Â Â Â Â Â Â {
>>>>> -Â Â Â Â Â tree size_t = TYPE_SIZE_UNIT (type);
>>>>> -Â Â Â Â Â if (size_t && TREE_CODE (size_t) == INTEGER_CST
>>>>> -Â Â Â Â Â && !integer_zerop (size_t))
>>>>> -Â Â Â Â Â Â Â return size_t;
>>>>> +Â Â Â Â Â tree type_size = TYPE_SIZE_UNIT (type);
>>>>> +Â Â Â Â Â if (type_size && TREE_CODE (type_size) == INTEGER_CST
>>>>> +Â Â Â Â Â && !integer_onep (type_size)
>>>>> +Â Â Â Â Â && !integer_zerop (type_size))
>>>>> +Â Â Â Â Â Â Â return tree_to_uhwi (type_size);
>>>> So I nearly commented on this when looking at the original patch. Can
>>>> we really depend on the size when we've got an array at the end of a
>>>> struct with a declared size other than 0/1?  While 0/1 are by far the
>>>> most common way to declare them, couldn't someone have used other
>>>> sizes?
>>>> Â Â I think we pondered doing that at one time to cut down on the noise
>>>> from Coverity for RTL and TREE operand accessors.
>>>>
>>>> Your code makes us safer, so I'm not saying you've done anything wrong,
>>>> just trying to decide if we need to tighten this up even further.
>>>
>>> This patch issues a warning in these cases, i.e., when it sees
>>> a call like, say, strcmp("foobar", A) with an A that's smaller
>>> than the string, because it seems they are likely (rare) bugs.
>>> I haven't seen the warning in any of the projects I tested it
>>> with (Binutils/GDB, GCC, Glibc, the Linux kernel, and LLVM).
>>>
>>> The warning uses strcmp to detect these mistakes (or misuses)
>>> but I'd like to add similar warnings for other string functions
>>> as well and have code out there that does this on purpose use
>>> true flexible array members (or the zero-length extension)
>>> instead. That makes the intent clear.
>>>
>>> It's a judgment call whether to also fold (or do something else
>>> like insert a trap) in addition to issuing a warning. In this
>>> case (reading) I don't think it matters as much as it does for
>>> writes. Either way, it would be nice to set a policy and
>>> document it in the manual so users know what to expect and
>>> so we don't have to revisit this question for each patch that
>>> touches on this subject.
>> The GCC manual documents zero length arrays at the end of an aggregate
>> as a GNU extension for variable length objects. The manual also
>> documents that it could be done with single element arrays, but that
>> doing so does contribute to the base size of the aggregate, but
>> otherwise it's handled like a zero length array.
>>
>> So both zero and one element arrays are documented as supported for this
>> use case. However, I could easily see someone making the case that any
>> size should work here and I could easily think of cases where that would
>> be a reasonable thing to do. We do not handle these cases in a
>> consistent way -- we'll treat sizes other than 0/1 as being a variable
>> length object in some cases, but not in others.
>>
>> I'm tempted to bring consistency here. We're likely not losing
>> significant diagnostic opportunities or optimizations if we treat all
>> trailing arrays as creating potentially variable sized objects.
>
> It's not terribly important in this case but it is very much so
> in general.
>
> I would tend to agree that treating all trailing arrays as flexible
> array members might not have a dramatic an impact om efficiency.
> But when it comes to detecting and diagnosing buffer overflow, it
> most certainly does. For example:
>
> Â struct S
> Â {
> Â Â Â void (*pf)(void);
> Â Â Â char a[8];
> Â };
>
> Â void f (struct S *p)
> Â {
> Â Â Â strcpy (p->a, "0123456789");
> Â }
>
> GCC doesn't diagnose the overflow even with _FORTIFY_SOURCE=2,
> despite the high likelihood of memory corruption. How common
> are structs with trailing arrays?
>
> In a GCC build on x86_64, there are 726 distinct structs with
> an array as the last member. Of those, 268 have more than one
> element. In Binutils/GDB, it's 638 and 537, respectively. In
> the kernel it's 8283 and 7584 (and 3795 have 10 or more).
>
> Some of these serve the purpose of flexible array members but
> it seems highly unlikely it's more than a small subset. By
> assuming the opposite we are leaving all the code that accesses
> those arrays with no protection against buffer overflow.
>
> I believe object and subobject boundaries must be respected.
> There is room for exemptions but those need to be made explicit
> in the code. In this case, the mechanism is the flexible array
> member syntax (or the zero-length array extension(*)). The GCC
> manual explicitly discourages the one-element array case and
> so I'd say it would be appropriate to warn about it if we can
> determine it's being misused (otherwise, why discourage it if
> we don't really mean it?)Â The manual doesn't mention support
> for larger arrays and writing past the end of those is almost
> certainly a bug. Those should all be diagnosed. Clang and ICC
> already do diagnose these(**), and both, especially Clang, are
> being used to compile increasing proportion of the Linux
> ecosystem, so there is less and less reason for GCC to be
> as permissive as it has been and not issue comparable
> diagnostics.
>
> Martin
>
> [*] To minimize the transition effort we could introduce a new
> "flexarray" attribute to annotate larger trailing arrays with
> to indicate they are intended to be used as flexible array members.
> But I don't have the impression the patter is wide- spread enough
> to justify adding yet another extension.
>
> [**] Both Clang and ICC have issued a warning for the out-of-
> bounds access below for many releases:
>
> struct S
> {
> Â void (*pf)(void);
> Â char a[8];
> };
>
> void f (struct S *p)
> {
> Â p->a[8] = 0;
> }
next prev parent reply other threads:[~2019-08-21 2:11 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-09 16:42 Martin Sebor
2019-08-09 16:51 ` Jakub Jelinek
2019-08-09 17:07 ` Martin Sebor
2019-08-09 17:07 ` Jakub Jelinek
2019-08-09 22:45 ` Martin Sebor
2019-08-12 13:56 ` Michael Matz
2019-08-14 16:30 ` Martin Sebor
2019-08-12 20:15 ` Jeff Law
2019-08-12 22:32 ` Martin Sebor
2019-08-13 2:22 ` Jeff Law
2019-08-13 20:08 ` Jeff Law
2019-08-13 23:26 ` Martin Sebor
2019-08-14 0:39 ` Jeff Law
2019-08-14 20:57 ` Martin Sebor
2019-08-21 7:40 ` Martin Sebor [this message]
2019-08-22 22:23 ` Jeff Law
2019-08-28 21:36 ` Martin Sebor
2019-09-03 20:01 ` Jeff Law
2019-09-23 22:14 ` Martin Sebor
2019-10-04 21:15 ` Jeff Law
2019-08-12 22:22 ` Jeff Law
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7012f14f-174a-0cae-ef35-93dd32417f07@gmail.com \
--to=msebor@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jakub@redhat.com \
--cc=law@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).