public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Bernd Edlinger <bernd.edlinger@hotmail.de>
To: Jeff Law <law@redhat.com>,
	"gcc-patches@gcc.gnu.org"	<gcc-patches@gcc.gnu.org>,
	Richard Biener <rguenther@suse.de>,
	Martin Sebor	<msebor@gmail.com>
Subject: Re: [PATCH] Cleanup strcpy/stpcpy no nul warning code
Date: Tue, 18 Sep 2018 12:06:00 -0000	[thread overview]
Message-ID: <VI1PR0701MB286291A978C4E3291AA389C6E41D0@VI1PR0701MB2862.eurprd07.prod.outlook.com> (raw)
In-Reply-To: <469fc374-4dc5-e279-c08c-39ec2240b059@redhat.com>

On 09/18/18 07:31, Jeff Law wrote:
> On 9/17/18 1:18 PM, Bernd Edlinger wrote:
>> On 09/17/18 20:32, Jeff Law wrote:
>>> On 9/17/18 12:20 PM, Bernd Edlinger wrote:
>>>> On 09/17/18 19:33, Jeff Law wrote:
>>>>> On 9/16/18 1:58 PM, Bernd Edlinger wrote:
>>>>>> Hi,
>>>>>>
>>>>>> this is a cleanup of the recently added strlen/strcpy/stpcpy
>>>>>> no nul warning code.
>>>>>>
>>>>>> Most importantly it moves the SSA_NAME handling from
>>>>>> unterminated_array to string_constant, thereby fixing
>>>>>> another round of xfails in the strlen and stpcpy test cases.
>>>>>>
>>>>>> I need to say that the fix for bug 86622 is relying in
>>>>>> type info on the pointer which is probably not safe in
>>>>>> GIMPLE in the light of the recent discussion.
>>>>>>
>>>>>> I had to add two further exceptions, which should
>>>>>> be safe in general: that is a pointer arithmentic on a string
>>>>>> literal is okay, and arithmetic on a string constant
>>>>>> that is exactly the size of the whole DECL, cannot
>>>>>> access an adjacent member.
>>>>>>
>>>>>>
>>>>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>>>>> Is it OK for trunk?
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>>>
>>>>>>
>>>>>> patch-cleanup-no-nul.diff
>>>>>>
>>>>>> gcc:
>>>>>> 2018-09-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>>>
>>>>>> 	* builtins.h (unterminated_array): Remove prototype.
>>>>>>            * builtins.c (unterminated_array): Simplify.  Make static.
>>>>>>            (expand_builtin_strcpy_args): Don't use TREE_NO_WARNING here.
>>>>>> 	(expand_builtin_stpcpy_1): Remove warn_string_no_nul without effect.
>>>>>> 	* expr.c (string_constant): Handle SSA_NAME.  Add more exceptions
>>>>>> 	where pointer arithmetic is safe.
>>>>>> 	* gimple-fold.c (get_range_strlen): Handle nonstr like in c_strlen.
>>>>>> 	(get_max_strlen): Remove the unnecessary mynonstr handling.
>>>>>> 	(gimple_fold_builtin_strcpy): Simplify.
>>>>>> 	(gimple_fold_builtin_stpcpy): Simplify.
>>>>>> 	(gimple_fold_builtin_sprintf): Remove NO_WARNING propagation
>>>>>> 	without effect.
>>>>>> 	(gimple_fold_builtin_strlen): Simplify.
>>>>>>
>>>>>> testsuite:
>>>>>> 2018-09-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>>>
>>>>>> 	* gcc.dg/warn-stpcpy-no-nul.c: Remove xfails.
>>>>>> 	* gcc.dg/warn-strlen-no-nul.c: Remove xfails.
>>>>>>
>>>>>> Index: gcc/builtins.c
>>>>>> ===================================================================
>>>>>> --- gcc/builtins.c	(revision 264342)
>>>>>> +++ gcc/builtins.c	(working copy)
>>>>>> @@ -567,31 +567,12 @@ warn_string_no_nul (location_t loc, const char *fn
>>>>>>        the declaration of the object of which the array is a member or
>>>>>>        element.  Otherwise return null.  */
>>>>>>     
>>>>>> -tree
>>>>>> +static tree
>>>>>>     unterminated_array (tree exp)
>>>>>>     {
>>>>>> -  if (TREE_CODE (exp) == SSA_NAME)
>>>>>> -    {
>>>>>> -      gimple *stmt = SSA_NAME_DEF_STMT (exp);
>>>>>> -      if (!is_gimple_assign (stmt))
>>>>>> -	return NULL_TREE;
>>>>>> -
>>>>>> -      tree rhs1 = gimple_assign_rhs1 (stmt);
>>>>>> -      tree_code code = gimple_assign_rhs_code (stmt);
>>>>>> -      if (code == ADDR_EXPR
>>>>>> -	  && TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF)
>>>>>> -	rhs1 = rhs1;
>>>>>> -      else if (code != POINTER_PLUS_EXPR)
>>>>>> -	return NULL_TREE;
>>>>>> -
>>>>>> -      exp = rhs1;
>>>>>> -    }
>>>>>> -
>>>>>>       tree nonstr = NULL;
>>>>>> -  if (c_strlen (exp, 1, &nonstr, 1) == NULL && nonstr)
>>>>>> -    return nonstr;
>>>>>> -
>>>>>> -  return NULL_TREE;
>>>>>> +  c_strlen (exp, 1, &nonstr);
>>>>>> +  return nonstr;
>>>>>>     }
>>>>> Sigh.  This is going to conflict with patch #6 from Martin's kit.
>>>>>
>>>>
>>>> Sorry, it just feels wrong to do this folding here instead of in
>>>> string_constant.
>>>>
>>>> I think the handling of POINTER_PLUS_EXPR above, is faulty,
>>>> because it is ignoring the offset parameter, which typically
>>>> contains the offset part, may add an offset to a different
>>>> structure member or another array index.  That is what happened
>>>> in PR 86622.
>>>>
>>>> So you are likely looking at the wrong index, or even the wrong
>>>> structure member.
>>> I'm not disagreeing that something's wrong here -- the whole concept
>>> that we extract the rhs and totally ignore the offset seems wrong.  I've
>>> stumbled over it working through issues with either patch #4 or #6 from
>>> Martin.  But I felt I needed to go back and reevaluate any assumptions I
>>> had about how the code was supposed to be used before I called it out.
>>>
>>>
>>> Your expr.c changes may be worth pushing through independently of the
>>> rest.    AFAICT they're really just exposing more cases where we can
>>> determine that we're working with a stirng constant.
>>>
>>
>> I think that moves just the folding from here to expr.c,
>> I don't see how the change in part #6 on unterminated_array
>> is supposed to work, I quote it here and add some comments:
>>
> Essentially there's a couple of concepts he wants to get in
> unterminated_array.
> 
> First, given an address, if there's a variable part we want to clear
> exact so we can adjust the text of the message in the caller.  ie
> "exceeds the size" vs "may exceed the size".
> 
> Second, he really wants LEN to be set to the length of the string for
> whatever the constant part of the address might be, even if it's
> potentially not properly terminated.
> 
> So given:
> 
> const char b[][5] = { /* { dg-message "declared here" } */
>    "12", "345", "6789", "abcde"
> };
> 
> A reference like:
> 
> T (&b[3][1] + v0, bsz);
> 
> We want the length of the string at &b[3][1] into *LEN.  That's used to
> provide the potential length of the string in the message.
> 
> I don't think we can get that from c_strlen right now.  Right now we've
> defined c_strlen as (reasonably) returning NULL for the length when
> we've got an string without proper termination.
> 
> I'm hesitant to provide a boolean argument that essentially says give me
> whatever length you've computed, even if the string isn't properly
> terminated.   But conceptually that's one of the things we'd need.  It's
> also needed for patch #4.
> 
> Jeff
> 
> ps.  Yes there's still some issues in the code.  I'm mostly trying to
> highlight how it's being used and what requirements we'd have if we were
> going to do something like you've suggested in your patch to
> unterminated_array.
> 

Hmm, you know, I wrote a while ago the following:
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00411.html

Where I suggested to change c_strlen parameter nonstr to a
structure, where additional information could go.

There are a lot more informations that could be relevant
for warnings.

So it would be easy to convey the unterminated string length
there, also I could imagine to diagnose unaligned wide string
constants, if we have a decl with a too low alignment factor,
or a known unaligned offset for instance.  Also the location
of the expression on which c_strlen found the unterminated
string, could be of interest, especially when get_range_strlen
follows phi nodes.

Well, I have the impression that the 6/6 patch should be split
up in a part that uses c_strlen in its current form, and a
follow-up patch that adds any additional info.
I prefer doing things step-by-step, you know.

Note that the example given is probably unsafe:
> T (&b[3][1] + v0, bsz);

This does not pass the check in string_constant because it
looks like _2 = (char*)&b + _1; _3 = strlen (_2);
I have no way to prove that v0 is not larger than sizeof (b[3]).

However T(&a[1] + v0, bsz) will definitely work.

So the warning will likely get completely bogus data out
of unterminated_array.

By the way test test case for pr87053 is an example
of a false positive warning (it comes twice even though
TREE_NO_WARNING is used).

gcc pr87053.c
pr87053.c: In function 'main':
pr87053.c:15:26: warning: 'strlen' argument missing terminating nul [-Wstringop-overflow=]
15 |   if (__builtin_strlen (u.z) != 7)
    |                         ~^~
pr87053.c:11:3: note: referenced argument declared here
11 | } u = {{"1234", "567"}};
    |   ^
pr87053.c:15:26: warning: 'strlen' argument missing terminating nul [-Wstringop-overflow=]
15 |   if (__builtin_strlen (u.z) != 7)
    |                         ~^~
pr87053.c:11:3: note: referenced argument declared here
11 | } u = {{"1234", "567"}};
    |   ^

I think the duplicate warnings will not go away unless
this warning is only diagnosed in the tree-ssa-strlen
pass instead of the folding code, which happens repeatedly.



Bernd.

  reply	other threads:[~2018-09-18 11:44 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-16 20:05 Bernd Edlinger
2018-09-17 17:35 ` Jeff Law
2018-09-17 18:32   ` Bernd Edlinger
2018-09-17 18:35     ` Jeff Law
2018-09-17 19:18       ` Bernd Edlinger
2018-09-17 23:01         ` Jeff Law
2018-09-18  5:38         ` Jeff Law
2018-09-18 12:06           ` Bernd Edlinger [this message]
2018-09-18 18:55             ` Jeff Law
2018-09-22 18:47               ` Martin Liška
2018-09-23  8:49                 ` Martin Liška
2018-09-23 14:35                 ` Jeff Law
2018-09-24 18:17 ` Jeff Law
2018-09-24 18:19   ` Bernd Edlinger
2018-09-25  3:19     ` Jeff Law
2018-09-25 22:43 ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=VI1PR0701MB286291A978C4E3291AA389C6E41D0@VI1PR0701MB2862.eurprd07.prod.outlook.com \
    --to=bernd.edlinger@hotmail.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    --cc=msebor@gmail.com \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).