public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
Cc: Jeff Law <law@redhat.com>,
	Richard Biener <richard.guenther@gmail.com>,
	Gcc Patch List <gcc-patches@gcc.gnu.org>,
	Richard Biener <rguenther@suse.de>
Subject: Re: [PATCH] restore -Warray-bounds for string literals (PR 83776)
Date: Fri, 20 Jul 2018 16:57:00 -0000	[thread overview]
Message-ID: <6126b90f-34b8-a22e-1362-846cc83aa821@gmail.com> (raw)
In-Reply-To: <yddbmb22l9a.fsf@CeBiTec.Uni-Bielefeld.DE>

On 07/20/2018 05:34 AM, Rainer Orth wrote:
> Hi Martin,
>
>> On 07/19/2018 03:51 PM, Jeff Law wrote:
>>> On 07/13/2018 05:45 PM, Martin Sebor wrote:
>>>>>
>>>>> +      offset_int ofr[] = {
>>>>> +       wi::to_offset (fold_convert (ptrdiff_type_node, vr->min)),
>>>>> +       wi::to_offset (fold_convert (ptrdiff_type_node, vr->max))
>>>>> +      };
>>>>>
>>>>> huh.  Do you maybe want to use widest_int for ofr[]?  What's
>>>>> wrong with wi::to_offset (vr->min)?  Building another intermediate
>>>>> tree node here looks just bogus.
>>>>
>>>> I need to convert size_type indices to signed to keep their sign
>>>> if it's negative and include it in warnings.  I've moved the code
>>>> into a conditional where it's used to minimize the cost but other
>>>> than that I don't know how else to convert it.
>>>>
>>>>>
>>>>> What are you trying to do in this loop anyways?
>>>>
>>>> The loop It determines the range of the final index/offset for
>>>> a series of POINTER_PLUS_EXPRs.  It handles cases like this:
>>>>
>>>>   int g (int i, int j, int k)
>>>>   {
>>>>     if (i < 1) i = 1;
>>>>     if (j < 1) j = 1;
>>>>     if (k < 1) k = 1;
>>>>
>>>>     const char *p0 = "123";
>>>>     const char *p1 = p0 + i;
>>>>     const char *p2 = p1 + j;
>>>>     const char *p3 = p2 + k;
>>>>
>>>>     // p2[3] and p3[1] are out of bounds
>>>>     return p0[4] + p1[3] + p2[2] + p3[1];
>>>>   }
>>>>
>>>>> I suppose
>>>>> look at
>>>>>
>>>>>   p_1 = &obj[i_2];  // already bounds-checked, but with ignore_off_by_one
>>>>>   ... = MEM[p_1 + CST];
>>>>>
>>>>> ?  But then
>>>>>
>>>>> +      if (TREE_CODE (varoff) != SSA_NAME)
>>>>> +       break;
>>>>>
>>>>> you should at least handle INTEGER_CSTs here?
>>>>
>>>> It's already handled (and set in CSTOFF).  There should be no
>>>> more constant offsets after that (I haven't come across any.)
>>>>
>>>>>
>>>>> +      if (!vr || vr->type == VR_UNDEFINED || !vr->min || !vr->max)
>>>>> +       break;
>>>>>
>>>>> please use positive tests here, VR_RANGE || VR_ANTI_RANGE.  As usual
>>>>> the anti-range handling looks odd.  Please add comments so we can follow
>>>>> what you were thinking when writing range merging code.  Even better if you
>>>>>
>>>>> can stick to use existing code rather than always re-inventing the wheel...
>>>>>
>>>>
>>>> The anti-range handling is to conservatively add
>>>> [-MAXOBJSIZE -1, MAXOBJSIZE] to OFFRANGE.  I've added comments
>>>> to make it clear.  I'd be more than happy to reuse existing
>>>> code if I knew where to find it (if it exists).  It sure would
>>>> save me lots of work to have a library of utility functions to
>>>> call instead of rolling my own code each time.
>>> Finding stuff is never easy :(  GCC's just gotten so big it's virtually
>>> impossible for anyone to know all the APIs.
>>>
>>> The suggestion I'd have would be to (when possible) factor this stuff
>>> into routines you can reuse.  We (as a whole) have this tendency to
>>> open-code all kinds of things rather than factoring the code into
>>> reusable routines.
>>>
>>> In addition to increasing the probability that you'll be able to reuse
>>> code later, just reading a new function's header tends to make me (as a
>>> reviewer) internally ask if there's already a routine we should be using
>>> instead.  When it's open-coded it's significantly harder to spot those
>>> cases (at least for me).
>>>
>>>
>>>>
>>>>>
>>>>> I think I commented on earlier variants but this doesn't seem to resemble
>>>>> any of them.
>>>>
>>>> I've reworked the patch (sorry) to also handle arrays.  For GCC
>>>> 9 it seems I might as well do both in one go.
>>>>
>>>> Attached is an updated patch with these changes.
>>>>
>>>> Martin
>>>>
>>>> gcc-83776.diff
>>>>
>>>>
>>>> PR tree-optimization/84047 - missing -Warray-bounds on an out-of-bounds
>>>> index into an array
>>>> PR tree-optimization/83776 - missing -Warray-bounds indexing past the
>>>> end of a string literal
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	PR tree-optimization/84047
>>>> 	PR tree-optimization/83776
>>>> 	* tree-vrp.c (vrp_prop::check_mem_ref): New function.
>>>> 	(check_array_bounds): Call it.
>>>> 	* /gcc/tree-sra.c (get_access_for_expr): Fail for out-of-bounds
>>>> 	array offsets.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 	PR tree-optimization/83776
>>>> 	PR tree-optimization/84047
>>>> 	* gcc.dg/Warray-bounds-29.c: New test.
>>>> 	* gcc.dg/Warray-bounds-30.c: New test.
>>>> 	* gcc.dg/Warray-bounds-31.c: New test.
>>>> 	* gcc.dg/Warray-bounds-32.c: New test.
>>>>
>>>
>>>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>>>> index 3e30f6b..8221a06 100644
>>>> --- a/gcc/tree-sra.c
>>>> +++ b/gcc/tree-sra.c
>>>> @@ -3110,6 +3110,19 @@ get_access_for_expr (tree expr)
>>>>        || !DECL_P (base))
>>>>      return NULL;
>>>>
>>>> +  /* Fail for out-of-bounds array offsets.  */
>>>> +  tree basetype = TREE_TYPE (base);
>>>> +  if (TREE_CODE (basetype) == ARRAY_TYPE)
>>>> +    {
>>>> +      if (offset < 0)
>>>> +	return NULL;
>>>> +
>>>> +      if (tree size = DECL_SIZE (base))
>>>> +	if (tree_fits_uhwi_p (size)
>>>> +	    && tree_to_uhwi (size) < (unsigned HOST_WIDE_INT) offset)
>>>> +	  return NULL;
>>>> +    }
>>>> +
>>>>    if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
>>>>      return NULL;
>>> So I'm a bit curious about this hunk.  Did we end up creating an access
>>> structure that walked off the end of the array?  Presumably  you
>>> suppressing SRA at this point so that you can see the array access later
>>> and give a suitable warning.  Right?
>>
>> Yes, but I didn't make a note of the test case that triggered
>> it and I'm not able to trigger the code path now, so the change
>> might not be necessary.  I've removed it from the patch.  If it
>> turns out that it is needed after all I'll commit it later.
>>
>>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>>> index a7453ab..27021760 100644
>>>> --- a/gcc/tree-vrp.c
>>>> +++ b/gcc/tree-vrp.c
>>>> @@ -4930,21 +4931,280 @@ vrp_prop::check_array_ref (location_t location, tree ref,
>>>>      }
>>>>  }
>>>>
>>>> +/* Checks one MEM_REF in REF, located at LOCATION, for out-of-bounds
>>>> +   references to string constants.  If VRP can determine that the array
>>>> +   subscript is a constant, check if it is outside valid range.
>>>> +   If the array subscript is a RANGE, warn if it is non-overlapping
>>>> +   with valid range.
>>>> +   IGNORE_OFF_BY_ONE is true if the MEM_REF is inside an ADDR_EXPR
>>>> +   (used to allow one-past-the-end indices for code that takes
>>>> +   the address of the just-past-the-end element of an array).  */
>>>> +
>>>> +void
>>>> +vrp_prop::check_mem_ref (location_t location, tree ref, bool ignore_off_by_one)
>>>> +{
>>>> +  if (TREE_NO_WARNING (ref))
>>>> +    return;
>>>> +
>>>> +  tree arg = TREE_OPERAND (ref, 0);
>>>> +  /* The constant and variable offset of the reference.  */
>>>> +  tree cstoff = TREE_OPERAND (ref, 1);
>>>> +  tree varoff = NULL_TREE;
>>>> +
>>>> +  const offset_int maxobjsize = tree_to_shwi (max_object_size ());
>>>> +
>>>> +  /* The array or string constant bounds in bytes.  Initially set
>>>> +     to [-MAXOBJSIZE - 1, MAXOBJSIZE]  until a tighter bound is
>>>> +     determined.  */
>>>> +  offset_int arrbounds[2];
>>>> +  arrbounds[1] = maxobjsize;
>>>> +  arrbounds[0] = -arrbounds[1] - 1;
>>> I realize that arrbounds[1] == maxobjsize at this point.  But is there
>>> any reason not to use maxobjsize in the assignment to arrbounds[0] so
>>> that its computation matches the comment.  It would also seem advisable
>>> to set the min first, then the max since that's the ordering in the
>>> comment and in the array.
>>
>> Done.
>>
>>>
>>>
>>>> +
>>>> +  /* The minimum and maximum intermediate offset.  For a reference
>>>> +     to be valid, not only does the final offset/subscript must be
>>>> +     in bounds but all intermediate offsets must be as well. */
>>>> + offset_int ioff = wi::to_offset (fold_convert (ptrdiff_type_node,
>>>> cstoff));
>>>> +  offset_int extrema[2] = { 0, wi::abs (ioff) };
>>> You should probably tone back the comment here since the intermediate
>>> offsets do not have to be in bounds.
>>
>> Done.
>>
>>>
>>>
>>> [ Big snip ]
>>>
>>>> +  /* The type of the object being referred to.  It can be an array,
>>>> +     string literal, or a non-array type when the MEM_REF represents
>>>> +     a reference/subscript via a pointer to an object that is not
>>>> +     an element of an array.  References to members of structs and
>>>> +     unions are excluded because MEM_REF doesn't make it possible
>>>> +     to identify the member where the reference orginated.  */
>>> s/orginated/originated/
>>>
>>> OK with those changes.
>>
>> Committed without the SRA change as 262893.
>
> your patch has caused quite a number of testsuite failures:
>
> +FAIL: c-c++-common/Warray-bounds-2.c  -std=gnu++11 (test for excess errors)
> +FAIL: c-c++-common/Warray-bounds-2.c  -std=gnu++14 (test for excess errors)
> +FAIL: c-c++-common/Warray-bounds-2.c  -std=gnu++98 (test for excess errors)
>
> Excess errors:
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/c-c++-common/Warray-bounds-2.c:200:11: warning: array subscript -1 is outside array bounds of 'Array [2]' [-Warray-bounds]
>
> +FAIL: c-c++-common/Warray-bounds-2.c  -Wc++-compat  (test for excess errors)
>
> on 32 and 64-bit Solaris/SPARC and x86, also Linux/x86_64
>
> +XPASS: gcc.dg/Warray-bounds-31.c  (test for warnings, line 26)
> +XPASS: gcc.dg/Warray-bounds-31.c  (test for warnings, line 40)
> +FAIL: gcc.dg/Warray-bounds-31.c (test for excess errors)
> +XPASS: gcc.dg/Warray-bounds-31.c bug 84047 (test for warnings, line 72)
> +XPASS: gcc.dg/Warray-bounds-31.c bug 84047 (test for warnings, line 90)
>
> Excess errors:
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/Warray-bounds-31.c:100:3: warning: array subscript -2147483648 is outside array bounds of 'char[9]' [-Warray-bounds]
>
> 32-bit Solaris and Linux only.
>
> +FAIL: gcc.dg/Warray-bounds-32.c  (test for warnings, line 28)

There are outstanding issues that prevent some instances
of out-of-bounds indices from being diagnosed depending on
the target/data model.  I raised pr86611 and pr86613 for
those and xfailed the tests in r262906.  There's also a minor
new issue with duplicate -Warray-bounds in C++ (or in C when
strncpy is not defined by libc as a macro), or with the macro
suppressing a -Warray-bounds when it is a macro.  That caused
the c-c++-common/Warray-bounds-2.c to fail on some targets
I undefined the macros to have the test behave consistently
regardless of libc macros, pruned the duplicate diagnostic,
and opened bug 86614 to look into avoiding the duplication.

Martin

      reply	other threads:[~2018-07-20 16:57 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-26  4:14 Martin Sebor
2018-04-30 22:57 ` Jeff Law
2018-07-13 23:09   ` Martin Sebor
2018-07-16  2:53     ` Jeff Law
2018-05-02  7:22 ` Richard Biener
2018-07-13 23:46   ` Martin Sebor
2018-07-19 21:51     ` Jeff Law
2018-07-19 23:36       ` Martin Sebor
2018-07-20 11:35         ` Rainer Orth
2018-07-20 16:57           ` Martin Sebor [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6126b90f-34b8-a22e-1362-846cc83aa821@gmail.com \
    --to=msebor@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    --cc=rguenther@suse.de \
    --cc=richard.guenther@gmail.com \
    --cc=ro@CeBiTec.Uni-Bielefeld.DE \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).