public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Jeff Law <law@redhat.com>, Richard Biener <richard.guenther@gmail.com>
Cc: Gcc Patch List <gcc-patches@gcc.gnu.org>
Subject: Re: [PING 2][PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)
Date: Sun, 26 Nov 2017 09:12:00 -0000	[thread overview]
Message-ID: <7b05b1da-da02-1385-5ef6-ba0eee6fc7b1@gmail.com> (raw)
In-Reply-To: <5ffed8d2-4957-898f-590f-b09a3d9ca719@redhat.com>

On 11/22/2017 04:50 PM, Jeff Law wrote:
> On 11/16/2017 02:29 PM, Martin Sebor wrote:
>> Ping.
>>
>> I've fixed the outstanding false positive exposed by the Linux
>> kernel.  The kernel builds with four instances of the warning,
>> all of them valid (perfect overlap in memcpy).
>>
>> I also built Glibc.  It shows one instance of the warning, also
>> a true positive (cause by calling a restrict-qualified function
>> with two copies of the same argument).
>>
>> Finally, I built Binutils and GDB with no warnings.
>>
>> The attached patch includes just that one fix, with everything
>> else being the same.
>>
>> On 11/09/2017 04:57 PM, Martin Sebor wrote:
>>> Ping:
>>>
>>>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01642.html
>>>
>>> On 10/23/2017 08:42 PM, Martin Sebor wrote:
>>>> Attached is a reworked solution to enhance -Wrestrict while
>>>> avoiding changing tree-vrp.c or any other VRP machinery.  Richard,
>>>> in considering you suggestions I realized that the ao_ref struct
>>>> isn't general enough to detect the kinds of problems I needed to
>>>> etect (storing bit-offsets in HOST_WIDE_INT means out-of-bounds
>>>> offsets cannot be represented or detected, leading to either false
>>>> positives or false negatives).
> So this seems to be a recurring theme, which makes me wonder if we
> should have an ao_ref-like structure that deals in bytes rather than
> bits and make it a first class citizen.   There's certainly clients that
> work on bits and certainly clients that would prefer to work on bytes.

The class I introduced serves a different purpose than ao_ref and
stores a lot more data.

In https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82042#c3 Richard
says that "this [offset being HOST_WIDE_INT and storing bits] is
a know deficiency in ao_ref 'offset' (and also size and maxsize).
Blowing up to offset_int isn't really a good idea."

>>>> Instead, the solution adds a couple of small classes to builtins.c
>>>> to overcome this limitation (I'm contemplating moving them along
>>>> with -Wstringop-overflow to a separate file to keep builtins.c
>>>> from getting too much bigger).
>>>>
>>>> The detection of out-of-bounds offsets and overlapping accesses
>>>> is relatively simple but the rest of the changes are somewhat
>>>> involved because of the computation of the offsets and sizes of
>>>> the overlaps.
>>>>
>>>> Jeff, as per your suggestion/request in an earlier review (bug
>>>> 81117) I've renamed some of the existing functions to better
>>>> reflect their new function (including renaming strlen_optimize_stmt
>>>> in tree-ssa-strlen.c to strlen_check_and_optimize_stmt).  There's
>>>> quite a bit of churn due to some of this renaming.  I don't think
>>>> this renaming makes the review too difficult but if you feel
>>>> differently I can [be persuaded to] split it out into a separate
>>>> patch.
> Understood.  FWIW, one way to deal with this that I've found works
> reasonably well is to have an initial patch that just does the renaming
> you want to do.  That separates the mechanical stuff from the meat of
> the change.  They can be git squashed together just before committing or
> committed as two changes back-to-back to eliminate or minimize the time
> where the variable and function names are inconsistent.
>
> It's also the case that independent little fixes should just go upstream
> immediately.  I didn't see many, but the alloc_max_size fix for KB seems
> like it should have just gone forward independent of the rest of the
> changes.

Sure, I think that change was incidental, not intentional.  There
should be no others (AFAIK).

>>>> To validate the patch I compiled the Linux kernel and Binutils/GDB.
>>>> There's one false positive I'm working on resolving that's caused
>>>> by an incorrect interpretation of an offset in a range whose lower
>>>> bound is greater than its upper bound.  This it so say that while
>>>> I'm aware the patch isn't perfect it already works reasonably well
>>>> in practice and I think it's close enough to review.
>>>>
>>>> Thanks
>>>> Martin
>>>
>>
>>
>> gcc-78918.diff
>>
>>
>> PR tree-optimization/78918 - missing -Wrestrict on memcpy copying over self
>>
>> gcc/c-family/ChangeLog:
>>
>> 	PR tree-optimization/78918
>> 	* c-common.c (check_function_restrict): Avoid checking built-ins.
>> 	* c.opt (-Wrestrict): Include in -Wall.
>>
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/78918
>> 	* builtins.c (builtin_memref, builtin_access): New classes.
>> 	(check_bounds_or_overlap, maybe_diag_overlap): New static functions.
>> 	(maybe_diag_offset_bounds): Same.
>> 	(check_sizes): Rename...
>> 	(check_access): ...to this.  Rename function arguments for clarity.
>> 	(check_memop_sizes): Adjust names.
>> 	(expand_builtin_memchr, expand_builtin_memcpy): Same.
>> 	(expand_builtin_memmove, expand_builtin_mempcpy): Same.
>> 	(expand_builtin_strcat, expand_builtin_stpncpy): Same.
>> 	(check_strncat_sizes, expand_builtin_strncat): Same.
>> 	(expand_builtin_strncpy, expand_builtin_memset): Same.
>> 	(expand_builtin_bzero, expand_builtin_memcmp): Same.
>> 	(expand_builtin_memory_chk, maybe_emit_chk_warning): Same.
>> 	(maybe_emit_sprintf_chk_warning): Same.
>> 	(expand_builtin_strcpy): Adjust.
>> 	(expand_builtin_stpcpy): Same.
>> 	(expand_builtin_with_bounds): Detect out-of-bounds/overlapping
>> 	accesses in pointer-checking forms of memcpy, memmove, and mempcpy.
>> 	(gcall_to_tree_minimal, check_bounds_or_overlap, max_object_size):
>> 	Define new functions.
>> 	* builtins.h (check_bounds_or_overlap, max_object_size): Declare.
>> 	* calls.c (alloc_max_size): Call max_object_size instead of
>> 	hardcoding ssizetype limit.  Fix a typo.
>> 	(get_size_range): Handle new argument.
>> 	* calls.h (get_size_range): Add a new argument.
>> 	* cfgexpand.c (expand_call_stmt): Propagate no-warning bit.
>> 	* doc/invoke.texi (-Wrestrict): Adjust, add example.
>> 	* gimple-fold.c (gimple_fold_builtin_memory_op): Detect overlapping
>> 	operations.
>> 	(gimple_fold_builtin_memory_chk): Same.
>> 	(gimple_fold_builtin_stxcpy_chk): New function.
>> 	* gimple.c (gimple_build_call_from_tree): Propagate location.
>> 	* tree-ssa-strlen.c (handle_builtin_strcpy): Detect overlapping
>> 	operations.
>> 	(handle_builtin_strcat): Same.
>> 	(strlen_optimize_stmt): Rename...
>> 	(strlen_check_and_optimize_stmt): ...to this.  Handle strncat,
>> 	stpncpy, strncpy, and their checking forms.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR tree-optimization/78918
>> 	* c-c++-common/Warray-bounds.c: New test.
>> 	* c-c++-common/Wrestrict-2.c: New test.
>> 	* c-c++-common/Wrestrict.c: New test.
>> 	* c-c++-common/Wrestrict.s: New test.
>> 	* gcc.dg/Walloca-1.c: Adjust/
>> 	* gcc.dg/memcpy-6.c: New test.
>> 	* gcc.dg/pr69172.c: Adjust.
>> 	* gcc.target/i386/chkp-stropt-17.c: New test.
> So I realize you don't have any code to answer this question, but your
> thoughts on how much we might loose effectiveness if we didn't do the
> warnings within gimple_fold_builtin_<whatever>, but instead broke out a
> distinct pass to handle warnings?  My biggest design concern is the
> warning from within the folder aspects.

With optimization enabled the folder folds things like this into
MEM_REF which would prevent the warning unless the pass pass ran
with early optimizations.

   struct S { char a[7], b[7]; };

   void sink (void*);

   void f (void)
   {
     struct S s;
     sink (&s);

     unsigned n = sizeof s.a;
     memcpy (s.a + 4, s.b, n);

     sink (&s);
   }

Without optimization it isn't folded and so the memcpy call is
emitted (but there is no warning).  The overlapping MEM_REF copy
is safe but the overlapping memcpy call is not, so warning on it
is helpful.

I prototyped a pass over Thanksgiving for the -Wrestict code from
builtins.c to and ran it just after the sprintf pass.  There are
still lots of failures in the new test because it's non-trivial
to compute the same data as builtins.c does (the data is also
computed for -Wstringop-overflow).  I could move all that code
to the new pass as well to clear up the failures.  I could also
arrange for the pass to run multiple times to catch cases like
the one above.  I suspect this would trigger some false negatives
and/or positives due to the differences in range information, so
it would mean some cleanup in the tests.  What I can't do without
seriously compromising the feature is avoid calling into the new
pass from tree-ssa-strlen.c, but that would presumably be fine.

With that said, and although I'm not necessarily opposed to it,
moving all this code into its own pass would mean a non-trivial
amount of work for what seems like a questionable benefit.  All
it would achieve, as far as I can see, is duplicating some of
the work that's already been done: iterating over the GIMPLE,
testing for built-ins to handle, and extracting their arguments.
What exactly do you hope to accomplish by moving it into its own
pass?  (If it's a matter of keeping the warning code separate
that can easily be done by moving it to its own file.)

> Are there any interactions with the issues we touched on at the end of
> our call Monday?  ie, in the case where there is no overlap, but the
> strncat/strncpy overwrites the NUL in the source string, we we issue a
> warning.  Do those issues arise in this code?

The -Wrestrict warning is more strict for string functions than
for raw memory functions and when it can't rule out overlap for
string copies into the same object, it warns.  (For raw memory
functions, it only warns when it can prove there is an overlap.)
Other than that, there are no interactions and the -Wrestrict
code doesn't enable/disable any optimizations.

> FWIW I have some concerns about how this code is going to interact with
> the poly_int stuff.  The code which finds and extracts pointer offsets
> in particular.  I'm sure we can adjust if that's necessary (one of the
> meta comments I need to make sure I note in the poly_int thread is a
> request for guidance WRT when code needs to be poly_int aware).
>
>> @@ -192,6 +193,91 @@ static tree do_mpfr_remquo (tree, tree, tree);
>>  static tree do_mpfr_lgamma_r (tree, tree, tree);
>>  static void expand_builtin_sync_synchronize (void);
>>
>> +struct builtin_memref;
>> +
>> +/* Description of a memory access by a raw memory or string built-in
>> +   function.  */
>> +struct builtin_access
>> +{
>> +private:
>> +  /* Temporaries used to compute the final result.  */
>> +  offset_int dstoff[2];
>> +  offset_int srcoff[2];
>> +  offset_int dstsiz[2];
>> +  offset_int srcsiz[2];
>> +
>> +  /* Member function to call to determine overlap.  */
>> +  bool (builtin_access::*detect_overlap) ();
>
> A nit.  For a POD type we use structs, otherwise make it a class.
> Similarly for the builtin_memref class.
>
> Check your ordering of members in builtin_memref.
>
>
> Any reason not to define builtin_memref prior to builtin_access?

Not anymore.  At one point there was a circular dependency between
the two that I have eliminated.

> So I realize why you're not using the ao_ref infrastructure.  However,
> have you reviewed the ao_ref bits to help ensure that builtin_memref
> handles any various corner cases.  Are there pieces you can refactor and
> share between the two pieces of infrastructure?  If I just look at the
> ctor I see a ton of stuff that looks like fairly generic infrastructure
> to take a memory reference an dig down into it to find the base and
> offset.  I'd hate to have two completely different implementations of
> that code with different bugs in each :(

The only similarity between builtin_memref and ao_ref is between
the ctor and ao_ref_init_from_ptr_and_size.  The ctor is a bit more
involved because it deals with non-constant offsets.  But I'm not
aware of any corner cases in ao_ref_init_from_ptr_and_size (the
ctor started as a copy of the function and evolved to what it looks
like now).

Martin

>> @@ -2962,39 +3048,157 @@ determine_block_size (tree len, rtx len_rtx,
>>  			  GET_MODE_MASK (GET_MODE (len_rtx)));
>>  }
>>
>> +/* Validate REF offsets in an EXPRession passed as an argument to a CALL
>> +   to a built-in function FUNC to make sure they are within the bounds
>> +   of the referenced object if its size is known, or PTRDIFF_MAX otherwise.
>> +   Both initial values of the offsets and their final value computed by
>> +   the function by incrementing the initial value by the size are
>> +   validated.  Return true if the offsets are not valid and a diagnostic
>> +   has been isssued.  */
> s/isssued/issued/
>
>
>
>
>
>
>> +}
>> +
>> +/* Return error_mark_node if the signed offset is exceeds the bounds
>> +   of the address space (PTRDIFF_MAX).  Otherwise, return BASE when
>> +   the offset exceeds the bounds of the BASE object. Otherwise return
>> +   NULL to inidctae the offset is in bounds.  */
> s/inidctae/indicate/
>
>
>
>
>> +
>> +
>> +/* Return true if THIS (AKA DSTREF) and SRCREF describe accesses that
>> +   either overlap one another or that, in order not to overlap, would
>> +   imply that the size of the referenced object(s) exceeds the maximum
>> +   size of an object.  Set SIZRANGE to the range of access sizes,
>> +   OVLOFF to the offsets where the overlap occurs (the offsets must
>> +   be valid so that their sum with the low bound of the access size
>> +   is no greater than PTRDIFF_MAX), and OVLSIZ to the rane of the overlap
>> +   sizes.
>> +   Otherwise, if THIS and SRCREF do not definitely overlap (even though
>> +   they may overlap in a way that's not apparent from the available data),
>> +   return false.
>> +   Used for -Wrestrict warnings.  */
> s/rane/range/
>
>
>> +
>> +bool
>> +builtin_access::overlap ()
>> +{
>> +
>
>
>> +
>> +  /* When the lower bound of the offset is less that the upper bound
>> +     disregard	it and use the inverse of the maximum object size
>> +     instead.  The upper bound is the result of a negative offset
>> +     being represented as a large positive value.  */
> It looks like you've got an embedded tab  in that comment after "disregard".
>
>
>
>
>
>> +/* Attempt to detect and diagnose invalid offset bounds and (exept for
>> +   memmove) overlapping copy in a call expression EXP from SRC to DST
>> +   and DSTSIZE and SRCSIZE bytes, respectively.  Return false when one
>> +   or the other has been detected, true otherwise.  */
> s/exept/except/
>
>
>
>
>
>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index cb33c1e..048a2e4 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -699,6 +706,15 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
>>       DEST{,+LEN,+LEN-1}.  */
>>    if (operand_equal_p (src, dest, 0))
>>      {
>> +      /* Avoid diagnosing exact overlap in calls to __builtin_memcpy.
>> +	 It's safe and may even be emitted by GCC itself (see bug
>> +	 32667).  However, diagnose it in explicit calls to the memcpy
>> +	 function.  */
>> +      if (check_overlap && *IDENTIFIER_POINTER (DECL_NAME (func)) != '_')
>> +	warning_at (loc, OPT_Wrestrict,
>> +		    "%qD source argument is the same as destination",
>> +		    func);
>> +
> And one could argue that this should just be folded away in a manner
> similar to the others below.  I'd support adding a case for this
> independently of the warnings.
>
>
> Not an ACK or NACK at this point.
>
> Jeff
>

  reply	other threads:[~2017-11-26  0:54 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-16 23:47 [PATCH] " Martin Sebor
2017-07-20 20:46 ` Martin Sebor
2017-07-25  3:13   ` [PING] " Martin Sebor
2017-08-01  2:27     ` [PING #2] " Martin Sebor
2017-08-01  9:23       ` Richard Biener
2017-08-01  9:25         ` Richard Biener
2017-08-02 17:10           ` Jeff Law
2017-08-03  8:46             ` Richard Biener
2017-08-06 23:08               ` Martin Sebor
2017-08-08 13:08                 ` Richard Biener
2017-08-09 16:14                 ` Jeff Law
2017-08-22  2:04                   ` Martin Sebor
2017-08-22  9:17                     ` Richard Biener
2017-08-24 22:36                       ` Jeff Law
2017-08-29  3:42                         ` Martin Sebor
2017-09-01 15:15                           ` Jeff Law
2017-08-29  2:35                       ` Martin Sebor
2017-08-29 11:32                         ` Richard Biener
     [not found]                           ` <40984eff-b156-3315-7bb5-558e9e83bf6c@gmail.com>
2017-10-24  6:40                             ` Martin Sebor
2017-11-10  0:28                               ` [PING][PATCH] " Martin Sebor
2017-11-16 22:24                                 ` [PING 2][PATCH] " Martin Sebor
2017-11-23  0:16                                   ` Jeff Law
2017-11-26  9:12                                     ` Martin Sebor [this message]
2017-11-30  0:56                                       ` Martin Sebor
2017-12-07 21:14                                         ` Jeff Law
2017-12-07 21:28                                           ` Martin Sebor
2017-12-07 21:48                                             ` Jeff Law
2017-12-07 22:23                                         ` Jeff Law
2017-12-08  0:44                                           ` Martin Sebor
2017-12-08 19:19                                             ` Martin Sebor
2017-12-11 22:27                                               ` Jeff Law
2017-12-17  0:02                                                 ` Martin Sebor
2017-12-17 13:32                                                   ` H.J. Lu
2018-01-08  9:56                                                   ` Tom de Vries
2017-12-07 20:16                                       ` Jeff Law
2017-11-27 12:45                                   ` Richard Biener
2017-11-30  1:19                                     ` Martin Sebor
2017-12-07 20:20                                     ` Jeff Law
2017-08-09 16:09               ` [PING #2] [PATCH] " Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7b05b1da-da02-1385-5ef6-ba0eee6fc7b1@gmail.com \
    --to=msebor@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).