Re: [PATCH] handle sprintf(d, "%s", ...) in gimple-ssa-sprintf.c

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <richard.guenther@gmail.com>
To: Jeff Law <law@redhat.com>
Cc: Martin Sebor <msebor@gmail.com>,
	Gcc Patch List <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] handle sprintf(d, "%s", ...) in gimple-ssa-sprintf.c
Date: Tue, 02 May 2017 10:27:00 -0000	[thread overview]
Message-ID: <CAFiYyc37fsOQ4t=OXnn9R-UCaiaEEbR4HGoOT1E3YqR8KO6NFg@mail.gmail.com> (raw)
In-Reply-To: <0de09846-2f84-4763-58b2-b31e5490a6bb@redhat.com>

On Fri, Apr 28, 2017 at 6:35 PM, Jeff Law <law@redhat.com> wrote:
> On 04/25/2017 09:55 PM, Martin Sebor wrote:
>>
>> On 04/25/2017 04:05 PM, Jeff Law wrote:
>>>
>>> On 04/21/2017 03:33 PM, Martin Sebor wrote:
>>>>
>>>> Bug 77671 - missing -Wformat-overflow warning on sprintf overflow
>>>> with "%s", is caused by gimple-fold.c transforming s{,n}printf
>>>> calls with a plain "%s" format string into strcpy regardless of
>>>> whether they are valid well before the -Wformtat-overflow warning
>>>> has had a chance to validate them.
>>>>
>>>> The attached patch moves the transformation from gimple-fold.c
>>>> into the gimple-ssa-sprintf.c pass, thus allowing the calls to
>>>> be validated.  Only valid calls (i.e., those that do not overflow
>>>> the destination) are transformed.
>>>>
>>>> Martin
>>>>
>>>> gcc-77671.diff
>>>>
>>>>
>>>> commit 2cd98763984ffb93606ee96ad658733b4c95c1e4
>>>> Author: Martin Sebor<msebor@redhat.com>
>>>> Date:   Wed Apr 12 18:36:26 2017 -0600
>>>>
>>>>      PR middle-end/77671 - missing -Wformat-overflow warning on
>>>> sprintf overflow
>>>>      with %s
>>>>           gcc/ChangeLog:
>>>>                   PR middle-end/77671
>>>>              * gimple-fold.c (gimple_fold_builtin_sprintf): Make extern.
>>>>              (gimple_fold_builtin_snprintf): Same.
>>>>              * gimple-fold.h (gimple_fold_builtin_sprintf): Declare.
>>>>              (gimple_fold_builtin_snprintf): Same.
>>>>              * gimple-ssa-sprintf.c (get_format_string): Correct the
>>>> detection
>>>>              of character types.
>>>>              (is_call_safe): New function.
>>>>              (try_substitute_return_value): Call it.
>>>>              (try_simplify_call): New function.
>>>>              (pass_sprintf_length::handle_gimple_call): Call it.
>>>>           gcc/testsuite/ChangeLog:
>>>>                   PR middle-end/77671
>>>>              * gcc.dg/tree-ssa/builtin-sprintf-7.c: New test.
>>>>              * gcc.dg/tree-ssa/builtin-sprintf-8.c: New test.
>>>>              * gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Adjust.
>>>>              * gcc.dg/tree-ssa/builtin-sprintf-warn-2.c: Adjust.
>>>>              * gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Adjust.
>>>
>>> I assume this went through the usual regression testing cycle?  I'm a
>>> bit surprised nothing failed due to moving the transformation later in
>>> the pipeline.
>>
>>
>> It did.  There was one regression I neglected to mention.  A test
>> exercising -fexec-charset (bug 20110) fails because the sprintf
>> pass assumes the execution character set is the same as the host
>> character set.  With -fexec-charset set to EBCDIC it gets just
>> as confused as the -Wformat warning does (this is subject of
>> the still unresolved bug 20110).  I've opened bug 80523 for
>> the new problem and I'm testing a patch that handles it.
>>
>>
>>>> diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
>>>> index 2474391..07d6897 100644
>>>> --- a/gcc/gimple-ssa-sprintf.c
>>>> +++ b/gcc/gimple-ssa-sprintf.c
>>>> @@ -373,7 +373,16 @@ get_format_string (tree format, location_t *ploc)
>>>>     if (TREE_CODE (format) != STRING_CST)
>>>>       return NULL;
>>>>   -  if (TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (format))) !=
>>>> char_type_node)
>>>> +  tree type = TREE_TYPE (format);
>>>> +  if (TREE_CODE (type) == ARRAY_TYPE)
>>>> +    type = TREE_TYPE (type);
>>>> +
>>>> +  /* Can't just test that:
>>>> +       TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (format))) !=
>>>> char_type_node
>>>> +     See bug 79062.  */
>>>> +  if (TREE_CODE (type) != INTEGER_TYPE
>>>> +      || TYPE_MODE (type) != TYPE_MODE (char_type_node)
>>>> +      || TYPE_PRECISION (type) != TYPE_PRECISION (char_type_node))
>>>>       {
>>>>         /* Wide format string.  */
>>>>         return NULL;
>>>
>>> In the referenced BZ Richi mentions how fold-const.c checks for this
>>> case and also hints that you might want t look at tree-ssa-strlen as
>>> well.  That seems wise.  It also seems wise to factor that check and use
>>> it from all the identified locations.  I don't like that this uses a
>>> different check than what's in fold-const.c.
>>
>>
>> I think what I did comes from tree-ssa-strlen.c (Richi's other
>> suggestion in bug 79062).  In either case I don't fully understand
>> why the existing code doesn't work.
>>
>>> It's also not clear if this change is a requirement to address 77671 or
>>> not.  If so ISTM that we fix 79062 first, then 77671.  If not we can fix
>>> them independently.  In both cases the fix for 79062 can be broken out
>>> into its own patch.
>>
>>
>> I suppose that makes sense.  The hunk above doesn't fully fix
>> 79062.  It just lets the sprintf return value optimization take
>> place with -flto.
>>
>>>> @@ -3278,31 +3287,21 @@ get_destination_size (tree dest)
>>>>     return HOST_WIDE_INT_M1U;
>>>>   }
>>>>   -/* Given a suitable result RES of a call to a formatted output
>>>> function
>>>> -   described by INFO, substitute the result for the return value of
>>>> -   the call.  The result is suitable if the number of bytes it
>>>> represents
>>>> -   is known and exact.  A result that isn't suitable for substitution
>>>> may
>>>> -   have its range set to the range of return values, if that is known.
>>>> -   Return true if the call is removed and gsi_next should not be
>>>> performed
>>>> -   in the caller.  */
>>>> -
>>>>   static bool
>>>> -try_substitute_return_value (gimple_stmt_iterator *gsi,
>>>> -                 const pass_sprintf_length::call_info &info,
>>>> -                 const format_result &res)
>>>> +is_call_safe (const pass_sprintf_length::call_info &info,
>>>> +          const format_result &res, bool under4k,
>>>> +          unsigned HOST_WIDE_INT retval[2])
>>>
>>> You need a function comment for is_call_safe.
>>>
>>>
>>>
>>>> @@ -3423,6 +3458,30 @@ try_substitute_return_value
>>>> (gimple_stmt_iterator *gsi,
>>>>     return removed;
>>>>   }
>>>>   +static bool
>>>> +try_simplify_call (gimple_stmt_iterator *gsi,
>>>> +           const pass_sprintf_length::call_info &info,
>>>> +           const format_result &res)
>>>
>>> Needs a function comment.
>>>
>>>
>>>
>>> So there's the  minor nits to add function comments.  The big question
>>> is the stuff for 79062 and a confirmation that this went through a full
>>> regression test cycle.
>>
>>
>> I think I may have included the (partial) the fix for 79062 to
>> get some tests to pass but I'm not 100% sure.
>>
>> Let me first submit the fix for the -fexec-charset limitation
>> (bug 80523), see if I can separate out the partial fix for 79062,
>> and then resubmit this patch.
>
> OK.  I think 80523 is pretty close.  I mentioned one implementation detail
> (initialize translation table once per file rather than once per function)
> and one question about what happens for long translated strings.   I suspect
> you'll have that one good-to-go shortly.
>
>>
>> FWIW, my fix for bug 79062 is only partial (it gets the pass
>> to run but the warnings are still not issued).  I don't quite
>> understand what prevents the warning flag(s) from getting set
>> when -flto is used.  This seems to be a bigger problem than
>> just the sprintf pass not doing something just right.
>
> I've never dug deeply in the LTO stuff, but I believe we stream the compiler
> flags, so it could be something there.

We do.

> Alternately you might be running into a case where in LTO mode we recreate
> base types.  Look for a type equality tester that goes beyond just testing
> pointer equality.
>
> ie, in LTO I think we'll create a type based on the streamed data, but I
> also think we'll create various basic types.  Thus in LTO mode pointer
> equality may not be sufficient.

We make sure that for most basic types we end up re-using them where possible.
char_type_node is an example where that generally doesn't work because it's
value depends on a command-line flag.

Richard.

> jeff

next prev parent reply	other threads:[~2017-05-02 10:25 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-22  0:51 Martin Sebor
2017-04-25 22:27 ` Jeff Law
2017-04-26  7:59   ` Martin Sebor
2017-04-28  6:07     ` Martin Sebor
2017-04-28 17:06       ` Jeff Law
2017-04-28 16:48     ` Jeff Law
2017-05-02 10:27       ` Richard Biener [this message]
2017-05-02 14:44         ` Martin Sebor
2017-05-03 10:21           ` Richard Biener
2017-05-03 14:48             ` Martin Sebor
2017-05-12 18:44               ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFiYyc37fsOQ4t=OXnn9R-UCaiaEEbR4HGoOT1E3YqR8KO6NFg@mail.gmail.com' \
    --to=richard.guenther@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    --cc=msebor@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).