public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jeff Law <law@redhat.com>
To: Martin Sebor <msebor@gmail.com>, Jakub Jelinek <jakub@redhat.com>,
	Joseph Myers <joseph@codesourcery.com>
Cc: Gcc Patch List <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
Date: Fri, 28 Apr 2017 16:35:00 -0000	[thread overview]
Message-ID: <d85393f5-bbf5-4898-579a-7b9c2ba8a624@redhat.com> (raw)
In-Reply-To: <7b8f6999-942c-1540-c6e6-a6284d57baf5@gmail.com>

On 04/27/2017 03:05 PM, Martin Sebor wrote:
> On 04/26/2017 04:34 PM, Jakub Jelinek wrote:
>>
>> Also, can't there be a way to shortcut all this processing if the
>> charsets are the same?  And is it a good idea if every pass that needs 
>> to do
>> something with the exec charset chars caches its own results of the
>> langhook?
> 
> The biggest overhead is calling lang_hooks.to_target_charset
> to initialize each character in the source set.  That could
> be avoided if there were a way to determine in the middle-end
> whether the input and target charsets are the same, but I don't
> see one.  Alternatively, it could be optimized by converting
> all the characters in one shot as a string, but without adding
> a new target hook to do that I don't see how to do that either.
> It might be a useful enhancement but given the scope it feels
> like it should be done independently of this change.  But if
> you know of a solution that escaped me please let me know.
> 
> The overhead of the additional processing should be negligible
> irrespective of whether the charsets are different or the same
> (all it entails is indexing into a table).
So the initialization could be done once per translation unit rather 
than once per function -- assuming the target character set doesn't 
change within a translation unit.

That seems like it ought to be easy.

The table-lookup seems like a reasonable cost and I don't think we 
should litter the code with conditionals to conditionally lookup based 
on whether or not the charsets are the same.

> 
> I agree that sharing data would be a good thing but as it is,
> the little that can be be shared already is (the target_percent
> character with builtins.c).  The rest of it (i.e., the mapping)
> is only being used by gimple-ssa-sprintf.c.
If we ever get to a point where we need this level of mapping elsewhere, 
we can always pull the code out of gimple-ssa-sprintf into a more 
generic location.  I don't think we need to over-engineering sharing 
into this right now.


> 
> If/when -Wformat is ever enhanced to handle -fexec-charset, or
> if another area needs to, then implementinng some more general
> would be worthwhile.
Right.

> 
> Attached is an updated patch with just the overflow handling
> suggested by Joseph.
> 
> Martin
> 
> 
> gcc-80523.diff
> 
> 
> PR tree-optimization/80523 -  -Wformat-overflow doesn't consider -fexec-charset
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/80523
> 	* gimple-ssa-sprintf.c (target_to_host_charmap): New global variable.
> 	(init_target_to_host_charmap, target_to_host, target_strtol10): New
> 	functions.
> 	(maybe_warn, format_directive, parse_directive): Use new functions.
> 	(pass_sprintf_length::execute): Call init_target_to_host_charmap.	
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/80523
> 	* gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: New test.
> 
> Index: gcc/gimple-ssa-sprintf.c
> ===================================================================
> --- gcc/gimple-ssa-sprintf.c	(revision 247263)
> +++ gcc/gimple-ssa-sprintf.c	(working copy)
> +/* Return a string consisting of charavters in the host source character
s/charavters/characters/


> +   set corresponding to the string TARGSTR consisting of characters in
> +   the execution character set.  */
> +
> +static const char*
> +target_to_host (const char *targstr)
> +{
> +  /* The interesting subset of source and execution characters are
> +     the same so no conversion is necessary.  */
> +  if (target_to_host_charmap['\0'] == 1)
> +    return targstr;
> +
> +  /* Convert the initial substring of TARGSTR to the corresponding
> +     characters in the host set, appending "..." if TARGSTR is too
> +     long to fit.  Using the static buffer assumes the function is
> +     not called in between sequence points (which it isn't).  */
> +  static char hostr[32];
> +  for (char *ph = hostr; ; ++targstr)
> +    {
> +      *ph++ = target_to_host (*targstr);
> +      if (!*targstr)
> +	break;
> +
> +      if (ph - hostr == sizeof hostr - 4)
> +	{
> +	  *ph = '\0';
> +	  strcat (ph, "...");
> +	  break;
> +	}
> +    }
> +
> +  return hostr;
Ewww.  I guess the alternative would be something like:

Expand the return value to include a indicator of whether or not the 
original string was returned (common case) or if a new string was 
returned and thus needs to be deallocated by the caller.

That's probably pretty unpleasant given we don't have a central place 
where we call target_to_host, so the caller side would be much uglier.

Are there any downstream impacts when the string is too long to covert 
other than not warning for things which were unconverted?

Jeff


  reply	other threads:[~2017-04-28 16:22 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-26 22:27 Martin Sebor
2017-04-27  3:39 ` Joseph Myers
2017-04-27  5:07   ` Jakub Jelinek
2017-04-27 22:52     ` Martin Sebor
2017-04-28 16:35       ` Jeff Law [this message]
2017-04-28 16:37         ` Jakub Jelinek
2017-04-28 17:05           ` Jeff Law
2017-04-28 18:32         ` Martin Sebor
2017-04-28 19:14           ` Jeff Law
2017-04-29 20:44           ` Andreas Schwab
2017-05-03 14:35             ` Christophe Lyon
2017-05-03 15:02               ` Martin Sebor
2017-05-03 15:24                 ` Christophe Lyon
2017-06-02 15:38                   ` Renlin Li
2017-06-04 22:24                     ` Martin Sebor
2017-06-13  8:16                       ` Renlin Li
2017-06-20 11:00                         ` Renlin Li
2018-01-31 17:57                           ` Renlin Li
2018-02-01  0:41                             ` Martin Sebor
2018-02-01 11:54                               ` Renlin Li
2018-02-01 18:27                                 ` Martin Sebor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d85393f5-bbf5-4898-579a-7b9c2ba8a624@redhat.com \
    --to=law@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=joseph@codesourcery.com \
    --cc=msebor@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).