From: Jeff Law <law@redhat.com>
To: Martin Sebor <msebor@gmail.com>, Jakub Jelinek <jakub@redhat.com>,
Joseph Myers <joseph@codesourcery.com>
Cc: Gcc Patch List <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
Date: Fri, 28 Apr 2017 16:35:00 -0000 [thread overview]
Message-ID: <d85393f5-bbf5-4898-579a-7b9c2ba8a624@redhat.com> (raw)
In-Reply-To: <7b8f6999-942c-1540-c6e6-a6284d57baf5@gmail.com>
On 04/27/2017 03:05 PM, Martin Sebor wrote:
> On 04/26/2017 04:34 PM, Jakub Jelinek wrote:
>>
>> Also, can't there be a way to shortcut all this processing if the
>> charsets are the same? And is it a good idea if every pass that needs
>> to do
>> something with the exec charset chars caches its own results of the
>> langhook?
>
> The biggest overhead is calling lang_hooks.to_target_charset
> to initialize each character in the source set. That could
> be avoided if there were a way to determine in the middle-end
> whether the input and target charsets are the same, but I don't
> see one. Alternatively, it could be optimized by converting
> all the characters in one shot as a string, but without adding
> a new target hook to do that I don't see how to do that either.
> It might be a useful enhancement but given the scope it feels
> like it should be done independently of this change. But if
> you know of a solution that escaped me please let me know.
>
> The overhead of the additional processing should be negligible
> irrespective of whether the charsets are different or the same
> (all it entails is indexing into a table).
So the initialization could be done once per translation unit rather
than once per function -- assuming the target character set doesn't
change within a translation unit.
That seems like it ought to be easy.
The table-lookup seems like a reasonable cost and I don't think we
should litter the code with conditionals to conditionally lookup based
on whether or not the charsets are the same.
>
> I agree that sharing data would be a good thing but as it is,
> the little that can be be shared already is (the target_percent
> character with builtins.c). The rest of it (i.e., the mapping)
> is only being used by gimple-ssa-sprintf.c.
If we ever get to a point where we need this level of mapping elsewhere,
we can always pull the code out of gimple-ssa-sprintf into a more
generic location. I don't think we need to over-engineering sharing
into this right now.
>
> If/when -Wformat is ever enhanced to handle -fexec-charset, or
> if another area needs to, then implementinng some more general
> would be worthwhile.
Right.
>
> Attached is an updated patch with just the overflow handling
> suggested by Joseph.
>
> Martin
>
>
> gcc-80523.diff
>
>
> PR tree-optimization/80523 - -Wformat-overflow doesn't consider -fexec-charset
>
> gcc/ChangeLog:
>
> PR tree-optimization/80523
> * gimple-ssa-sprintf.c (target_to_host_charmap): New global variable.
> (init_target_to_host_charmap, target_to_host, target_strtol10): New
> functions.
> (maybe_warn, format_directive, parse_directive): Use new functions.
> (pass_sprintf_length::execute): Call init_target_to_host_charmap.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/80523
> * gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: New test.
>
> Index: gcc/gimple-ssa-sprintf.c
> ===================================================================
> --- gcc/gimple-ssa-sprintf.c (revision 247263)
> +++ gcc/gimple-ssa-sprintf.c (working copy)
> +/* Return a string consisting of charavters in the host source character
s/charavters/characters/
> + set corresponding to the string TARGSTR consisting of characters in
> + the execution character set. */
> +
> +static const char*
> +target_to_host (const char *targstr)
> +{
> + /* The interesting subset of source and execution characters are
> + the same so no conversion is necessary. */
> + if (target_to_host_charmap['\0'] == 1)
> + return targstr;
> +
> + /* Convert the initial substring of TARGSTR to the corresponding
> + characters in the host set, appending "..." if TARGSTR is too
> + long to fit. Using the static buffer assumes the function is
> + not called in between sequence points (which it isn't). */
> + static char hostr[32];
> + for (char *ph = hostr; ; ++targstr)
> + {
> + *ph++ = target_to_host (*targstr);
> + if (!*targstr)
> + break;
> +
> + if (ph - hostr == sizeof hostr - 4)
> + {
> + *ph = '\0';
> + strcat (ph, "...");
> + break;
> + }
> + }
> +
> + return hostr;
Ewww. I guess the alternative would be something like:
Expand the return value to include a indicator of whether or not the
original string was returned (common case) or if a new string was
returned and thus needs to be deallocated by the caller.
That's probably pretty unpleasant given we don't have a central place
where we call target_to_host, so the caller side would be much uglier.
Are there any downstream impacts when the string is too long to covert
other than not warning for things which were unconverted?
Jeff
next prev parent reply other threads:[~2017-04-28 16:22 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-26 22:27 Martin Sebor
2017-04-27 3:39 ` Joseph Myers
2017-04-27 5:07 ` Jakub Jelinek
2017-04-27 22:52 ` Martin Sebor
2017-04-28 16:35 ` Jeff Law [this message]
2017-04-28 16:37 ` Jakub Jelinek
2017-04-28 17:05 ` Jeff Law
2017-04-28 18:32 ` Martin Sebor
2017-04-28 19:14 ` Jeff Law
2017-04-29 20:44 ` Andreas Schwab
2017-05-03 14:35 ` Christophe Lyon
2017-05-03 15:02 ` Martin Sebor
2017-05-03 15:24 ` Christophe Lyon
2017-06-02 15:38 ` Renlin Li
2017-06-04 22:24 ` Martin Sebor
2017-06-13 8:16 ` Renlin Li
2017-06-20 11:00 ` Renlin Li
2018-01-31 17:57 ` Renlin Li
2018-02-01 0:41 ` Martin Sebor
2018-02-01 11:54 ` Renlin Li
2018-02-01 18:27 ` Martin Sebor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d85393f5-bbf5-4898-579a-7b9c2ba8a624@redhat.com \
--to=law@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jakub@redhat.com \
--cc=joseph@codesourcery.com \
--cc=msebor@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).