* [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
@ 2017-04-26 22:27 Martin Sebor
2017-04-27 3:39 ` Joseph Myers
0 siblings, 1 reply; 21+ messages in thread
From: Martin Sebor @ 2017-04-26 22:27 UTC (permalink / raw)
To: Gcc Patch List
[-- Attachment #1: Type: text/plain, Size: 703 bytes --]
Testing my solution for bug 77671 (missing -Wformat-overflow
sprintf with "%s") caused a regression in the charset/builtin2.c
test for bug 25120 (builtin *printf handlers are confused by
-fexec-charset). That led me to realize that like -Wformat
itself, the whole gimple-ssa-sprintf pass is oblivious to
potential differences between the source character set on
the host and the execution character set on the target. As
a result, when the host and target sets are different, the
pass misinterprets ordinary format characters as special
(e.g., parts of directives) and vice versa.
The attached patch implements a simple solution to this problem
by introducing a mapping between the two sets.
Martin
[-- Attachment #2: gcc-80523.diff --]
[-- Type: text/x-patch, Size: 20055 bytes --]
PR tree-optimization/80523 - -Wformat-overflow doesn't consider -fexec-charset
gcc/ChangeLog:
PR tree-optimization/80523
* gimple-ssa-sprintf.c (target_to_host_charmap): New global variable.
(init_target_to_host_charmap, target_to_host, target_strtol10): New
functions.
(maybe_warn, format_directive, parse_directive): Use new functions.
(pass_sprintf_length::execute): Call init_target_to_host_charmap.
gcc/testsuite/ChangeLog:
PR tree-optimization/80523
* gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: New test.
Index: gcc/gimple-ssa-sprintf.c
===================================================================
--- gcc/gimple-ssa-sprintf.c (revision 247263)
+++ gcc/gimple-ssa-sprintf.c (working copy)
@@ -66,6 +66,7 @@ along with GCC; see the file COPYING3. If not see
#include "calls.h"
#include "cfgloop.h"
#include "intl.h"
+#include "langhooks.h"
#include "builtins.h"
#include "stor-layout.h"
@@ -273,6 +274,118 @@ target_size_max ()
return tree_to_uhwi (TYPE_MAX_VALUE (size_type_node));
}
+/* A straightforward mapping from the execution character set to the host
+ character set indexed by execution character. */
+
+static char target_to_host_charmap[256];
+
+/* Initialize a mapping from the execution character set to the host
+ character set. */
+
+static bool
+init_target_to_host_charmap ()
+{
+ if (!init_target_chars ())
+ return false;
+
+ /* The subset of the source character set used by printf conversion
+ specifications (strictly speaking, not all letters are used but
+ they are included here for the sake of simplicity). The dollar
+ sign must be included even though it's not in the basic source
+ character set. */
+ const char srcset[] = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+ "abcdefghijklmnopqrstuvwxyz0123456789#%'*+-.$";
+
+ /* Set the mapping for all characters to some ordinary value (i,e.,
+ not one used in printf conversion specifications) and overwrite
+ those that are used by conversion specifications with their
+ corresponding values. */
+ memset (target_to_host_charmap + 1, '?', sizeof target_to_host_charmap - 1);
+
+ for (const char *pc = srcset; *pc; ++pc)
+ {
+ /* Slice off the high end bits in case target characters are
+ signed. All values are expected to be non-nul, otherwise
+ there's a problem. */
+ if (unsigned char tc = lang_hooks.to_target_charset (*pc))
+ target_to_host_charmap[tc] = *pc;
+ else
+ return false;
+ }
+
+ return true;
+}
+
+/* Return the host source character corresponding to the character
+ CH in the execution character set if one exists, or some innocuous
+ (non-special, non-nul) source character otherwise. */
+
+static inline int
+target_to_host (unsigned char ch)
+{
+ return target_to_host_charmap[ch];
+}
+
+/* Return a string consisting of characters in the host source character
+ set corresponding to the string TARGSTR consisting of characters in
+ the execution character set. */
+
+static const char*
+target_to_host (const char *targstr)
+{
+ if (target_to_host ('%') == '%')
+ return targstr;
+
+ static char hostr[32];
+ for (char *ph = hostr; *targstr; ++targstr)
+ {
+ *ph++ = target_to_host (*targstr);
+ if (!*targstr)
+ break;
+
+ if (ph - hostr == sizeof hostr - 4)
+ {
+ *ph = '\0';
+ strcat (ph, "...");
+ break;
+ }
+ }
+
+ return hostr;
+}
+
+/* Convert the sequence of decimal digits in the execution character
+ starting at S to a long, just like strtol does. Return the result
+ and set *END to one past the last converted character. */
+
+static inline long
+target_strtol10 (const char *s, char** end)
+{
+ /* As an optimization use the host strtol if the digit zero is the same
+ in the host and target character sets. */
+ if ('0' == target_to_host ('0'))
+ return strtol (s, end, 10);
+
+ long val = 0;
+ for ( ; ; )
+ {
+ unsigned char c = target_to_host (*s);
+ if (ISDIGIT (c))
+ {
+ val *= 10;
+ val += c - '0';
+ }
+ else
+ break;
+
+ ++s;
+ }
+
+ *end = const_cast<char*>(s);
+
+ return val;
+}
+
/* Return the constant initial value of DECL if available or DECL
otherwise. Same as the synonymous function in c/c-typeck.c. */
@@ -2289,7 +2402,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
/* The size of the destination region is exact. */
unsigned HOST_WIDE_INT navail = avail_range.max;
- if (*dir.beg != '%')
+ if (target_to_host (*dir.beg) != '%')
{
/* For plain character directives (i.e., the format string itself)
but not others, point the caret at the first character that's
@@ -2340,7 +2453,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size %wu")));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg, res.min,
+ dir.len, target_to_host (dir.beg), res.min,
navail);
}
@@ -2357,7 +2470,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ dir.len, target_to_host (dir.beg),
res.max, navail);
}
@@ -2377,7 +2490,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ dir.len, target_to_host (dir.beg),
res.likely, navail);
}
@@ -2394,7 +2507,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"%wu bytes into a region of size %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ dir.len, target_to_host (dir.beg),
res.min, res.max,
navail);
}
@@ -2410,13 +2523,13 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ dir.len, target_to_host (dir.beg),
res.min, navail);
}
/* The size of the destination region is a range. */
- if (*dir.beg != '%')
+ if (target_to_host (*dir.beg) != '%')
{
unsigned HOST_WIDE_INT navail = avail_range.max;
@@ -2469,7 +2582,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg, res.min,
+ dir.len, target_to_host (dir.beg), res.min,
avail_range.min, avail_range.max);
}
@@ -2488,7 +2601,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size between %wu and %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg, res.max,
+ dir.len, target_to_host (dir.beg), res.max,
avail_range.min, avail_range.max);
}
@@ -2510,7 +2623,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size between %wu and %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg, res.likely,
+ dir.len, target_to_host (dir.beg), res.likely,
avail_range.min, avail_range.max);
}
@@ -2529,7 +2642,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"%wu bytes into a region of size between %wu and %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ dir.len, target_to_host (dir.beg),
res.min, res.max,
avail_range.min, avail_range.max);
}
@@ -2547,7 +2660,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size between %wu and %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ dir.len, target_to_host (dir.beg),
res.min,
avail_range.min, avail_range.max);
}
@@ -2636,7 +2749,7 @@ format_directive (const pass_sprintf_length::call_
{
fmtwarn (dirloc, pargrange, NULL, info.warnopt (),
"%<%.*s%> directive argument is null",
- dirlen, dir.beg);
+ dirlen, target_to_host (dir.beg));
/* Don't bother processing the rest of the format string. */
res->warned = true;
@@ -2703,7 +2816,7 @@ format_directive (const pass_sprintf_length::call_
info.warnopt (),
"%<%.*s%> directive output of %wu bytes exceeds "
"minimum required size of 4095",
- dirlen, dir.beg, fmtres.range.min);
+ dirlen, target_to_host (dir.beg), fmtres.range.min);
else
{
const char *fmtstr
@@ -2715,7 +2828,7 @@ format_directive (const pass_sprintf_length::call_
warned = fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dirlen, dir.beg,
+ dirlen, target_to_host (dir.beg),
fmtres.range.min, fmtres.range.max);
}
}
@@ -2744,7 +2857,7 @@ format_directive (const pass_sprintf_length::call_
warned = fmtwarn (dirloc, pargrange, NULL, info.warnopt (),
"%<%.*s%> directive output of %wu bytes causes "
"result to exceed %<INT_MAX%>",
- dirlen, dir.beg, fmtres.range.min);
+ dirlen, target_to_host (dir.beg), fmtres.range.min);
else
{
const char *fmtstr
@@ -2755,7 +2868,7 @@ format_directive (const pass_sprintf_length::call_
"bytes may cause result to exceed %<INT_MAX%>"));
warned = fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dirlen, dir.beg,
+ dirlen, target_to_host (dir.beg),
fmtres.range.min, fmtres.range.max);
}
}
@@ -2847,7 +2960,7 @@ parse_directive (pass_sprintf_length::call_info &i
directive &dir, format_result *res,
const char *str, unsigned *argno)
{
- const char *pcnt = strchr (str, '%');
+ const char *pcnt = strchr (str, target_percent);
dir.beg = str;
if (size_t len = pcnt ? pcnt - str : *str ? strlen (str) : 1)
@@ -2889,7 +3002,7 @@ parse_directive (pass_sprintf_length::call_info &i
For vararg functions set to void_node. */
tree star_precision = NULL_TREE;
- if (ISDIGIT (*pf))
+ if (ISDIGIT (target_to_host (*pf)))
{
/* This could be either a POSIX positional argument, the '0'
flag, or a width, depending on what follows. Store it as
@@ -2896,10 +3009,10 @@ parse_directive (pass_sprintf_length::call_info &i
width and sort it out later after the next character has
been seen. */
char *end;
- width = strtol (pf, &end, 10);
+ width = target_strtol10 (pf, &end);
pf = end;
}
- else if ('*' == *pf)
+ else if (target_to_host (*pf) == '*')
{
/* Similarly to the block above, this could be either a POSIX
positional argument or a width, depending on what follows. */
@@ -2910,7 +3023,7 @@ parse_directive (pass_sprintf_length::call_info &i
++pf;
}
- if (*pf == '$')
+ if (target_to_host (*pf) == '$')
{
/* Handle the POSIX dollar sign which references the 1-based
positional argument number. */
@@ -2959,7 +3072,7 @@ parse_directive (pass_sprintf_length::call_info &i
the next field is the optional flags followed by an optional
width. */
for ( ; ; ) {
- switch (*pf)
+ switch (target_to_host (*pf))
{
case ' ':
case '0':
@@ -2966,7 +3079,7 @@ parse_directive (pass_sprintf_length::call_info &i
case '+':
case '-':
case '#':
- dir.set_flag (*pf++);
+ dir.set_flag (target_to_host (*pf++));
break;
default:
@@ -2975,13 +3088,13 @@ parse_directive (pass_sprintf_length::call_info &i
}
start_width:
- if (ISDIGIT (*pf))
+ if (ISDIGIT (target_to_host (*pf)))
{
char *end;
- width = strtol (pf, &end, 10);
+ width = target_strtol10 (pf, &end);
pf = end;
}
- else if ('*' == *pf)
+ else if (target_to_host (*pf) == '*')
{
if (*argno < gimple_call_num_args (info.callstmt))
star_width = gimple_call_arg (info.callstmt, (*argno)++);
@@ -2993,7 +3106,7 @@ parse_directive (pass_sprintf_length::call_info &i
}
++pf;
}
- else if ('\'' == *pf)
+ else if (target_to_host (*pf) == '\'')
{
/* The POSIX apostrophe indicating a numeric grouping
in the current locale. Even though it's possible to
@@ -3005,17 +3118,17 @@ parse_directive (pass_sprintf_length::call_info &i
}
start_precision:
- if ('.' == *pf)
+ if (target_to_host (*pf) == '.')
{
++pf;
- if (ISDIGIT (*pf))
+ if (ISDIGIT (target_to_host (*pf)))
{
char *end;
- precision = strtol (pf, &end, 10);
+ precision = target_strtol10 (pf, &end);
pf = end;
}
- else if ('*' == *pf)
+ else if (target_to_host (*pf) == '*')
{
if (*argno < gimple_call_num_args (info.callstmt))
star_precision = gimple_call_arg (info.callstmt, (*argno)++);
@@ -3035,10 +3148,10 @@ parse_directive (pass_sprintf_length::call_info &i
}
}
- switch (*pf)
+ switch (target_to_host (*pf))
{
case 'h':
- if (pf[1] == 'h')
+ if (target_to_host (pf[1]) == 'h')
{
++pf;
dir.modifier = FMT_LEN_hh;
@@ -3059,7 +3172,7 @@ parse_directive (pass_sprintf_length::call_info &i
break;
case 'l':
- if (pf[1] == 'l')
+ if (target_to_host (pf[1]) == 'l')
{
++pf;
dir.modifier = FMT_LEN_ll;
@@ -3080,7 +3193,7 @@ parse_directive (pass_sprintf_length::call_info &i
break;
}
- switch (*pf)
+ switch (target_to_host (*pf))
{
/* Handle a sole '%' character the same as "%%" but since it's
undefined prevent the result from being folded. */
@@ -3141,7 +3254,7 @@ parse_directive (pass_sprintf_length::call_info &i
return 0;
}
- dir.specifier = *pf++;
+ dir.specifier = target_to_host (*pf++);
if (star_width)
{
@@ -3708,6 +3821,9 @@ pass_sprintf_length::handle_gimple_call (gimple_st
unsigned int
pass_sprintf_length::execute (function *fun)
{
+ if (!target_to_host_charmap ['%'])
+ init_target_to_host_charmap ();
+
basic_block bb;
FOR_EACH_BB_FN (bb, fun)
{
Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (working copy)
@@ -0,0 +1,110 @@
+/* PR tree-optimization/80523 - -Wformat-overflow doesn't consider
+ -fexec-charset
+ { dg-do compile }
+ { dg-require-iconv "IBM1047" }
+ { dg-options "-O2 -Wall -Wno-format -Wformat-overflow -fexec-charset=IBM1047 -ftrack-macro-expansion=0" } */
+
+char buf[1];
+void sink (void*);
+
+#define T(...) (__builtin_sprintf (buf + 1, __VA_ARGS__), sink (buf))
+
+/* Exercise all special C and POSIX characters. */
+
+void test_characters ()
+{
+ T ("%%"); /* { dg-warning ".%%. directive writing 1 byte" } */
+
+ T ("%A", 0.0); /* { dg-warning ".%A. directive writing between 6 and 20 " } */
+ T ("%a", 0.0); /* { dg-warning ".%a. directive writing between 6 and 20 " } */
+
+ T ("%C", 'a'); /* { dg-warning ".%C. directive writing 1 byte" "bug 80537" { xfail *-*-* } } */
+ T ("%c", 'a'); /* { dg-warning ".%c. directive writing 1 byte" } */
+
+ T ("%d", 12); /* { dg-warning ".%d. directive writing 2 bytes" } */
+ T ("% d", 12); /* { dg-warning ".% d. directive writing 3 bytes" } */
+ T ("%-d", 123); /* { dg-warning ".%-d. directive writing 3 bytes" } */
+ T ("%+d", 1234); /* { dg-warning ".%\\+d. directive writing 5 bytes" } */
+ T ("%'d", 1234); /* { dg-warning ".%'d. directive writing 5 bytes" "bug 80535" { xfail *-*-* } } */
+ T ("%1$d", 2345); /* { dg-warning ".%1\\\$d. directive writing 4 bytes" } */
+
+ /* Verify that digits are correctly interpreted as width and precision. */
+ T ("%0d", 12345); /* { dg-warning ".%0d. directive writing 5 bytes" } */
+ T ("%1d", 12345); /* { dg-warning ".%1d. directive writing 5 bytes" } */
+ T ("%2d", 12345); /* { dg-warning ".%2d. directive writing 5 bytes" } */
+ T ("%3d", 12345); /* { dg-warning ".%3d. directive writing 5 bytes" } */
+ T ("%4d", 12345); /* { dg-warning ".%4d. directive writing 5 bytes" } */
+ T ("%5d", 12345); /* { dg-warning ".%5d. directive writing 5 bytes" } */
+ T ("%6d", 12345); /* { dg-warning ".%6d. directive writing 6 bytes" } */
+ T ("%7d", 12345); /* { dg-warning ".%7d. directive writing 7 bytes" } */
+ T ("%8d", 12345); /* { dg-warning ".%8d. directive writing 8 bytes" } */
+ T ("%9d", 12345); /* { dg-warning ".%9d. directive writing 9 bytes" } */
+
+ T ("%.0d", 12345); /* { dg-warning ".%.0d. directive writing 5 bytes" } */
+ T ("%.1d", 12345); /* { dg-warning ".%.1d. directive writing 5 bytes" } */
+ T ("%.2d", 12345); /* { dg-warning ".%.2d. directive writing 5 bytes" } */
+ T ("%.3d", 12345); /* { dg-warning ".%.3d. directive writing 5 bytes" } */
+ T ("%.4d", 12345); /* { dg-warning ".%.4d. directive writing 5 bytes" } */
+ T ("%.5d", 12345); /* { dg-warning ".%.5d. directive writing 5 bytes" } */
+ T ("%.6d", 12345); /* { dg-warning ".%.6d. directive writing 6 bytes" } */
+ T ("%.7d", 12345); /* { dg-warning ".%.7d. directive writing 7 bytes" } */
+ T ("%.8d", 12345); /* { dg-warning ".%.8d. directive writing 8 bytes" } */
+ T ("%.9d", 12345); /* { dg-warning ".%.9d. directive writing 9 bytes" } */
+
+ T ("%hhd", 12); /* { dg-warning ".%hhd. directive writing 2 bytes" } */
+ T ("%hd", 234); /* { dg-warning ".%hd. directive writing 3 bytes" } */
+
+ {
+ const __PTRDIFF_TYPE__ i = 3456;
+ T ("%jd", i); /* { dg-warning ".%jd. directive writing 4 bytes" } */
+ }
+
+ T ("%ld", 45678L); /* { dg-warning ".%ld. directive writing 5 bytes" } */
+
+ {
+ const __PTRDIFF_TYPE__ i = 56789;
+ T ("%td", i); /* { dg-warning ".%td. directive writing 5 bytes" } */
+ }
+
+ {
+ const __SIZE_TYPE__ i = 67890;
+ T ("%zd", i); /* { dg-warning ".%zd. directive writing 5 bytes" } */
+ }
+
+ T ("%E", 0.0); /* { dg-warning ".%E. directive writing 12 bytes" } */
+ T ("%e", 0.0); /* { dg-warning ".%e. directive writing 12 bytes" } */
+ T ("%F", 0.0); /* { dg-warning ".%F. directive writing 8 bytes" } */
+ T ("%f", 0.0); /* { dg-warning ".%f. directive writing 8 bytes" } */
+ T ("%G", 0.0); /* { dg-warning ".%G. directive writing 1 byte" } */
+ T ("%g", 0.0); /* { dg-warning ".%g. directive writing 1 byte" } */
+
+ T ("%i", 123); /* { dg-warning ".%i. directive writing 3 bytes" } */
+
+ {
+ int n;
+
+ T ("%n", &n); /* { dg-warning "writing a terminating nul" } */
+ T ("%nH", &n); /* { dg-warning ".H. directive writing 1 byte" } */
+ }
+
+ T ("%o", 999); /* { dg-warning ".%o. directive writing 4 bytes" } */
+ T ("%#o", 999); /* { dg-warning ".%#o. directive writing 5 bytes" } */
+
+ T ("%x", 1234); /* { dg-warning ".%x. directive writing 3 bytes" } */
+ T ("%#X", 1235); /* { dg-warning ".%#X. directive writing 5 bytes" } */
+
+ T ("%S", L"1"); /* { dg-warning ".%S. directive writing 1 byte" } */
+ T ("%-s", "1"); /* { dg-warning ".%-s. directive writing 1 byte" } */
+
+ /* Verify that characters in the source character set appear in
+ the text of the warning unchanged (i.e., not as their equivalents
+ in the execution character set on the target). The trailing %%
+ disables sprintf->strcpy optimization. */
+ T ("ABCDEFGHIJ%%"); /* { dg-warning ".ABCDEFGHIJ. directive writing 10 bytes" } */
+ T ("KLMNOPQRST%%"); /* { dg-warning ".KLMNOPQRST. directive writing 10 bytes" } */
+ T ("UVWXYZ%%"); /* { dg-warning ".UVWXYZ. directive writing 6 bytes" } */
+
+ T ("abcdefghij%%"); /* { dg-warning ".abcdefghij. directive writing 10 bytes" } */
+ T ("klmnopqrst%%"); /* { dg-warning ".klmnopqrst. directive writing 10 bytes" } */
+ T ("uvwxyz%%"); /* { dg-warning ".uvwxyz. directive writing 6 bytes" } */
+}
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-04-26 22:27 [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503) Martin Sebor
@ 2017-04-27 3:39 ` Joseph Myers
2017-04-27 5:07 ` Jakub Jelinek
0 siblings, 1 reply; 21+ messages in thread
From: Joseph Myers @ 2017-04-27 3:39 UTC (permalink / raw)
To: Martin Sebor; +Cc: Gcc Patch List
On Wed, 26 Apr 2017, Martin Sebor wrote:
> Testing my solution for bug 77671 (missing -Wformat-overflow
> sprintf with "%s") caused a regression in the charset/builtin2.c
> test for bug 25120 (builtin *printf handlers are confused by
> -fexec-charset). That led me to realize that like -Wformat
> itself, the whole gimple-ssa-sprintf pass is oblivious to
> potential differences between the source character set on
> the host and the execution character set on the target. As
> a result, when the host and target sets are different, the
> pass misinterprets ordinary format characters as special
> (e.g., parts of directives) and vice versa.
>
> The attached patch implements a simple solution to this problem
> by introducing a mapping between the two sets.
target_strtol10 appears to do no checking for overflow, which I'd expect
would result in nonsensical results for large width values overflowing
host long (whereas strtol would reliably return LONG_MAX in such cases).
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-04-27 3:39 ` Joseph Myers
@ 2017-04-27 5:07 ` Jakub Jelinek
2017-04-27 22:52 ` Martin Sebor
0 siblings, 1 reply; 21+ messages in thread
From: Jakub Jelinek @ 2017-04-27 5:07 UTC (permalink / raw)
To: Joseph Myers; +Cc: Martin Sebor, Gcc Patch List
On Wed, Apr 26, 2017 at 10:26:56PM +0000, Joseph Myers wrote:
> On Wed, 26 Apr 2017, Martin Sebor wrote:
>
> > Testing my solution for bug 77671 (missing -Wformat-overflow
> > sprintf with "%s") caused a regression in the charset/builtin2.c
> > test for bug 25120 (builtin *printf handlers are confused by
> > -fexec-charset). That led me to realize that like -Wformat
> > itself, the whole gimple-ssa-sprintf pass is oblivious to
> > potential differences between the source character set on
> > the host and the execution character set on the target. As
> > a result, when the host and target sets are different, the
> > pass misinterprets ordinary format characters as special
> > (e.g., parts of directives) and vice versa.
> >
> > The attached patch implements a simple solution to this problem
> > by introducing a mapping between the two sets.
>
> target_strtol10 appears to do no checking for overflow, which I'd expect
> would result in nonsensical results for large width values overflowing
> host long (whereas strtol would reliably return LONG_MAX in such cases).
Also, can't there be a way to shortcut all this processing if the
charsets are the same? And is it a good idea if every pass that needs to do
something with the exec charset chars caches its own results of the
langhook?
Jakub
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-04-27 5:07 ` Jakub Jelinek
@ 2017-04-27 22:52 ` Martin Sebor
2017-04-28 16:35 ` Jeff Law
0 siblings, 1 reply; 21+ messages in thread
From: Martin Sebor @ 2017-04-27 22:52 UTC (permalink / raw)
To: Jakub Jelinek, Joseph Myers; +Cc: Gcc Patch List
[-- Attachment #1: Type: text/plain, Size: 2725 bytes --]
On 04/26/2017 04:34 PM, Jakub Jelinek wrote:
> On Wed, Apr 26, 2017 at 10:26:56PM +0000, Joseph Myers wrote:
>> On Wed, 26 Apr 2017, Martin Sebor wrote:
>>
>>> Testing my solution for bug 77671 (missing -Wformat-overflow
>>> sprintf with "%s") caused a regression in the charset/builtin2.c
>>> test for bug 25120 (builtin *printf handlers are confused by
>>> -fexec-charset). That led me to realize that like -Wformat
>>> itself, the whole gimple-ssa-sprintf pass is oblivious to
>>> potential differences between the source character set on
>>> the host and the execution character set on the target. As
>>> a result, when the host and target sets are different, the
>>> pass misinterprets ordinary format characters as special
>>> (e.g., parts of directives) and vice versa.
>>>
>>> The attached patch implements a simple solution to this problem
>>> by introducing a mapping between the two sets.
>>
>> target_strtol10 appears to do no checking for overflow, which I'd expect
>> would result in nonsensical results for large width values overflowing
>> host long (whereas strtol would reliably return LONG_MAX in such cases).
Thanks, good catch! I've added overflow detection (along with
a warning) in the updated version.
>
> Also, can't there be a way to shortcut all this processing if the
> charsets are the same? And is it a good idea if every pass that needs to do
> something with the exec charset chars caches its own results of the
> langhook?
The biggest overhead is calling lang_hooks.to_target_charset
to initialize each character in the source set. That could
be avoided if there were a way to determine in the middle-end
whether the input and target charsets are the same, but I don't
see one. Alternatively, it could be optimized by converting
all the characters in one shot as a string, but without adding
a new target hook to do that I don't see how to do that either.
It might be a useful enhancement but given the scope it feels
like it should be done independently of this change. But if
you know of a solution that escaped me please let me know.
The overhead of the additional processing should be negligible
irrespective of whether the charsets are different or the same
(all it entails is indexing into a table).
I agree that sharing data would be a good thing but as it is,
the little that can be be shared already is (the target_percent
character with builtins.c). The rest of it (i.e., the mapping)
is only being used by gimple-ssa-sprintf.c.
If/when -Wformat is ever enhanced to handle -fexec-charset, or
if another area needs to, then implementinng some more general
would be worthwhile.
Attached is an updated patch with just the overflow handling
suggested by Joseph.
Martin
[-- Attachment #2: gcc-80523.diff --]
[-- Type: text/x-patch, Size: 25294 bytes --]
PR tree-optimization/80523 - -Wformat-overflow doesn't consider -fexec-charset
gcc/ChangeLog:
PR tree-optimization/80523
* gimple-ssa-sprintf.c (target_to_host_charmap): New global variable.
(init_target_to_host_charmap, target_to_host, target_strtol10): New
functions.
(maybe_warn, format_directive, parse_directive): Use new functions.
(pass_sprintf_length::execute): Call init_target_to_host_charmap.
gcc/testsuite/ChangeLog:
PR tree-optimization/80523
* gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: New test.
Index: gcc/gimple-ssa-sprintf.c
===================================================================
--- gcc/gimple-ssa-sprintf.c (revision 247263)
+++ gcc/gimple-ssa-sprintf.c (working copy)
@@ -66,6 +66,7 @@ along with GCC; see the file COPYING3. If not see
#include "calls.h"
#include "cfgloop.h"
#include "intl.h"
+#include "langhooks.h"
#include "builtins.h"
#include "stor-layout.h"
@@ -273,6 +274,143 @@ target_size_max ()
return tree_to_uhwi (TYPE_MAX_VALUE (size_type_node));
}
+/* A straightforward mapping from the execution character set to the host
+ character set indexed by execution character. */
+
+static char target_to_host_charmap[256];
+
+/* Initialize a mapping from the execution character set to the host
+ character set. */
+
+static bool
+init_target_to_host_charmap ()
+{
+ if (!init_target_chars ())
+ return false;
+
+ /* The subset of the source character set used by printf conversion
+ specifications (strictly speaking, not all letters are used but
+ they are included here for the sake of simplicity). The dollar
+ sign must be included even though it's not in the basic source
+ character set. */
+ const char srcset[] = " 0123456789!\"#%&'()*+,-./:;<=>?[\\]^_{|}~$"
+ "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
+
+ /* Set the mapping for all characters to some ordinary value (i,e.,
+ not none used in printf conversion specifications) and overwrite
+ those that are used by conversion specifications to their
+ corresponding values. */
+ memset (target_to_host_charmap + 1, '?', sizeof target_to_host_charmap - 1);
+
+ /* Are the two sets of characters the same? */
+ bool all_same_p = true;
+
+ for (const char *pc = srcset; *pc; ++pc)
+ {
+ /* Slice off the high end bits in case target characters are
+ signed. All values are expected to be non-nul, otherwise
+ there's a problem. */
+ if (unsigned char tc = lang_hooks.to_target_charset (*pc))
+ {
+ target_to_host_charmap[tc] = *pc;
+ if (tc != *pc)
+ all_same_p = false;
+ }
+ else
+ return false;
+
+ }
+
+ /* Set the first element to a non-zero value if the mapping
+ is 1-to-1, otherwise leave it clear (NUL is assumed to be
+ the same in both character sets). */
+ target_to_host_charmap[0] = all_same_p;
+
+ return true;
+}
+
+/* Return the host source character corresponding to the character
+ CH in the execution character set if one exists, or some innocuous
+ (non-special, non-nul) source character otherwise. */
+
+static inline unsigned char
+target_to_host (unsigned char ch)
+{
+ return target_to_host_charmap[ch];
+}
+
+/* Return a string consisting of charavters in the host source character
+ set corresponding to the string TARGSTR consisting of characters in
+ the execution character set. */
+
+static const char*
+target_to_host (const char *targstr)
+{
+ /* The interesting subset of source and execution characters are
+ the same so no conversion is necessary. */
+ if (target_to_host_charmap['\0'] == 1)
+ return targstr;
+
+ /* Convert the initial substring of TARGSTR to the corresponding
+ characters in the host set, appending "..." if TARGSTR is too
+ long to fit. Using the static buffer assumes the function is
+ not called in between sequence points (which it isn't). */
+ static char hostr[32];
+ for (char *ph = hostr; ; ++targstr)
+ {
+ *ph++ = target_to_host (*targstr);
+ if (!*targstr)
+ break;
+
+ if (ph - hostr == sizeof hostr - 4)
+ {
+ *ph = '\0';
+ strcat (ph, "...");
+ break;
+ }
+ }
+
+ return hostr;
+}
+
+/* Convert the sequence of decimal digits in the execution character
+ starting at S to a long, just like strtol does. Return the result
+ and set *END to one past the last converted character. On range
+ error set ERANGE to the digit that caused it. */
+
+static inline long
+target_strtol10 (const char **ps, const char **erange)
+{
+ unsigned HOST_WIDE_INT val = 0;
+ for ( ; ; ++*ps)
+ {
+ unsigned char c = target_to_host (**ps);
+ if (ISDIGIT (c))
+ {
+ c -= '0';
+
+ /* Check for overflow. */
+ if (val > (LONG_MAX - c) / 10LU)
+ {
+ val = LONG_MAX;
+ *erange = *ps;
+
+ /* Skip the remaining digits. */
+ do
+ c = target_to_host (*++*ps);
+ while (ISDIGIT (c));
+ break;
+ }
+ else
+ val = val * 10 + c;
+ }
+ else
+ break;
+ }
+
+ return val;
+}
+
/* Return the constant initial value of DECL if available or DECL
otherwise. Same as the synonymous function in c/c-typeck.c. */
@@ -2289,7 +2427,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
/* The size of the destination region is exact. */
unsigned HOST_WIDE_INT navail = avail_range.max;
- if (*dir.beg != '%')
+ if (target_to_host (*dir.beg) != '%')
{
/* For plain character directives (i.e., the format string itself)
but not others, point the caret at the first character that's
@@ -2340,7 +2478,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size %wu")));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg, res.min,
+ dir.len, target_to_host (dir.beg), res.min,
navail);
}
@@ -2357,7 +2495,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ dir.len, target_to_host (dir.beg),
res.max, navail);
}
@@ -2377,7 +2515,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ dir.len, target_to_host (dir.beg),
res.likely, navail);
}
@@ -2394,7 +2532,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"%wu bytes into a region of size %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ dir.len, target_to_host (dir.beg),
res.min, res.max,
navail);
}
@@ -2410,13 +2548,13 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ dir.len, target_to_host (dir.beg),
res.min, navail);
}
/* The size of the destination region is a range. */
- if (*dir.beg != '%')
+ if (target_to_host (*dir.beg) != '%')
{
unsigned HOST_WIDE_INT navail = avail_range.max;
@@ -2469,7 +2607,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg, res.min,
+ dir.len, target_to_host (dir.beg), res.min,
avail_range.min, avail_range.max);
}
@@ -2488,7 +2626,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size between %wu and %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg, res.max,
+ dir.len, target_to_host (dir.beg), res.max,
avail_range.min, avail_range.max);
}
@@ -2510,7 +2648,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size between %wu and %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg, res.likely,
+ dir.len, target_to_host (dir.beg), res.likely,
avail_range.min, avail_range.max);
}
@@ -2529,7 +2667,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"%wu bytes into a region of size between %wu and %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ dir.len, target_to_host (dir.beg),
res.min, res.max,
avail_range.min, avail_range.max);
}
@@ -2547,7 +2685,7 @@ maybe_warn (substring_loc &dirloc, source_range *p
"into a region of size between %wu and %wu"));
return fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ dir.len, target_to_host (dir.beg),
res.min,
avail_range.min, avail_range.max);
}
@@ -2636,7 +2774,7 @@ format_directive (const pass_sprintf_length::call_
{
fmtwarn (dirloc, pargrange, NULL, info.warnopt (),
"%<%.*s%> directive argument is null",
- dirlen, dir.beg);
+ dirlen, target_to_host (dir.beg));
/* Don't bother processing the rest of the format string. */
res->warned = true;
@@ -2703,7 +2841,7 @@ format_directive (const pass_sprintf_length::call_
info.warnopt (),
"%<%.*s%> directive output of %wu bytes exceeds "
"minimum required size of 4095",
- dirlen, dir.beg, fmtres.range.min);
+ dirlen, target_to_host (dir.beg), fmtres.range.min);
else
{
const char *fmtstr
@@ -2715,7 +2853,7 @@ format_directive (const pass_sprintf_length::call_
warned = fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dirlen, dir.beg,
+ dirlen, target_to_host (dir.beg),
fmtres.range.min, fmtres.range.max);
}
}
@@ -2744,7 +2882,7 @@ format_directive (const pass_sprintf_length::call_
warned = fmtwarn (dirloc, pargrange, NULL, info.warnopt (),
"%<%.*s%> directive output of %wu bytes causes "
"result to exceed %<INT_MAX%>",
- dirlen, dir.beg, fmtres.range.min);
+ dirlen, target_to_host (dir.beg), fmtres.range.min);
else
{
const char *fmtstr
@@ -2755,7 +2893,7 @@ format_directive (const pass_sprintf_length::call_
"bytes may cause result to exceed %<INT_MAX%>"));
warned = fmtwarn (dirloc, pargrange, NULL,
info.warnopt (), fmtstr,
- dirlen, dir.beg,
+ dirlen, target_to_host (dir.beg),
fmtres.range.min, fmtres.range.max);
}
}
@@ -2847,7 +2985,7 @@ parse_directive (pass_sprintf_length::call_info &i
directive &dir, format_result *res,
const char *str, unsigned *argno)
{
- const char *pcnt = strchr (str, '%');
+ const char *pcnt = strchr (str, target_percent);
dir.beg = str;
if (size_t len = pcnt ? pcnt - str : *str ? strlen (str) : 1)
@@ -2873,7 +3011,7 @@ parse_directive (pass_sprintf_length::call_info &i
const char *pf = pcnt + 1;
/* POSIX numbered argument index or zero when none. */
- unsigned dollar = 0;
+ HOST_WIDE_INT dollar = 0;
/* With and precision. -1 when not specified, HOST_WIDE_INT_MIN
when given by a va_list argument, and a non-negative value
@@ -2881,6 +3019,17 @@ parse_directive (pass_sprintf_length::call_info &i
HOST_WIDE_INT width = -1;
HOST_WIDE_INT precision = -1;
+ /* Pointers to the beginning of the width and precision decimal
+ string (if any) within the directive. */
+ const char *pwidth = 0;
+ const char *pprec = 0;
+
+ /* When the value of the decimal string that specifies width or
+ precision is out of range, points to the digit that causes
+ the value to exceed the limit. */
+ const char *werange = NULL;
+ const char *perange = NULL;
+
/* Width specified via the asterisk. Need not be INTEGER_CST.
For vararg functions set to void_node. */
tree star_width = NULL_TREE;
@@ -2889,17 +3038,16 @@ parse_directive (pass_sprintf_length::call_info &i
For vararg functions set to void_node. */
tree star_precision = NULL_TREE;
- if (ISDIGIT (*pf))
+ if (ISDIGIT (target_to_host (*pf)))
{
/* This could be either a POSIX positional argument, the '0'
flag, or a width, depending on what follows. Store it as
width and sort it out later after the next character has
been seen. */
- char *end;
- width = strtol (pf, &end, 10);
- pf = end;
+ pwidth = pf;
+ width = target_strtol10 (&pf, &werange);
}
- else if ('*' == *pf)
+ else if (target_to_host (*pf) == '*')
{
/* Similarly to the block above, this could be either a POSIX
positional argument or a width, depending on what follows. */
@@ -2910,7 +3058,7 @@ parse_directive (pass_sprintf_length::call_info &i
++pf;
}
- if (*pf == '$')
+ if (target_to_host (*pf) == '$')
{
/* Handle the POSIX dollar sign which references the 1-based
positional argument number. */
@@ -2925,7 +3073,7 @@ parse_directive (pass_sprintf_length::call_info &i
/* Bail when the numbered argument is out of range (it will
have already been diagnosed by -Wformat). */
if (dollar == 0
- || dollar == info.argidx
+ || dollar == (int)info.argidx
|| dollar > gimple_call_num_args (info.callstmt))
return false;
@@ -2959,7 +3107,7 @@ parse_directive (pass_sprintf_length::call_info &i
the next field is the optional flags followed by an optional
width. */
for ( ; ; ) {
- switch (*pf)
+ switch (target_to_host (*pf))
{
case ' ':
case '0':
@@ -2966,7 +3114,7 @@ parse_directive (pass_sprintf_length::call_info &i
case '+':
case '-':
case '#':
- dir.set_flag (*pf++);
+ dir.set_flag (target_to_host (*pf++));
break;
default:
@@ -2975,13 +3123,13 @@ parse_directive (pass_sprintf_length::call_info &i
}
start_width:
- if (ISDIGIT (*pf))
+ if (ISDIGIT (target_to_host (*pf)))
{
- char *end;
- width = strtol (pf, &end, 10);
- pf = end;
+ werange = 0;
+ pwidth = pf;
+ width = target_strtol10 (&pf, &werange);
}
- else if ('*' == *pf)
+ else if (target_to_host (*pf) == '*')
{
if (*argno < gimple_call_num_args (info.callstmt))
star_width = gimple_call_arg (info.callstmt, (*argno)++);
@@ -2993,7 +3141,7 @@ parse_directive (pass_sprintf_length::call_info &i
}
++pf;
}
- else if ('\'' == *pf)
+ else if (target_to_host (*pf) == '\'')
{
/* The POSIX apostrophe indicating a numeric grouping
in the current locale. Even though it's possible to
@@ -3005,17 +3153,16 @@ parse_directive (pass_sprintf_length::call_info &i
}
start_precision:
- if ('.' == *pf)
+ if (target_to_host (*pf) == '.')
{
++pf;
- if (ISDIGIT (*pf))
+ if (ISDIGIT (target_to_host (*pf)))
{
- char *end;
- precision = strtol (pf, &end, 10);
- pf = end;
+ pprec = pf;
+ precision = target_strtol10 (&pf, &perange);
}
- else if ('*' == *pf)
+ else if (target_to_host (*pf) == '*')
{
if (*argno < gimple_call_num_args (info.callstmt))
star_precision = gimple_call_arg (info.callstmt, (*argno)++);
@@ -3035,10 +3182,10 @@ parse_directive (pass_sprintf_length::call_info &i
}
}
- switch (*pf)
+ switch (target_to_host (*pf))
{
case 'h':
- if (pf[1] == 'h')
+ if (target_to_host (pf[1]) == 'h')
{
++pf;
dir.modifier = FMT_LEN_hh;
@@ -3059,7 +3206,7 @@ parse_directive (pass_sprintf_length::call_info &i
break;
case 'l':
- if (pf[1] == 'l')
+ if (target_to_host (pf[1]) == 'l')
{
++pf;
dir.modifier = FMT_LEN_ll;
@@ -3080,7 +3227,7 @@ parse_directive (pass_sprintf_length::call_info &i
break;
}
- switch (*pf)
+ switch (target_to_host (*pf))
{
/* Handle a sole '%' character the same as "%%" but since it's
undefined prevent the result from being folded. */
@@ -3141,8 +3288,11 @@ parse_directive (pass_sprintf_length::call_info &i
return 0;
}
- dir.specifier = *pf++;
+ dir.specifier = target_to_host (*pf++);
+ /* Store the length of the format directive. */
+ dir.len = pf - pcnt;
+
if (star_width)
{
if (INTEGRAL_TYPE_P (TREE_TYPE (star_width)))
@@ -3156,8 +3306,26 @@ parse_directive (pass_sprintf_length::call_info &i
}
}
else
- dir.set_width (width);
+ {
+ if (width == LONG_MAX && werange)
+ {
+ size_t begin = dir.beg - info.fmtstr + (pwidth - pcnt);
+ size_t caret = begin + (werange - pcnt);
+ size_t end = pf - info.fmtstr - 1;
+ /* Create a location for the width part of the directive,
+ pointing the caret at the first out-of-range digit. */
+ substring_loc dirloc (info.fmtloc, TREE_TYPE (info.format),
+ caret, begin, end);
+
+ fmtwarn (dirloc, NULL, NULL,
+ info.warnopt (), "%<%.*s%> directive width out of range",
+ dir.len, target_to_host (dir.beg));
+ }
+
+ dir.set_width (width);
+ }
+
if (star_precision)
{
if (INTEGRAL_TYPE_P (TREE_TYPE (star_precision)))
@@ -3171,8 +3339,27 @@ parse_directive (pass_sprintf_length::call_info &i
}
}
else
- dir.set_precision (precision);
+ {
+ if (precision == LONG_MAX && perange)
+ {
+ size_t begin = dir.beg - info.fmtstr + (pprec - pcnt) - 1;
+ size_t caret = dir.beg - info.fmtstr + (perange - pcnt) - 1;
+ size_t end = pf - info.fmtstr - 2;
+ /* Create a location for the precision part of the directive,
+ including the leading period, pointing the caret at the first
+ out-of-range digit . */
+ substring_loc dirloc (info.fmtloc, TREE_TYPE (info.format),
+ caret, begin, end);
+
+ fmtwarn (dirloc, NULL, NULL,
+ info.warnopt (), "%<%.*s%> directive precision out of range",
+ dir.len, target_to_host (dir.beg));
+ }
+
+ dir.set_precision (precision);
+ }
+
/* Extract the argument if the directive takes one and if it's
available (e.g., the function doesn't take a va_list). Treat
missing arguments the same as va_list, even though they will
@@ -3181,9 +3368,6 @@ parse_directive (pass_sprintf_length::call_info &i
&& *argno < gimple_call_num_args (info.callstmt))
dir.arg = gimple_call_arg (info.callstmt, dollar ? dollar : (*argno)++);
- /* Return the length of the format directive. */
- dir.len = pf - pcnt;
-
if (dump_file)
{
fprintf (dump_file, " Directive %u at offset %llu: \"%.*s\"",
@@ -3708,6 +3892,8 @@ pass_sprintf_length::handle_gimple_call (gimple_st
unsigned int
pass_sprintf_length::execute (function *fun)
{
+ init_target_to_host_charmap ();
+
basic_block bb;
FOR_EACH_BB_FN (bb, fun)
{
Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (working copy)
@@ -0,0 +1,129 @@
+/* PR tree-optimization/80523 - -Wformat-overflow doesn't consider
+ -fexec-charset
+ { dg-do compile }
+ { dg-require-iconv "IBM1047" }
+ { dg-options "-O2 -Wall -Wno-format -Wformat-overflow -fexec-charset=IBM1047 -ftrack-macro-expansion=0" } */
+
+char buf[1];
+void sink (void*);
+
+#define T(...) (__builtin_sprintf (buf + 1, __VA_ARGS__), sink (buf))
+
+/* Exercise all special C and POSIX characters. */
+
+void test_characters ()
+{
+ T ("%%"); /* { dg-warning ".%%. directive writing 1 byte" } */
+
+ T ("%A", 0.0); /* { dg-warning ".%A. directive writing between 6 and 20 " } */
+ T ("%a", 0.0); /* { dg-warning ".%a. directive writing between 6 and 20 " } */
+
+ T ("%C", 'a'); /* { dg-warning ".%C. directive writing 1 byte" "bug 80537" { xfail *-*-* } } */
+ T ("%c", 'a'); /* { dg-warning ".%c. directive writing 1 byte" } */
+
+ T ("%d", 12); /* { dg-warning ".%d. directive writing 2 bytes" } */
+ T ("% d", 12); /* { dg-warning ".% d. directive writing 3 bytes" } */
+ T ("%-d", 123); /* { dg-warning ".%-d. directive writing 3 bytes" } */
+ T ("%+d", 1234); /* { dg-warning ".%\\+d. directive writing 5 bytes" } */
+ T ("%'d", 1234); /* { dg-warning ".%'d. directive writing 5 bytes" "bug 80535" { xfail *-*-* } } */
+ T ("%1$d", 2345); /* { dg-warning ".%1\\\$d. directive writing 4 bytes" } */
+
+ /* Verify that digits are correctly interpreted as width and precision. */
+ T ("%0d", 12345); /* { dg-warning ".%0d. directive writing 5 bytes" } */
+ T ("%1d", 12345); /* { dg-warning ".%1d. directive writing 5 bytes" } */
+ T ("%2d", 12345); /* { dg-warning ".%2d. directive writing 5 bytes" } */
+ T ("%3d", 12345); /* { dg-warning ".%3d. directive writing 5 bytes" } */
+ T ("%4d", 12345); /* { dg-warning ".%4d. directive writing 5 bytes" } */
+ T ("%5d", 12345); /* { dg-warning ".%5d. directive writing 5 bytes" } */
+ T ("%6d", 12345); /* { dg-warning ".%6d. directive writing 6 bytes" } */
+ T ("%7d", 12345); /* { dg-warning ".%7d. directive writing 7 bytes" } */
+ T ("%8d", 12345); /* { dg-warning ".%8d. directive writing 8 bytes" } */
+ T ("%9d", 12345); /* { dg-warning ".%9d. directive writing 9 bytes" } */
+
+ T ("%.0d", 12345); /* { dg-warning ".%.0d. directive writing 5 bytes" } */
+ T ("%.1d", 12345); /* { dg-warning ".%.1d. directive writing 5 bytes" } */
+ T ("%.2d", 12345); /* { dg-warning ".%.2d. directive writing 5 bytes" } */
+ T ("%.3d", 12345); /* { dg-warning ".%.3d. directive writing 5 bytes" } */
+ T ("%.4d", 12345); /* { dg-warning ".%.4d. directive writing 5 bytes" } */
+ T ("%.5d", 12345); /* { dg-warning ".%.5d. directive writing 5 bytes" } */
+ T ("%.6d", 12345); /* { dg-warning ".%.6d. directive writing 6 bytes" } */
+ T ("%.7d", 12345); /* { dg-warning ".%.7d. directive writing 7 bytes" } */
+ T ("%.8d", 12345); /* { dg-warning ".%.8d. directive writing 8 bytes" } */
+ T ("%.9d", 12345); /* { dg-warning ".%.9d. directive writing 9 bytes" } */
+
+ T ("%hhd", 12); /* { dg-warning ".%hhd. directive writing 2 bytes" } */
+ T ("%hd", 234); /* { dg-warning ".%hd. directive writing 3 bytes" } */
+
+ {
+ const __PTRDIFF_TYPE__ i = 3456;
+ T ("%jd", i); /* { dg-warning ".%jd. directive writing 4 bytes" } */
+ }
+
+ T ("%ld", 45678L); /* { dg-warning ".%ld. directive writing 5 bytes" } */
+
+ {
+ const __PTRDIFF_TYPE__ i = 56789;
+ T ("%td", i); /* { dg-warning ".%td. directive writing 5 bytes" } */
+ }
+
+ {
+ const __SIZE_TYPE__ i = 67890;
+ T ("%zd", i); /* { dg-warning ".%zd. directive writing 5 bytes" } */
+ }
+
+ T ("%E", 0.0); /* { dg-warning ".%E. directive writing 12 bytes" } */
+ T ("%e", 0.0); /* { dg-warning ".%e. directive writing 12 bytes" } */
+ T ("%F", 0.0); /* { dg-warning ".%F. directive writing 8 bytes" } */
+ T ("%f", 0.0); /* { dg-warning ".%f. directive writing 8 bytes" } */
+ T ("%G", 0.0); /* { dg-warning ".%G. directive writing 1 byte" } */
+ T ("%g", 0.0); /* { dg-warning ".%g. directive writing 1 byte" } */
+
+ T ("%i", 123); /* { dg-warning ".%i. directive writing 3 bytes" } */
+
+ {
+ int n;
+
+ T ("%n", &n); /* { dg-warning "writing a terminating nul" } */
+ T ("%nH", &n); /* { dg-warning ".H. directive writing 1 byte" } */
+ }
+
+ T ("%o", 999); /* { dg-warning ".%o. directive writing 4 bytes" } */
+ T ("%#o", 999); /* { dg-warning ".%#o. directive writing 5 bytes" } */
+
+ T ("%x", 1234); /* { dg-warning ".%x. directive writing 3 bytes" } */
+ T ("%#X", 1235); /* { dg-warning ".%#X. directive writing 5 bytes" } */
+
+ T ("%S", L"1"); /* { dg-warning ".%S. directive writing 1 byte" } */
+ T ("%-s", "1"); /* { dg-warning ".%-s. directive writing 1 byte" } */
+
+ /* Verify that characters in the source character set appear in
+ the text of the warning unchanged (i.e., not as their equivalents
+ in the execution character set on the target). The trailing %%
+ disables sprintf->strcpy optimization. */
+ T ("ABCDEFGHIJ%%"); /* { dg-warning ".ABCDEFGHIJ. directive writing 10 bytes" } */
+ T ("KLMNOPQRST%%"); /* { dg-warning ".KLMNOPQRST. directive writing 10 bytes" } */
+ T ("UVWXYZ%%"); /* { dg-warning ".UVWXYZ. directive writing 6 bytes" } */
+
+ T ("abcdefghij%%"); /* { dg-warning ".abcdefghij. directive writing 10 bytes" } */
+ T ("klmnopqrst%%"); /* { dg-warning ".klmnopqrst. directive writing 10 bytes" } */
+ T ("uvwxyz%%"); /* { dg-warning ".uvwxyz. directive writing 6 bytes" } */
+}
+
+#undef T
+#define T(...) (__builtin_sprintf (d, __VA_ARGS__), sink (d))
+
+void test_width_and_precision_out_of_range (char *d)
+{
+#if __LONG_MAX__ == 2147483647
+# define MAX_P1_STR "2147483648"
+#elif __LONG_MAX__ == 9223372036854775807
+# define MAX_P1_STR "9223372036854775808"
+#endif
+
+ T ("%" MAX_P1_STR "i", 0); /* { dg-warning "width out of range" } */
+ /* { dg-warning "result to exceed .INT_MAX. " "" { target *-*-* } .-1 } */
+ T ("%." MAX_P1_STR "i", 0); /* { dg-warning "precision out of range" } */
+
+ /* The following is diagnosed by -Wformat (disabled here). */
+ /* T ("%" MAX_P1_STR "$i", 0); */
+}
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-04-27 22:52 ` Martin Sebor
@ 2017-04-28 16:35 ` Jeff Law
2017-04-28 16:37 ` Jakub Jelinek
2017-04-28 18:32 ` Martin Sebor
0 siblings, 2 replies; 21+ messages in thread
From: Jeff Law @ 2017-04-28 16:35 UTC (permalink / raw)
To: Martin Sebor, Jakub Jelinek, Joseph Myers; +Cc: Gcc Patch List
On 04/27/2017 03:05 PM, Martin Sebor wrote:
> On 04/26/2017 04:34 PM, Jakub Jelinek wrote:
>>
>> Also, can't there be a way to shortcut all this processing if the
>> charsets are the same? And is it a good idea if every pass that needs
>> to do
>> something with the exec charset chars caches its own results of the
>> langhook?
>
> The biggest overhead is calling lang_hooks.to_target_charset
> to initialize each character in the source set. That could
> be avoided if there were a way to determine in the middle-end
> whether the input and target charsets are the same, but I don't
> see one. Alternatively, it could be optimized by converting
> all the characters in one shot as a string, but without adding
> a new target hook to do that I don't see how to do that either.
> It might be a useful enhancement but given the scope it feels
> like it should be done independently of this change. But if
> you know of a solution that escaped me please let me know.
>
> The overhead of the additional processing should be negligible
> irrespective of whether the charsets are different or the same
> (all it entails is indexing into a table).
So the initialization could be done once per translation unit rather
than once per function -- assuming the target character set doesn't
change within a translation unit.
That seems like it ought to be easy.
The table-lookup seems like a reasonable cost and I don't think we
should litter the code with conditionals to conditionally lookup based
on whether or not the charsets are the same.
>
> I agree that sharing data would be a good thing but as it is,
> the little that can be be shared already is (the target_percent
> character with builtins.c). The rest of it (i.e., the mapping)
> is only being used by gimple-ssa-sprintf.c.
If we ever get to a point where we need this level of mapping elsewhere,
we can always pull the code out of gimple-ssa-sprintf into a more
generic location. I don't think we need to over-engineering sharing
into this right now.
>
> If/when -Wformat is ever enhanced to handle -fexec-charset, or
> if another area needs to, then implementinng some more general
> would be worthwhile.
Right.
>
> Attached is an updated patch with just the overflow handling
> suggested by Joseph.
>
> Martin
>
>
> gcc-80523.diff
>
>
> PR tree-optimization/80523 - -Wformat-overflow doesn't consider -fexec-charset
>
> gcc/ChangeLog:
>
> PR tree-optimization/80523
> * gimple-ssa-sprintf.c (target_to_host_charmap): New global variable.
> (init_target_to_host_charmap, target_to_host, target_strtol10): New
> functions.
> (maybe_warn, format_directive, parse_directive): Use new functions.
> (pass_sprintf_length::execute): Call init_target_to_host_charmap.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/80523
> * gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: New test.
>
> Index: gcc/gimple-ssa-sprintf.c
> ===================================================================
> --- gcc/gimple-ssa-sprintf.c (revision 247263)
> +++ gcc/gimple-ssa-sprintf.c (working copy)
> +/* Return a string consisting of charavters in the host source character
s/charavters/characters/
> + set corresponding to the string TARGSTR consisting of characters in
> + the execution character set. */
> +
> +static const char*
> +target_to_host (const char *targstr)
> +{
> + /* The interesting subset of source and execution characters are
> + the same so no conversion is necessary. */
> + if (target_to_host_charmap['\0'] == 1)
> + return targstr;
> +
> + /* Convert the initial substring of TARGSTR to the corresponding
> + characters in the host set, appending "..." if TARGSTR is too
> + long to fit. Using the static buffer assumes the function is
> + not called in between sequence points (which it isn't). */
> + static char hostr[32];
> + for (char *ph = hostr; ; ++targstr)
> + {
> + *ph++ = target_to_host (*targstr);
> + if (!*targstr)
> + break;
> +
> + if (ph - hostr == sizeof hostr - 4)
> + {
> + *ph = '\0';
> + strcat (ph, "...");
> + break;
> + }
> + }
> +
> + return hostr;
Ewww. I guess the alternative would be something like:
Expand the return value to include a indicator of whether or not the
original string was returned (common case) or if a new string was
returned and thus needs to be deallocated by the caller.
That's probably pretty unpleasant given we don't have a central place
where we call target_to_host, so the caller side would be much uglier.
Are there any downstream impacts when the string is too long to covert
other than not warning for things which were unconverted?
Jeff
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-04-28 16:35 ` Jeff Law
@ 2017-04-28 16:37 ` Jakub Jelinek
2017-04-28 17:05 ` Jeff Law
2017-04-28 18:32 ` Martin Sebor
1 sibling, 1 reply; 21+ messages in thread
From: Jakub Jelinek @ 2017-04-28 16:37 UTC (permalink / raw)
To: Jeff Law; +Cc: Martin Sebor, Joseph Myers, Gcc Patch List
On Fri, Apr 28, 2017 at 10:22:29AM -0600, Jeff Law wrote:
> So the initialization could be done once per translation unit rather than
> once per function -- assuming the target character set doesn't change within
> a translation unit.
>
> That seems like it ought to be easy.
>
> The table-lookup seems like a reasonable cost and I don't think we should
> litter the code with conditionals to conditionally lookup based on whether
> or not the charsets are the same.
One option would be to have the cache array initialized say to 0 for
all chars except for % (which can be taken from target_percent or how is
that called), and then query (and cache) chars you need (you don't need
anything until % is seen, then obviously you need to translate chars
following that until end of the format string is recognized, then again
skip until next % etc.
And/or enhance libcpp and the langhooks so that they will tell you when
the exec charset is identical to host (or at least the subset format strings
care about).
Jakub
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-04-28 16:37 ` Jakub Jelinek
@ 2017-04-28 17:05 ` Jeff Law
0 siblings, 0 replies; 21+ messages in thread
From: Jeff Law @ 2017-04-28 17:05 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Martin Sebor, Joseph Myers, Gcc Patch List
On 04/28/2017 10:27 AM, Jakub Jelinek wrote:
> On Fri, Apr 28, 2017 at 10:22:29AM -0600, Jeff Law wrote:
>> So the initialization could be done once per translation unit rather than
>> once per function -- assuming the target character set doesn't change within
>> a translation unit.
>>
>> That seems like it ought to be easy.
>>
>> The table-lookup seems like a reasonable cost and I don't think we should
>> litter the code with conditionals to conditionally lookup based on whether
>> or not the charsets are the same.
>
> One option would be to have the cache array initialized say to 0 for
> all chars except for % (which can be taken from target_percent or how is
> that called), and then query (and cache) chars you need (you don't need
> anything until % is seen, then obviously you need to translate chars
> following that until end of the format string is recognized, then again
> skip until next % etc.
That seems like a significant over-complication for little real world
benefit.
Jeff
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-04-28 16:35 ` Jeff Law
2017-04-28 16:37 ` Jakub Jelinek
@ 2017-04-28 18:32 ` Martin Sebor
2017-04-28 19:14 ` Jeff Law
2017-04-29 20:44 ` Andreas Schwab
1 sibling, 2 replies; 21+ messages in thread
From: Martin Sebor @ 2017-04-28 18:32 UTC (permalink / raw)
To: Jeff Law, Jakub Jelinek, Joseph Myers; +Cc: Gcc Patch List
[-- Attachment #1: Type: text/plain, Size: 4114 bytes --]
On 04/28/2017 10:22 AM, Jeff Law wrote:
> On 04/27/2017 03:05 PM, Martin Sebor wrote:
>> On 04/26/2017 04:34 PM, Jakub Jelinek wrote:
>>>
>>> Also, can't there be a way to shortcut all this processing if the
>>> charsets are the same? And is it a good idea if every pass that
>>> needs to do
>>> something with the exec charset chars caches its own results of the
>>> langhook?
>>
>> The biggest overhead is calling lang_hooks.to_target_charset
>> to initialize each character in the source set. That could
>> be avoided if there were a way to determine in the middle-end
>> whether the input and target charsets are the same, but I don't
>> see one. Alternatively, it could be optimized by converting
>> all the characters in one shot as a string, but without adding
>> a new target hook to do that I don't see how to do that either.
>> It might be a useful enhancement but given the scope it feels
>> like it should be done independently of this change. But if
>> you know of a solution that escaped me please let me know.
>>
>> The overhead of the additional processing should be negligible
>> irrespective of whether the charsets are different or the same
>> (all it entails is indexing into a table).
> So the initialization could be done once per translation unit rather
> than once per function -- assuming the target character set doesn't
> change within a translation unit.
>
> That seems like it ought to be easy.
It is easy. I was going to respond by saying "It already is done
that way" because I implemented it in the first patch (by checking
the mapping for '%'. But now I see I accidentally removed the code
in the update while exploring ways to optimize it some more. Sigh.
Let me put it back.
>> + set corresponding to the string TARGSTR consisting of characters in
>> + the execution character set. */
>> +
>> +static const char*
>> +target_to_host (const char *targstr)
>> +{
>> + /* The interesting subset of source and execution characters are
>> + the same so no conversion is necessary. */
>> + if (target_to_host_charmap['\0'] == 1)
>> + return targstr;
>> +
>> + /* Convert the initial substring of TARGSTR to the corresponding
>> + characters in the host set, appending "..." if TARGSTR is too
>> + long to fit. Using the static buffer assumes the function is
>> + not called in between sequence points (which it isn't). */
>> + static char hostr[32];
>> + for (char *ph = hostr; ; ++targstr)
>> + {
>> + *ph++ = target_to_host (*targstr);
>> + if (!*targstr)
>> + break;
>> +
>> + if (ph - hostr == sizeof hostr - 4)
>> + {
>> + *ph = '\0';
>> + strcat (ph, "...");
>> + break;
>> + }
>> + }
>> +
>> + return hostr;
> Ewww. I guess the alternative would be something like:
>
> Expand the return value to include a indicator of whether or not the
> original string was returned (common case) or if a new string was
> returned and thus needs to be deallocated by the caller.
>
> That's probably pretty unpleasant given we don't have a central place
> where we call target_to_host, so the caller side would be much uglier.
>
> Are there any downstream impacts when the string is too long to covert
> other than not warning for things which were unconverted?
The function is only used when printing the text of the directive
in the warning. It doesn't prevent the warnings, it just truncates
them. I think the truncation beyond a certain length is a good
thing regardless of the conversion. Without it, the warning might
end up printing thousands of lines of text, e.g., in
sprintf (d, "thousands of lines of text...");
I don't like returning a pointer to a static buffer but given that
the scope of the function is limited to the pass it seems fairly
safe. Another alternative would be to pass in a buffer and its
size. That shouldn't complicate the caller too much. The easiest,
cleanest, and safest solution by far is to return a std::string,
but I have the impression that would be against GCC convention of
avoiding the STL.
Attached is an updated patch with these two tweaks.
Martin
[-- Attachment #2: gcc-80523.diff --]
[-- Type: text/x-patch, Size: 28981 bytes --]
PR tree-optimization/80523 - -Wformat-overflow doesn't consider -fexec-charset
gcc/ChangeLog:
PR tree-optimization/80523
* gimple-ssa-sprintf.c (target_to_host_charmap): New global variable.
(init_target_to_host_charmap, target_to_host, target_strtol10): New
functions.
(maybe_warn, format_directive, parse_directive): Use new functions.
(pass_sprintf_length::execute): Call init_target_to_host_charmap.
gcc/testsuite/ChangeLog:
PR tree-optimization/80523
* gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: New test.
diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index d3771dd..3e1c119 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -66,6 +66,7 @@ along with GCC; see the file COPYING3. If not see
#include "calls.h"
#include "cfgloop.h"
#include "intl.h"
+#include "langhooks.h"
#include "builtins.h"
#include "stor-layout.h"
@@ -273,6 +274,158 @@ target_size_max ()
return tree_to_uhwi (TYPE_MAX_VALUE (size_type_node));
}
+/* A straightforward mapping from the execution character set to the host
+ character set indexed by execution character. */
+
+static char target_to_host_charmap[256];
+
+/* Initialize a mapping from the execution character set to the host
+ character set. */
+
+static bool
+init_target_to_host_charmap ()
+{
+ /* If the percent sign is non-zero the mapping has already been
+ initialized. */
+ if (target_to_host_charmap['%'])
+ return true;
+
+ /* Initialize the target_percent character (done elsewhere). */
+ if (!init_target_chars ())
+ return false;
+
+ /* The subset of the source character set used by printf conversion
+ specifications (strictly speaking, not all letters are used but
+ they are included here for the sake of simplicity). The dollar
+ sign must be included even though it's not in the basic source
+ character set. */
+ const char srcset[] = " 0123456789!\"#%&'()*+,-./:;<=>?[\\]^_{|}~$"
+ "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
+
+ /* Set the mapping for all characters to some ordinary value (i,e.,
+ not none used in printf conversion specifications) and overwrite
+ those that are used by conversion specifications with their
+ corresponding values. */
+ memset (target_to_host_charmap + 1, '?', sizeof target_to_host_charmap - 1);
+
+ /* Are the two sets of characters the same? */
+ bool all_same_p = true;
+
+ for (const char *pc = srcset; *pc; ++pc)
+ {
+ /* Slice off the high end bits in case target characters are
+ signed. All values are expected to be non-nul, otherwise
+ there's a problem. */
+ if (unsigned char tc = lang_hooks.to_target_charset (*pc))
+ {
+ target_to_host_charmap[tc] = *pc;
+ if (tc != *pc)
+ all_same_p = false;
+ }
+ else
+ return false;
+
+ }
+
+ /* Set the first element to a non-zero value if the mapping
+ is 1-to-1, otherwise leave it clear (NUL is assumed to be
+ the same in both character sets). */
+ target_to_host_charmap[0] = all_same_p;
+
+ return true;
+}
+
+/* Return the host source character corresponding to the character
+ CH in the execution character set if one exists, or some innocuous
+ (non-special, non-nul) source character otherwise. */
+
+static inline unsigned char
+target_to_host (unsigned char ch)
+{
+ return target_to_host_charmap[ch];
+}
+
+/* Convert an initial substring of the string TARGSTR consisting of
+ characters in the execution character set into a string in the
+ source character set on the host and store up to HOSTSZ characters
+ in the buffer pointed to by HOSTR. Return HOSTR. */
+
+static const char*
+target_to_host (char *hostr, size_t hostsz, const char *targstr)
+{
+ /* Make sure the buffer is reasonably big. */
+ gcc_assert (hostsz > 4);
+
+ /* The interesting subset of source and execution characters are
+ the same so no conversion is necessary. However, truncate
+ overlong strings just like the translated strings are. */
+ if (target_to_host_charmap['\0'] == 1)
+ {
+ strncpy (hostr, targstr, hostsz - 4);
+ if (strlen (targstr) >= hostsz)
+ strcpy (hostr + hostsz - 4, "...");
+ return hostr;
+ }
+
+ /* Convert the initial substring of TARGSTR to the corresponding
+ characters in the host set, appending "..." if TARGSTR is too
+ long to fit. Using the static buffer assumes the function is
+ not called in between sequence points (which it isn't). */
+ for (char *ph = hostr; ; ++targstr)
+ {
+ *ph++ = target_to_host (*targstr);
+ if (!*targstr)
+ break;
+
+ if (size_t (ph - hostr) == hostsz - 4)
+ {
+ *ph = '\0';
+ strcat (ph, "...");
+ break;
+ }
+ }
+
+ return hostr;
+}
+
+/* Convert the sequence of decimal digits in the execution character
+ starting at S to a long, just like strtol does. Return the result
+ and set *END to one past the last converted character. On range
+ error set ERANGE to the digit that caused it. */
+
+static inline long
+target_strtol10 (const char **ps, const char **erange)
+{
+ unsigned HOST_WIDE_INT val = 0;
+ for ( ; ; ++*ps)
+ {
+ unsigned char c = target_to_host (**ps);
+ if (ISDIGIT (c))
+ {
+ c -= '0';
+
+ /* Check for overflow. */
+ if (val > (LONG_MAX - c) / 10LU)
+ {
+ val = LONG_MAX;
+ *erange = *ps;
+
+ /* Skip the remaining digits. */
+ do
+ c = target_to_host (*++*ps);
+ while (ISDIGIT (c));
+ break;
+ }
+ else
+ val = val * 10 + c;
+ }
+ else
+ break;
+ }
+
+ return val;
+}
+
/* Return the constant initial value of DECL if available or DECL
otherwise. Same as the synonymous function in c/c-typeck.c. */
@@ -2284,12 +2437,16 @@ maybe_warn (substring_loc &dirloc, source_range *pargrange,
|| (res.max < HOST_WIDE_INT_MAX
&& avail_range.min < res.max)));
+ /* Buffer for the directive in the host character set (used when
+ the source character set is different). */
+ char hostdir[32];
+
if (avail_range.min == avail_range.max)
{
/* The size of the destination region is exact. */
unsigned HOST_WIDE_INT navail = avail_range.max;
- if (*dir.beg != '%')
+ if (target_to_host (*dir.beg) != '%')
{
/* For plain character directives (i.e., the format string itself)
but not others, point the caret at the first character that's
@@ -2339,9 +2496,9 @@ maybe_warn (substring_loc &dirloc, source_range *pargrange,
: G_("%<%.*s%> directive writing %wu bytes "
"into a region of size %wu")));
return fmtwarn (dirloc, pargrange, NULL,
- info.warnopt (), fmtstr,
- dir.len, dir.beg, res.min,
- navail);
+ info.warnopt (), fmtstr, dir.len,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
+ res.min, navail);
}
if (res.min == 0 && res.max < maxbytes)
@@ -2356,8 +2513,8 @@ maybe_warn (substring_loc &dirloc, source_range *pargrange,
: G_("%<%.*s%> directive writing up to %wu bytes "
"into a region of size %wu"));
return fmtwarn (dirloc, pargrange, NULL,
- info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ info.warnopt (), fmtstr, dir.len,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
res.max, navail);
}
@@ -2376,8 +2533,8 @@ maybe_warn (substring_loc &dirloc, source_range *pargrange,
: G_("%<%.*s%> directive writing likely %wu or more bytes "
"into a region of size %wu"));
return fmtwarn (dirloc, pargrange, NULL,
- info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ info.warnopt (), fmtstr, dir.len,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
res.likely, navail);
}
@@ -2393,10 +2550,9 @@ maybe_warn (substring_loc &dirloc, source_range *pargrange,
: G_("%<%.*s%> directive writing between %wu and "
"%wu bytes into a region of size %wu"));
return fmtwarn (dirloc, pargrange, NULL,
- info.warnopt (), fmtstr,
- dir.len, dir.beg,
- res.min, res.max,
- navail);
+ info.warnopt (), fmtstr, dir.len,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
+ res.min, res.max, navail);
}
const char* fmtstr
@@ -2409,14 +2565,14 @@ maybe_warn (substring_loc &dirloc, source_range *pargrange,
: G_("%<%.*s%> directive writing %wu or more bytes "
"into a region of size %wu"));
return fmtwarn (dirloc, pargrange, NULL,
- info.warnopt (), fmtstr,
- dir.len, dir.beg,
+ info.warnopt (), fmtstr, dir.len,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
res.min, navail);
}
/* The size of the destination region is a range. */
- if (*dir.beg != '%')
+ if (target_to_host (*dir.beg) != '%')
{
unsigned HOST_WIDE_INT navail = avail_range.max;
@@ -2468,9 +2624,9 @@ maybe_warn (substring_loc &dirloc, source_range *pargrange,
"into a region of size between %wu and %wu")));
return fmtwarn (dirloc, pargrange, NULL,
- info.warnopt (), fmtstr,
- dir.len, dir.beg, res.min,
- avail_range.min, avail_range.max);
+ info.warnopt (), fmtstr, dir.len,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
+ res.min, avail_range.min, avail_range.max);
}
if (res.min == 0 && res.max < maxbytes)
@@ -2487,9 +2643,9 @@ maybe_warn (substring_loc &dirloc, source_range *pargrange,
: G_("%<%.*s%> directive writing up to %wu bytes "
"into a region of size between %wu and %wu"));
return fmtwarn (dirloc, pargrange, NULL,
- info.warnopt (), fmtstr,
- dir.len, dir.beg, res.max,
- avail_range.min, avail_range.max);
+ info.warnopt (), fmtstr, dir.len,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
+ res.max, avail_range.min, avail_range.max);
}
if (res.min == 0 && maxbytes <= res.max)
@@ -2509,9 +2665,9 @@ maybe_warn (substring_loc &dirloc, source_range *pargrange,
: G_("%<%.*s%> directive writing likely %wu or more bytes "
"into a region of size between %wu and %wu"));
return fmtwarn (dirloc, pargrange, NULL,
- info.warnopt (), fmtstr,
- dir.len, dir.beg, res.likely,
- avail_range.min, avail_range.max);
+ info.warnopt (), fmtstr, dir.len,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
+ res.likely, avail_range.min, avail_range.max);
}
if (res.max < maxbytes)
@@ -2528,10 +2684,9 @@ maybe_warn (substring_loc &dirloc, source_range *pargrange,
: G_("%<%.*s%> directive writing between %wu and "
"%wu bytes into a region of size between %wu and %wu"));
return fmtwarn (dirloc, pargrange, NULL,
- info.warnopt (), fmtstr,
- dir.len, dir.beg,
- res.min, res.max,
- avail_range.min, avail_range.max);
+ info.warnopt (), fmtstr, dir.len,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
+ res.min, res.max, avail_range.min, avail_range.max);
}
const char* fmtstr
@@ -2546,10 +2701,9 @@ maybe_warn (substring_loc &dirloc, source_range *pargrange,
: G_("%<%.*s%> directive writing %wu or more bytes "
"into a region of size between %wu and %wu"));
return fmtwarn (dirloc, pargrange, NULL,
- info.warnopt (), fmtstr,
- dir.len, dir.beg,
- res.min,
- avail_range.min, avail_range.max);
+ info.warnopt (), fmtstr, dir.len,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
+ res.min, avail_range.min, avail_range.max);
}
/* Compute the length of the output resulting from the directive DIR
@@ -2630,13 +2784,17 @@ format_directive (const pass_sprintf_length::call_info &info,
}
}
+ /* Buffer for the directive in the host character set (used when
+ the source character set is different). */
+ char hostdir[32];
+
int dirlen = dir.len;
if (fmtres.nullp)
{
fmtwarn (dirloc, pargrange, NULL, info.warnopt (),
"%<%.*s%> directive argument is null",
- dirlen, dir.beg);
+ dirlen, target_to_host (hostdir, sizeof hostdir, dir.beg));
/* Don't bother processing the rest of the format string. */
res->warned = true;
@@ -2703,7 +2861,9 @@ format_directive (const pass_sprintf_length::call_info &info,
info.warnopt (),
"%<%.*s%> directive output of %wu bytes exceeds "
"minimum required size of 4095",
- dirlen, dir.beg, fmtres.range.min);
+ dirlen,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
+ fmtres.range.min);
else
{
const char *fmtstr
@@ -2714,8 +2874,8 @@ format_directive (const pass_sprintf_length::call_info &info,
"bytes exceeds minimum required size of 4095"));
warned = fmtwarn (dirloc, pargrange, NULL,
- info.warnopt (), fmtstr,
- dirlen, dir.beg,
+ info.warnopt (), fmtstr, dirlen,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
fmtres.range.min, fmtres.range.max);
}
}
@@ -2744,7 +2904,9 @@ format_directive (const pass_sprintf_length::call_info &info,
warned = fmtwarn (dirloc, pargrange, NULL, info.warnopt (),
"%<%.*s%> directive output of %wu bytes causes "
"result to exceed %<INT_MAX%>",
- dirlen, dir.beg, fmtres.range.min);
+ dirlen,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
+ fmtres.range.min);
else
{
const char *fmtstr
@@ -2754,8 +2916,8 @@ format_directive (const pass_sprintf_length::call_info &info,
: G_ ("%<%.*s%> directive output between %wu and %wu "
"bytes may cause result to exceed %<INT_MAX%>"));
warned = fmtwarn (dirloc, pargrange, NULL,
- info.warnopt (), fmtstr,
- dirlen, dir.beg,
+ info.warnopt (), fmtstr, dirlen,
+ target_to_host (hostdir, sizeof hostdir, dir.beg),
fmtres.range.min, fmtres.range.max);
}
}
@@ -2847,7 +3009,7 @@ parse_directive (pass_sprintf_length::call_info &info,
directive &dir, format_result *res,
const char *str, unsigned *argno)
{
- const char *pcnt = strchr (str, '%');
+ const char *pcnt = strchr (str, target_percent);
dir.beg = str;
if (size_t len = pcnt ? pcnt - str : *str ? strlen (str) : 1)
@@ -2873,7 +3035,7 @@ parse_directive (pass_sprintf_length::call_info &info,
const char *pf = pcnt + 1;
/* POSIX numbered argument index or zero when none. */
- unsigned dollar = 0;
+ HOST_WIDE_INT dollar = 0;
/* With and precision. -1 when not specified, HOST_WIDE_INT_MIN
when given by a va_list argument, and a non-negative value
@@ -2881,6 +3043,17 @@ parse_directive (pass_sprintf_length::call_info &info,
HOST_WIDE_INT width = -1;
HOST_WIDE_INT precision = -1;
+ /* Pointers to the beginning of the width and precision decimal
+ string (if any) within the directive. */
+ const char *pwidth = 0;
+ const char *pprec = 0;
+
+ /* When the value of the decimal string that specifies width or
+ precision is out of range, points to the digit that causes
+ the value to exceed the limit. */
+ const char *werange = NULL;
+ const char *perange = NULL;
+
/* Width specified via the asterisk. Need not be INTEGER_CST.
For vararg functions set to void_node. */
tree star_width = NULL_TREE;
@@ -2889,17 +3062,16 @@ parse_directive (pass_sprintf_length::call_info &info,
For vararg functions set to void_node. */
tree star_precision = NULL_TREE;
- if (ISDIGIT (*pf))
+ if (ISDIGIT (target_to_host (*pf)))
{
/* This could be either a POSIX positional argument, the '0'
flag, or a width, depending on what follows. Store it as
width and sort it out later after the next character has
been seen. */
- char *end;
- width = strtol (pf, &end, 10);
- pf = end;
+ pwidth = pf;
+ width = target_strtol10 (&pf, &werange);
}
- else if ('*' == *pf)
+ else if (target_to_host (*pf) == '*')
{
/* Similarly to the block above, this could be either a POSIX
positional argument or a width, depending on what follows. */
@@ -2910,7 +3082,7 @@ parse_directive (pass_sprintf_length::call_info &info,
++pf;
}
- if (*pf == '$')
+ if (target_to_host (*pf) == '$')
{
/* Handle the POSIX dollar sign which references the 1-based
positional argument number. */
@@ -2925,7 +3097,7 @@ parse_directive (pass_sprintf_length::call_info &info,
/* Bail when the numbered argument is out of range (it will
have already been diagnosed by -Wformat). */
if (dollar == 0
- || dollar == info.argidx
+ || dollar == (int)info.argidx
|| dollar > gimple_call_num_args (info.callstmt))
return false;
@@ -2959,14 +3131,14 @@ parse_directive (pass_sprintf_length::call_info &info,
the next field is the optional flags followed by an optional
width. */
for ( ; ; ) {
- switch (*pf)
+ switch (target_to_host (*pf))
{
case ' ':
case '0':
case '+':
case '-':
case '#':
- dir.set_flag (*pf++);
+ dir.set_flag (target_to_host (*pf++));
break;
default:
@@ -2975,13 +3147,13 @@ parse_directive (pass_sprintf_length::call_info &info,
}
start_width:
- if (ISDIGIT (*pf))
+ if (ISDIGIT (target_to_host (*pf)))
{
- char *end;
- width = strtol (pf, &end, 10);
- pf = end;
+ werange = 0;
+ pwidth = pf;
+ width = target_strtol10 (&pf, &werange);
}
- else if ('*' == *pf)
+ else if (target_to_host (*pf) == '*')
{
if (*argno < gimple_call_num_args (info.callstmt))
star_width = gimple_call_arg (info.callstmt, (*argno)++);
@@ -2993,7 +3165,7 @@ parse_directive (pass_sprintf_length::call_info &info,
}
++pf;
}
- else if ('\'' == *pf)
+ else if (target_to_host (*pf) == '\'')
{
/* The POSIX apostrophe indicating a numeric grouping
in the current locale. Even though it's possible to
@@ -3005,17 +3177,16 @@ parse_directive (pass_sprintf_length::call_info &info,
}
start_precision:
- if ('.' == *pf)
+ if (target_to_host (*pf) == '.')
{
++pf;
- if (ISDIGIT (*pf))
+ if (ISDIGIT (target_to_host (*pf)))
{
- char *end;
- precision = strtol (pf, &end, 10);
- pf = end;
+ pprec = pf;
+ precision = target_strtol10 (&pf, &perange);
}
- else if ('*' == *pf)
+ else if (target_to_host (*pf) == '*')
{
if (*argno < gimple_call_num_args (info.callstmt))
star_precision = gimple_call_arg (info.callstmt, (*argno)++);
@@ -3035,10 +3206,10 @@ parse_directive (pass_sprintf_length::call_info &info,
}
}
- switch (*pf)
+ switch (target_to_host (*pf))
{
case 'h':
- if (pf[1] == 'h')
+ if (target_to_host (pf[1]) == 'h')
{
++pf;
dir.modifier = FMT_LEN_hh;
@@ -3059,7 +3230,7 @@ parse_directive (pass_sprintf_length::call_info &info,
break;
case 'l':
- if (pf[1] == 'l')
+ if (target_to_host (pf[1]) == 'l')
{
++pf;
dir.modifier = FMT_LEN_ll;
@@ -3080,7 +3251,7 @@ parse_directive (pass_sprintf_length::call_info &info,
break;
}
- switch (*pf)
+ switch (target_to_host (*pf))
{
/* Handle a sole '%' character the same as "%%" but since it's
undefined prevent the result from being folded. */
@@ -3141,7 +3312,14 @@ parse_directive (pass_sprintf_length::call_info &info,
return 0;
}
- dir.specifier = *pf++;
+ dir.specifier = target_to_host (*pf++);
+
+ /* Store the length of the format directive. */
+ dir.len = pf - pcnt;
+
+ /* Buffer for the directive in the host character set (used when
+ the source character set is different). */
+ char hostdir[32];
if (star_width)
{
@@ -3156,7 +3334,25 @@ parse_directive (pass_sprintf_length::call_info &info,
}
}
else
- dir.set_width (width);
+ {
+ if (width == LONG_MAX && werange)
+ {
+ size_t begin = dir.beg - info.fmtstr + (pwidth - pcnt);
+ size_t caret = begin + (werange - pcnt);
+ size_t end = pf - info.fmtstr - 1;
+
+ /* Create a location for the width part of the directive,
+ pointing the caret at the first out-of-range digit. */
+ substring_loc dirloc (info.fmtloc, TREE_TYPE (info.format),
+ caret, begin, end);
+
+ fmtwarn (dirloc, NULL, NULL,
+ info.warnopt (), "%<%.*s%> directive width out of range",
+ dir.len, target_to_host (hostdir, sizeof hostdir, dir.beg));
+ }
+
+ dir.set_width (width);
+ }
if (star_precision)
{
@@ -3171,7 +3367,26 @@ parse_directive (pass_sprintf_length::call_info &info,
}
}
else
- dir.set_precision (precision);
+ {
+ if (precision == LONG_MAX && perange)
+ {
+ size_t begin = dir.beg - info.fmtstr + (pprec - pcnt) - 1;
+ size_t caret = dir.beg - info.fmtstr + (perange - pcnt) - 1;
+ size_t end = pf - info.fmtstr - 2;
+
+ /* Create a location for the precision part of the directive,
+ including the leading period, pointing the caret at the first
+ out-of-range digit . */
+ substring_loc dirloc (info.fmtloc, TREE_TYPE (info.format),
+ caret, begin, end);
+
+ fmtwarn (dirloc, NULL, NULL,
+ info.warnopt (), "%<%.*s%> directive precision out of range",
+ dir.len, target_to_host (hostdir, sizeof hostdir, dir.beg));
+ }
+
+ dir.set_precision (precision);
+ }
/* Extract the argument if the directive takes one and if it's
available (e.g., the function doesn't take a va_list). Treat
@@ -3181,9 +3396,6 @@ parse_directive (pass_sprintf_length::call_info &info,
&& *argno < gimple_call_num_args (info.callstmt))
dir.arg = gimple_call_arg (info.callstmt, dollar ? dollar : (*argno)++);
- /* Return the length of the format directive. */
- dir.len = pf - pcnt;
-
if (dump_file)
{
fprintf (dump_file, " Directive %u at offset %llu: \"%.*s\"",
@@ -3708,6 +3920,8 @@ pass_sprintf_length::handle_gimple_call (gimple_stmt_iterator *gsi)
unsigned int
pass_sprintf_length::execute (function *fun)
{
+ init_target_to_host_charmap ();
+
basic_block bb;
FOR_EACH_BB_FN (bb, fun)
{
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c
new file mode 100644
index 0000000..e9365c2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c
@@ -0,0 +1,141 @@
+/* PR tree-optimization/80523 - -Wformat-overflow doesn't consider
+ -fexec-charset
+ { dg-do compile }
+ { dg-require-iconv "IBM1047" }
+ { dg-options "-O2 -Wall -Wno-format -Wformat-overflow -fexec-charset=IBM1047 -ftrack-macro-expansion=0" } */
+
+char buf[1];
+void sink (void*);
+
+#define T(...) (__builtin_sprintf (buf + 1, __VA_ARGS__), sink (buf))
+
+/* Exercise all special C and POSIX characters. */
+
+void test_characters ()
+{
+ T ("%%"); /* { dg-warning ".%%. directive writing 1 byte" } */
+
+ T ("%A", 0.0); /* { dg-warning ".%A. directive writing between 6 and 20 " } */
+ T ("%a", 0.0); /* { dg-warning ".%a. directive writing between 6 and 20 " } */
+
+ T ("%C", 'a'); /* { dg-warning ".%C. directive writing 1 byte" "bug 80537" { xfail *-*-* } } */
+ T ("%c", 'a'); /* { dg-warning ".%c. directive writing 1 byte" } */
+
+ T ("%d", 12); /* { dg-warning ".%d. directive writing 2 bytes" } */
+ T ("% d", 12); /* { dg-warning ".% d. directive writing 3 bytes" } */
+ T ("%-d", 123); /* { dg-warning ".%-d. directive writing 3 bytes" } */
+ T ("%+d", 1234); /* { dg-warning ".%\\+d. directive writing 5 bytes" } */
+ T ("%'d", 1234); /* { dg-warning ".%'d. directive writing 5 bytes" "bug 80535" { xfail *-*-* } } */
+ T ("%1$d", 2345); /* { dg-warning ".%1\\\$d. directive writing 4 bytes" } */
+
+ /* Verify that digits are correctly interpreted as width and precision. */
+ T ("%0d", 12345); /* { dg-warning ".%0d. directive writing 5 bytes" } */
+ T ("%1d", 12345); /* { dg-warning ".%1d. directive writing 5 bytes" } */
+ T ("%2d", 12345); /* { dg-warning ".%2d. directive writing 5 bytes" } */
+ T ("%3d", 12345); /* { dg-warning ".%3d. directive writing 5 bytes" } */
+ T ("%4d", 12345); /* { dg-warning ".%4d. directive writing 5 bytes" } */
+ T ("%5d", 12345); /* { dg-warning ".%5d. directive writing 5 bytes" } */
+ T ("%6d", 12345); /* { dg-warning ".%6d. directive writing 6 bytes" } */
+ T ("%7d", 12345); /* { dg-warning ".%7d. directive writing 7 bytes" } */
+ T ("%8d", 12345); /* { dg-warning ".%8d. directive writing 8 bytes" } */
+ T ("%9d", 12345); /* { dg-warning ".%9d. directive writing 9 bytes" } */
+
+ T ("%.0d", 12345); /* { dg-warning ".%.0d. directive writing 5 bytes" } */
+ T ("%.1d", 12345); /* { dg-warning ".%.1d. directive writing 5 bytes" } */
+ T ("%.2d", 12345); /* { dg-warning ".%.2d. directive writing 5 bytes" } */
+ T ("%.3d", 12345); /* { dg-warning ".%.3d. directive writing 5 bytes" } */
+ T ("%.4d", 12345); /* { dg-warning ".%.4d. directive writing 5 bytes" } */
+ T ("%.5d", 12345); /* { dg-warning ".%.5d. directive writing 5 bytes" } */
+ T ("%.6d", 12345); /* { dg-warning ".%.6d. directive writing 6 bytes" } */
+ T ("%.7d", 12345); /* { dg-warning ".%.7d. directive writing 7 bytes" } */
+ T ("%.8d", 12345); /* { dg-warning ".%.8d. directive writing 8 bytes" } */
+ T ("%.9d", 12345); /* { dg-warning ".%.9d. directive writing 9 bytes" } */
+
+ T ("%hhd", 12); /* { dg-warning ".%hhd. directive writing 2 bytes" } */
+ T ("%hd", 234); /* { dg-warning ".%hd. directive writing 3 bytes" } */
+
+ {
+ const __PTRDIFF_TYPE__ i = 3456;
+ T ("%jd", i); /* { dg-warning ".%jd. directive writing 4 bytes" } */
+ }
+
+ T ("%ld", 45678L); /* { dg-warning ".%ld. directive writing 5 bytes" } */
+
+ {
+ const __PTRDIFF_TYPE__ i = 56789;
+ T ("%td", i); /* { dg-warning ".%td. directive writing 5 bytes" } */
+ }
+
+ {
+ const __SIZE_TYPE__ i = 67890;
+ T ("%zd", i); /* { dg-warning ".%zd. directive writing 5 bytes" } */
+ }
+
+ T ("%E", 0.0); /* { dg-warning ".%E. directive writing 12 bytes" } */
+ T ("%e", 0.0); /* { dg-warning ".%e. directive writing 12 bytes" } */
+ T ("%F", 0.0); /* { dg-warning ".%F. directive writing 8 bytes" } */
+ T ("%f", 0.0); /* { dg-warning ".%f. directive writing 8 bytes" } */
+ T ("%G", 0.0); /* { dg-warning ".%G. directive writing 1 byte" } */
+ T ("%g", 0.0); /* { dg-warning ".%g. directive writing 1 byte" } */
+
+ T ("%i", 123); /* { dg-warning ".%i. directive writing 3 bytes" } */
+
+ {
+ int n;
+
+ T ("%n", &n); /* { dg-warning "writing a terminating nul" } */
+ T ("%nH", &n); /* { dg-warning ".H. directive writing 1 byte" } */
+ }
+
+ T ("%o", 999); /* { dg-warning ".%o. directive writing 4 bytes" } */
+ T ("%#o", 999); /* { dg-warning ".%#o. directive writing 5 bytes" } */
+
+ T ("%x", 1234); /* { dg-warning ".%x. directive writing 3 bytes" } */
+ T ("%#X", 1235); /* { dg-warning ".%#X. directive writing 5 bytes" } */
+
+ T ("%S", L"1"); /* { dg-warning ".%S. directive writing 1 byte" } */
+ T ("%-s", "1"); /* { dg-warning ".%-s. directive writing 1 byte" } */
+
+ /* Verify that characters in the source character set appear in
+ the text of the warning unchanged (i.e., not as their equivalents
+ in the execution character set on the target). The trailing %%
+ disables sprintf->strcpy optimization. */
+ T ("ABCDEFGHIJ%%"); /* { dg-warning ".ABCDEFGHIJ. directive writing 10 bytes" } */
+ T ("KLMNOPQRST%%"); /* { dg-warning ".KLMNOPQRST. directive writing 10 bytes" } */
+ T ("UVWXYZ%%"); /* { dg-warning ".UVWXYZ. directive writing 6 bytes" } */
+
+ T ("abcdefghij%%"); /* { dg-warning ".abcdefghij. directive writing 10 bytes" } */
+ T ("klmnopqrst%%"); /* { dg-warning ".klmnopqrst. directive writing 10 bytes" } */
+ T ("uvwxyz%%"); /* { dg-warning ".uvwxyz. directive writing 6 bytes" } */
+}
+
+#undef T
+#define T(...) (__builtin_sprintf (d, __VA_ARGS__), sink (d))
+
+void test_width_and_precision_out_of_range (char *d)
+{
+#if __LONG_MAX__ == 2147483647
+# define MAX_P1_STR "2147483648"
+#elif __LONG_MAX__ == 9223372036854775807
+# define MAX_P1_STR "9223372036854775808"
+#endif
+
+ T ("%" MAX_P1_STR "i", 0); /* { dg-warning "width out of range" } */
+ /* { dg-warning "result to exceed .INT_MAX. " "" { target *-*-* } .-1 } */
+ T ("%." MAX_P1_STR "i", 0); /* { dg-warning "precision out of range" } */
+
+ /* The following is diagnosed by -Wformat (disabled here). */
+ /* T ("%" MAX_P1_STR "$i", 0); */
+}
+
+/* Verify that an excessively long directive is truncated and the truncation
+ is indicated by three trailing dots in the text of the warning. */
+
+void test_overlong_plain_string ()
+{
+ static const char longfmtstr[] =
+ "0123456789012345678901234567890123456789012345678901234567890123456789%%";
+
+ char d[1];
+ T (longfmtstr); /* { dg-warning ".0123\[0-9\]\*\.\.\.. directive writing 70 bytes" } */
+}
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-04-28 18:32 ` Martin Sebor
@ 2017-04-28 19:14 ` Jeff Law
2017-04-29 20:44 ` Andreas Schwab
1 sibling, 0 replies; 21+ messages in thread
From: Jeff Law @ 2017-04-28 19:14 UTC (permalink / raw)
To: Martin Sebor, Jakub Jelinek, Joseph Myers; +Cc: Gcc Patch List
On 04/28/2017 12:05 PM, Martin Sebor wrote:
>> So the initialization could be done once per translation unit rather
>> than once per function -- assuming the target character set doesn't
>> change within a translation unit.
>>
>> That seems like it ought to be easy.
>
> It is easy. I was going to respond by saying "It already is done
> that way" because I implemented it in the first patch (by checking
> the mapping for '%'. But now I see I accidentally removed the code
> in the update while exploring ways to optimize it some more. Sigh.
> Let me put it back.
:-)
>
>>> + set corresponding to the string TARGSTR consisting of characters in
>>> + the execution character set. */
>>> +
>>> +static const char*
>>> +target_to_host (const char *targstr)
>>> +{
>>> + /* The interesting subset of source and execution characters are
>>> + the same so no conversion is necessary. */
>>> + if (target_to_host_charmap['\0'] == 1)
>>> + return targstr;
>>> +
>>> + /* Convert the initial substring of TARGSTR to the corresponding
>>> + characters in the host set, appending "..." if TARGSTR is too
>>> + long to fit. Using the static buffer assumes the function is
>>> + not called in between sequence points (which it isn't). */
>>> + static char hostr[32];
>>> + for (char *ph = hostr; ; ++targstr)
>>> + {
>>> + *ph++ = target_to_host (*targstr);
>>> + if (!*targstr)
>>> + break;
>>> +
>>> + if (ph - hostr == sizeof hostr - 4)
>>> + {
>>> + *ph = '\0';
>>> + strcat (ph, "...");
>>> + break;
>>> + }
>>> + }
>>> +
>>> + return hostr;
>> Ewww. I guess the alternative would be something like:
>>
>> Expand the return value to include a indicator of whether or not the
>> original string was returned (common case) or if a new string was
>> returned and thus needs to be deallocated by the caller.
>>
>> That's probably pretty unpleasant given we don't have a central place
>> where we call target_to_host, so the caller side would be much uglier.
>>
>> Are there any downstream impacts when the string is too long to covert
>> other than not warning for things which were unconverted?
>
> The function is only used when printing the text of the directive
> in the warning. It doesn't prevent the warnings, it just truncates
> them.
Noted. Thanks.
>
> I don't like returning a pointer to a static buffer but given that
> the scope of the function is limited to the pass it seems fairly
> safe. Another alternative would be to pass in a buffer and its
> size. That shouldn't complicate the caller too much. The easiest,
> cleanest, and safest solution by far is to return a std::string,
> but I have the impression that would be against GCC convention of
> avoiding the STL.
That's what caught my eye. I don't generally like that either.
Putting the buffer into the caller is better. Thanks for doing that.
>
> Attached is an updated patch with these two tweaks.
>
> Martin
>
> gcc-80523.diff
>
>
> PR tree-optimization/80523 - -Wformat-overflow doesn't consider -fexec-charset
>
> gcc/ChangeLog:
>
> PR tree-optimization/80523
> * gimple-ssa-sprintf.c (target_to_host_charmap): New global variable.
> (init_target_to_host_charmap, target_to_host, target_strtol10): New
> functions.
> (maybe_warn, format_directive, parse_directive): Use new functions.
> (pass_sprintf_length::execute): Call init_target_to_host_charmap.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/80523
> * gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: New test.
OK.
jeff
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-04-28 18:32 ` Martin Sebor
2017-04-28 19:14 ` Jeff Law
@ 2017-04-29 20:44 ` Andreas Schwab
2017-05-03 14:35 ` Christophe Lyon
1 sibling, 1 reply; 21+ messages in thread
From: Andreas Schwab @ 2017-04-29 20:44 UTC (permalink / raw)
To: Martin Sebor; +Cc: Jeff Law, Jakub Jelinek, Joseph Myers, Gcc Patch List
On Apr 28 2017, Martin Sebor <msebor@gmail.com> wrote:
> +void test_width_and_precision_out_of_range (char *d)
> +{
> +#if __LONG_MAX__ == 2147483647
> +# define MAX_P1_STR "2147483648"
> +#elif __LONG_MAX__ == 9223372036854775807
> +# define MAX_P1_STR "9223372036854775808"
> +#endif
> +
> + T ("%" MAX_P1_STR "i", 0); /* { dg-warning "width out of range" } */
> + /* { dg-warning "result to exceed .INT_MAX. " "" { target *-*-* } .-1 } */
> + T ("%." MAX_P1_STR "i", 0); /* { dg-warning "precision out of range" } */
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line 123)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line 125)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for excess errors)
Excess errors:
/daten/aranym/gcc/gcc-20170429/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:125:3: warning: '%.2147483648i' directive output of 2147483648 bytes causes result to exceed 'INT_MAX' [-Wformat-overflow=]
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-04-29 20:44 ` Andreas Schwab
@ 2017-05-03 14:35 ` Christophe Lyon
2017-05-03 15:02 ` Martin Sebor
0 siblings, 1 reply; 21+ messages in thread
From: Christophe Lyon @ 2017-05-03 14:35 UTC (permalink / raw)
To: Andreas Schwab
Cc: Martin Sebor, Jeff Law, Jakub Jelinek, Joseph Myers, Gcc Patch List
Hi,
On 29 April 2017 at 19:56, Andreas Schwab <schwab@linux-m68k.org> wrote:
> On Apr 28 2017, Martin Sebor <msebor@gmail.com> wrote:
>
>> +void test_width_and_precision_out_of_range (char *d)
>> +{
>> +#if __LONG_MAX__ == 2147483647
>> +# define MAX_P1_STR "2147483648"
>> +#elif __LONG_MAX__ == 9223372036854775807
>> +# define MAX_P1_STR "9223372036854775808"
>> +#endif
>> +
>> + T ("%" MAX_P1_STR "i", 0); /* { dg-warning "width out of range" } */
>> + /* { dg-warning "result to exceed .INT_MAX. " "" { target *-*-* } .-1 } */
>> + T ("%." MAX_P1_STR "i", 0); /* { dg-warning "precision out of range" } */
>
> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line 123)
> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line 125)
> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for excess errors)
> Excess errors:
> /daten/aranym/gcc/gcc-20170429/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:125:3: warning: '%.2147483648i' directive output of 2147483648 bytes causes result to exceed 'INT_MAX' [-Wformat-overflow=]
>
> Andreas.
I've noticed the same errors on arm* targets, if it's easier to reproduce.
Thanks,
Christophe
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-05-03 14:35 ` Christophe Lyon
@ 2017-05-03 15:02 ` Martin Sebor
2017-05-03 15:24 ` Christophe Lyon
0 siblings, 1 reply; 21+ messages in thread
From: Martin Sebor @ 2017-05-03 15:02 UTC (permalink / raw)
To: Christophe Lyon, Andreas Schwab
Cc: Jeff Law, Jakub Jelinek, Joseph Myers, Gcc Patch List
On 05/03/2017 08:22 AM, Christophe Lyon wrote:
> Hi,
>
>
> On 29 April 2017 at 19:56, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> On Apr 28 2017, Martin Sebor <msebor@gmail.com> wrote:
>>
>>> +void test_width_and_precision_out_of_range (char *d)
>>> +{
>>> +#if __LONG_MAX__ == 2147483647
>>> +# define MAX_P1_STR "2147483648"
>>> +#elif __LONG_MAX__ == 9223372036854775807
>>> +# define MAX_P1_STR "9223372036854775808"
>>> +#endif
>>> +
>>> + T ("%" MAX_P1_STR "i", 0); /* { dg-warning "width out of range" } */
>>> + /* { dg-warning "result to exceed .INT_MAX. " "" { target *-*-* } .-1 } */
>>> + T ("%." MAX_P1_STR "i", 0); /* { dg-warning "precision out of range" } */
>>
>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line 123)
>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line 125)
>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for excess errors)
>> Excess errors:
>> /daten/aranym/gcc/gcc-20170429/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:125:3: warning: '%.2147483648i' directive output of 2147483648 bytes causes result to exceed 'INT_MAX' [-Wformat-overflow=]
>>
>> Andreas.
>
> I've noticed the same errors on arm* targets, if it's easier to reproduce.
Thanks. I committed a trivial fix for this on Monday
(https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00036.html).
I don't see the failures in recent test results for the few
ILP32 targets I've checked so I'm hoping they're gone but if
they persist on some others please let me know.
Martin
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-05-03 15:02 ` Martin Sebor
@ 2017-05-03 15:24 ` Christophe Lyon
2017-06-02 15:38 ` Renlin Li
0 siblings, 1 reply; 21+ messages in thread
From: Christophe Lyon @ 2017-05-03 15:24 UTC (permalink / raw)
To: Martin Sebor
Cc: Andreas Schwab, Jeff Law, Jakub Jelinek, Joseph Myers, Gcc Patch List
On 3 May 2017 at 16:54, Martin Sebor <msebor@gmail.com> wrote:
> On 05/03/2017 08:22 AM, Christophe Lyon wrote:
>>
>> Hi,
>>
>>
>> On 29 April 2017 at 19:56, Andreas Schwab <schwab@linux-m68k.org> wrote:
>>>
>>> On Apr 28 2017, Martin Sebor <msebor@gmail.com> wrote:
>>>
>>>> +void test_width_and_precision_out_of_range (char *d)
>>>> +{
>>>> +#if __LONG_MAX__ == 2147483647
>>>> +# define MAX_P1_STR "2147483648"
>>>> +#elif __LONG_MAX__ == 9223372036854775807
>>>> +# define MAX_P1_STR "9223372036854775808"
>>>> +#endif
>>>> +
>>>> + T ("%" MAX_P1_STR "i", 0); /* { dg-warning "width out of range" }
>>>> */
>>>> + /* { dg-warning "result to exceed .INT_MAX. " "" { target *-*-* } .-1
>>>> } */
>>>> + T ("%." MAX_P1_STR "i", 0); /* { dg-warning "precision out of
>>>> range" } */
>>>
>>>
>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line
>>> 123)
>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line
>>> 125)
>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for excess errors)
>>> Excess errors:
>>>
>>> /daten/aranym/gcc/gcc-20170429/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:125:3:
>>> warning: '%.2147483648i' directive output of 2147483648 bytes causes result
>>> to exceed 'INT_MAX' [-Wformat-overflow=]
>>>
>>> Andreas.
>>
>>
>> I've noticed the same errors on arm* targets, if it's easier to reproduce.
>
>
> Thanks. I committed a trivial fix for this on Monday
> (https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00036.html).
> I don't see the failures in recent test results for the few
> ILP32 targets I've checked so I'm hoping they're gone but if
> they persist on some others please let me know.
>
Indeed, I confirm your commit r247444 fixed the error.
I didn't notice your message because it was in a different thread.
Thanks,
Christophe
> Martin
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-05-03 15:24 ` Christophe Lyon
@ 2017-06-02 15:38 ` Renlin Li
2017-06-04 22:24 ` Martin Sebor
0 siblings, 1 reply; 21+ messages in thread
From: Renlin Li @ 2017-06-02 15:38 UTC (permalink / raw)
To: Christophe Lyon, Martin Sebor
Cc: Andreas Schwab, Jeff Law, Jakub Jelinek, Joseph Myers, Gcc Patch List
Hi Martin,
After r247444, I saw the following two regressions in arm-linux-gnueabihf environment:
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line 119)
PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line 121)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line 121)
The warning message related to those two lines are:
testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
'%9223372036854775808i' directive width out of range [-Wformat-overflow=]
testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
'%.9223372036854775808i' directive precision out of range [-Wformat-overflow=]
testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
'%.9223372036854775808i' directive precision out of range [-Wformat-overflow=]
Did you notice similar things from your test environment, Christophe?
Regards,
Renlin
On 03/05/17 16:02, Christophe Lyon wrote:
> On 3 May 2017 at 16:54, Martin Sebor <msebor@gmail.com> wrote:
>> On 05/03/2017 08:22 AM, Christophe Lyon wrote:
>>>
>>> Hi,
>>>
>>>
>>> On 29 April 2017 at 19:56, Andreas Schwab <schwab@linux-m68k.org> wrote:
>>>>
>>>> On Apr 28 2017, Martin Sebor <msebor@gmail.com> wrote:
>>>>
>>>>> +void test_width_and_precision_out_of_range (char *d)
>>>>> +{
>>>>> +#if __LONG_MAX__ == 2147483647
>>>>> +# define MAX_P1_STR "2147483648"
>>>>> +#elif __LONG_MAX__ == 9223372036854775807
>>>>> +# define MAX_P1_STR "9223372036854775808"
>>>>> +#endif
>>>>> +
>>>>> + T ("%" MAX_P1_STR "i", 0); /* { dg-warning "width out of range" }
>>>>> */
>>>>> + /* { dg-warning "result to exceed .INT_MAX. " "" { target *-*-* } .-1
>>>>> } */
>>>>> + T ("%." MAX_P1_STR "i", 0); /* { dg-warning "precision out of
>>>>> range" } */
>>>>
>>>>
>>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line
>>>> 123)
>>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings, line
>>>> 125)
>>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for excess errors)
>>>> Excess errors:
>>>>
>>>> /daten/aranym/gcc/gcc-20170429/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:125:3:
>>>> warning: '%.2147483648i' directive output of 2147483648 bytes causes result
>>>> to exceed 'INT_MAX' [-Wformat-overflow=]
>>>>
>>>> Andreas.
>>>
>>>
>>> I've noticed the same errors on arm* targets, if it's easier to reproduce.
>>
>>
>> Thanks. I committed a trivial fix for this on Monday
>> (https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00036.html).
>> I don't see the failures in recent test results for the few
>> ILP32 targets I've checked so I'm hoping they're gone but if
>> they persist on some others please let me know.
>>
> Indeed, I confirm your commit r247444 fixed the error.
> I didn't notice your message because it was in a different thread.
>
> Thanks,
>
> Christophe
>
>> Martin
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-06-02 15:38 ` Renlin Li
@ 2017-06-04 22:24 ` Martin Sebor
2017-06-13 8:16 ` Renlin Li
0 siblings, 1 reply; 21+ messages in thread
From: Martin Sebor @ 2017-06-04 22:24 UTC (permalink / raw)
To: Renlin Li, Christophe Lyon
Cc: Andreas Schwab, Jeff Law, Jakub Jelinek, Joseph Myers, Gcc Patch List
On 06/02/2017 09:38 AM, Renlin Li wrote:
> Hi Martin,
>
> After r247444, I saw the following two regressions in
> arm-linux-gnueabihf environment:
>
> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
> line 119)
> PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
> line 121)
> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
> line 121)
>
> The warning message related to those two lines are:
> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
> '%9223372036854775808i' directive width out of range [-Wformat-overflow=]
>
> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
> '%.9223372036854775808i' directive precision out of range
> [-Wformat-overflow=]
>
> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
> '%.9223372036854775808i' directive precision out of range
> [-Wformat-overflow=]
>
> Did you notice similar things from your test environment, Christophe?
Looks like you're missing a couple of warnings. I see the following
output with both my arm-linux-gnueabihf cross compiler and my native
x86_64 GCC, both in 32-bit and 64-bit modes, as expected by the test,
so I don't see the same issue in my environment.
/ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3:
warning: â%9223372036854775808iâ directive width out of range
[-Wformat-overflow=]
T ("%9223372036854775808i", 0); /* { dg-warning "width out of
range" } */
^
/ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3:
warning: â%9223372036854775808iâ directive output of 9223372036854775807
bytes causes result to exceed âINT_MAXâ [-Wformat-overflow=]
/ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3:
warning: â%.9223372036854775808iâ directive precision out of range
[-Wformat-overflow=]
T ("%.9223372036854775808i", 0); /* { dg-warning "precision out of
range" } */
^
/ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3:
warning: â%.9223372036854775808iâ directive output of
9223372036854775807 bytes causes result to exceed âINT_MAXâ
[-Wformat-overflow=]
Martin
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-06-04 22:24 ` Martin Sebor
@ 2017-06-13 8:16 ` Renlin Li
2017-06-20 11:00 ` Renlin Li
0 siblings, 1 reply; 21+ messages in thread
From: Renlin Li @ 2017-06-13 8:16 UTC (permalink / raw)
To: Martin Sebor, Christophe Lyon
Cc: Andreas Schwab, Jeff Law, Jakub Jelinek, Joseph Myers, Gcc Patch List
Hi Martin,
On 04/06/17 23:24, Martin Sebor wrote:
> On 06/02/2017 09:38 AM, Renlin Li wrote:
>> Hi Martin,
>>
>> After r247444, I saw the following two regressions in
>> arm-linux-gnueabihf environment:
>>
>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>> line 119)
>> PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>> line 121)
>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>> line 121)
>>
>> The warning message related to those two lines are:
>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
>> '%9223372036854775808i' directive width out of range [-Wformat-overflow=]
>>
>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>> '%.9223372036854775808i' directive precision out of range
>> [-Wformat-overflow=]
>>
>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>> '%.9223372036854775808i' directive precision out of range
>> [-Wformat-overflow=]
>>
>> Did you notice similar things from your test environment, Christophe?
>
> Looks like you're missing a couple of warnings. I see the following
> output with both my arm-linux-gnueabihf cross compiler and my native
> x86_64 GCC, both in 32-bit and 64-bit modes, as expected by the test,
> so I don't see the same issue in my environment.
Yes, it happens on arm-linux-gnueabihf native environment. the warnings with "INT_MAX"
line are missing. I don't know if the host environment will cause the difference.
Regards,
Renlin
>
> /ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
> â%9223372036854775808iâ directive width out of range [-Wformat-overflow=]
> T ("%9223372036854775808i", 0); /* { dg-warning "width out of range" } */
> ^
> /ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
> â%9223372036854775808iâ directive output of 9223372036854775807 bytes causes result to
> exceed âINT_MAXâ [-Wformat-overflow=]
> /ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
> â%.9223372036854775808iâ directive precision out of range [-Wformat-overflow=]
> T ("%.9223372036854775808i", 0); /* { dg-warning "precision out of range" } */
> ^
> /ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
> â%.9223372036854775808iâ directive output of 9223372036854775807 bytes causes result to
> exceed âINT_MAXâ [-Wformat-overflow=]
>
> Martin
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-06-13 8:16 ` Renlin Li
@ 2017-06-20 11:00 ` Renlin Li
2018-01-31 17:57 ` Renlin Li
0 siblings, 1 reply; 21+ messages in thread
From: Renlin Li @ 2017-06-20 11:00 UTC (permalink / raw)
To: Martin Sebor, Christophe Lyon
Cc: Andreas Schwab, Jeff Law, Jakub Jelinek, Joseph Myers, Gcc Patch List
Hi Martin,
I did a little investigation into this. Please correct me if I missed anything.
I build a native arm-linux-gnueabihf toolchain in armhf hardware.
It's ILP32. So in this situation:
HOST_WIDE_INT is long, which is 32-bit.
integer type 32-bit as well, so target_int_max () == LONG_MAX
gimple-ssa-sprintf.c line 2887
> /* Has the likely and maximum directive output exceeded INT_MAX? */
> bool likelyximax = *dir.beg && res->range.likely > target_int_max ();
likelyximax will be false as the latter expression is always false.
res->range.likely is truncated to LONG_MAX (in target_strtol10 function)
I have checked in cross build environment (host x86_64), this variable is true.
Regards,
Renlin
On 13/06/17 09:16, Renlin Li wrote:
> Hi Martin,
>
> On 04/06/17 23:24, Martin Sebor wrote:
>> On 06/02/2017 09:38 AM, Renlin Li wrote:
>>> Hi Martin,
>>>
>>> After r247444, I saw the following two regressions in
>>> arm-linux-gnueabihf environment:
>>>
>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>> line 119)
>>> PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>> line 121)
>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>> line 121)
>>>
>>> The warning message related to those two lines are:
>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
>>> '%9223372036854775808i' directive width out of range [-Wformat-overflow=]
>>>
>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>>> '%.9223372036854775808i' directive precision out of range
>>> [-Wformat-overflow=]
>>>
>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>>> '%.9223372036854775808i' directive precision out of range
>>> [-Wformat-overflow=]
>>>
>>> Did you notice similar things from your test environment, Christophe?
>>
>> Looks like you're missing a couple of warnings. I see the following
>> output with both my arm-linux-gnueabihf cross compiler and my native
>> x86_64 GCC, both in 32-bit and 64-bit modes, as expected by the test,
>> so I don't see the same issue in my environment.
>
> Yes, it happens on arm-linux-gnueabihf native environment. the warnings with "INT_MAX"
> line are missing. I don't know if the host environment will cause the difference.
>
> Regards,
> Renlin
>
>>
>> /ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
>> â%9223372036854775808iâ directive width out of range [-Wformat-overflow=]
>> T ("%9223372036854775808i", 0); /* { dg-warning "width out of range" } */
>> ^
>> /ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
>> â%9223372036854775808iâ directive output of 9223372036854775807 bytes causes result to
>> exceed âINT_MAXâ [-Wformat-overflow=]
>> /ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>> â%.9223372036854775808iâ directive precision out of range [-Wformat-overflow=]
>> T ("%.9223372036854775808i", 0); /* { dg-warning "precision out of range" } */
>> ^
>> /ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>> â%.9223372036854775808iâ directive output of 9223372036854775807 bytes causes result to
>> exceed âINT_MAXâ [-Wformat-overflow=]
>>
>> Martin
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2017-06-20 11:00 ` Renlin Li
@ 2018-01-31 17:57 ` Renlin Li
2018-02-01 0:41 ` Martin Sebor
0 siblings, 1 reply; 21+ messages in thread
From: Renlin Li @ 2018-01-31 17:57 UTC (permalink / raw)
To: Renlin Li, Martin Sebor, Christophe Lyon
Cc: Andreas Schwab, Jeff Law, Jakub Jelinek, Joseph Myers, Gcc Patch List
[-- Attachment #1: Type: text/plain, Size: 3323 bytes --]
Hi there,
I have a patch to fix to regressions we observed in armhf native environment.
To effectively check out of range for format string, a target type should be
used. And according to the standard, int type is used for "width" and "precision"
field of format string.
Here target_strtol10 is rename to target_strtoi10, and fixed to use
target_int_max () which is target dependent.
The value returned by target_strtol10 is (target_int_max () + 1) when it exceeds the range.
This is used to indicate its value exceeds target INT_MAX for the later warning.
Is it Okay to commit?
Regards,
Renlin
gcc/ChangeLog:
2018-01-31 Renlin Li <renlin.li@arm.com>
* gimple-ssa-sprintf.c (target_strtol10): Use target integer to check
the range. Rename it into ....
(target_strtoi10): This.
(parse_directive): Compare with target int max instead of LONG_MAX.
On 20/06/17 12:00, Renlin Li wrote:
> Hi Martin,
>
> I did a little investigation into this. Please correct me if I missed anything.
>
> I build a native arm-linux-gnueabihf toolchain in armhf hardware.
> It's ILP32. So in this situation:
>
> HOST_WIDE_INT is long, which is 32-bit.
> integer type 32-bit as well, so target_int_max () == LONG_MAX
>
>
> gimple-ssa-sprintf.c line 2887
>>  /* Has the likely and maximum directive output exceeded INT_MAX? */
>> Â bool likelyximax = *dir.beg && res->range.likely > target_int_max ();
>
> likelyximax will be false as the latter expression is always false.
> res->range.likely is truncated to LONG_MAX (in target_strtol10 function)
>
> I have checked in cross build environment (host x86_64), this variable is true.
>
> Regards,
> Renlin
>
>
> On 13/06/17 09:16, Renlin Li wrote:
>> Hi Martin,
>>
>> On 04/06/17 23:24, Martin Sebor wrote:
>>> On 06/02/2017 09:38 AM, Renlin Li wrote:
>>>> Hi Martin,
>>>>
>>>> After r247444, I saw the following two regressions in
>>>> arm-linux-gnueabihf environment:
>>>>
>>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>>> line 119)
>>>> PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>>> line 121)
>>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>>> line 121)
>>>>
>>>> The warning message related to those two lines are:
>>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
>>>> '%9223372036854775808i' directive width out of range [-Wformat-overflow=]
>>>>
>>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>>>> '%.9223372036854775808i' directive precision out of range
>>>> [-Wformat-overflow=]
>>>>
>>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>>>> '%.9223372036854775808i' directive precision out of range
>>>> [-Wformat-overflow=]
>>>>
>>>> Did you notice similar things from your test environment, Christophe?
>>>
>>> Looks like you're missing a couple of warnings. I see the following
>>> output with both my arm-linux-gnueabihf cross compiler and my native
>>> x86_64 GCC, both in 32-bit and 64-bit modes, as expected by the test,
>>> so I don't see the same issue in my environment.
>>
>> Yes, it happens on arm-linux-gnueabihf native environment. the warnings with "INT_MAX"
>> line are missing. I don't know if the host environment will cause the difference.
>>
>> Regards,
>> Renlin
[-- Attachment #2: patch.patch --]
[-- Type: text/x-patch, Size: 2857 bytes --]
diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 14b12191d9b16699b28541cb24914fa9f8d8fea9..3a903ffad8e524c6d2fd405812ebc0869bfed7a7 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -399,12 +399,12 @@ target_to_host (char *hostr, size_t hostsz, const char *targstr)
}
/* Convert the sequence of decimal digits in the execution character
- starting at S to a long, just like strtol does. Return the result
- and set *END to one past the last converted character. On range
- error set ERANGE to the digit that caused it. */
+ starting at S to a int. Return the result and set *END to one past the last
+ converted character.
+ On range error set ERANGE to the digit that caused it. */
-static inline long
-target_strtol10 (const char **ps, const char **erange)
+static inline HOST_WIDE_INT
+target_strtoi10 (const char **ps, const char **erange)
{
unsigned HOST_WIDE_INT val = 0;
for ( ; ; ++*ps)
@@ -415,9 +415,9 @@ target_strtol10 (const char **ps, const char **erange)
c -= '0';
/* Check for overflow. */
- if (val > (LONG_MAX - c) / 10LU)
+ if (val > (target_int_max () - c) / 10LU)
{
- val = LONG_MAX;
+ val = target_int_max () + 1;
*erange = *ps;
/* Skip the remaining digits. */
@@ -3082,7 +3082,7 @@ parse_directive (sprintf_dom_walker::call_info &info,
width and sort it out later after the next character has
been seen. */
pwidth = pf;
- width = target_strtol10 (&pf, &werange);
+ width = target_strtoi10 (&pf, &werange);
}
else if (target_to_host (*pf) == '*')
{
@@ -3164,7 +3164,7 @@ parse_directive (sprintf_dom_walker::call_info &info,
{
werange = 0;
pwidth = pf;
- width = target_strtol10 (&pf, &werange);
+ width = target_strtoi10 (&pf, &werange);
}
else if (target_to_host (*pf) == '*')
{
@@ -3197,7 +3197,7 @@ parse_directive (sprintf_dom_walker::call_info &info,
if (ISDIGIT (target_to_host (*pf)))
{
pprec = pf;
- precision = target_strtol10 (&pf, &perange);
+ precision = target_strtoi10 (&pf, &perange);
}
else if (target_to_host (*pf) == '*')
{
@@ -3348,7 +3348,7 @@ parse_directive (sprintf_dom_walker::call_info &info,
}
else
{
- if (width == LONG_MAX && werange)
+ if ((unsigned HOST_WIDE_INT) width > target_int_max () && werange)
{
size_t begin = dir.beg - info.fmtstr + (pwidth - pcnt);
size_t caret = begin + (werange - pcnt);
@@ -3381,7 +3381,7 @@ parse_directive (sprintf_dom_walker::call_info &info,
}
else
{
- if (precision == LONG_MAX && perange)
+ if ((unsigned HOST_WIDE_INT) precision > target_int_max () && perange)
{
size_t begin = dir.beg - info.fmtstr + (pprec - pcnt) - 1;
size_t caret = dir.beg - info.fmtstr + (perange - pcnt) - 1;
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2018-01-31 17:57 ` Renlin Li
@ 2018-02-01 0:41 ` Martin Sebor
2018-02-01 11:54 ` Renlin Li
0 siblings, 1 reply; 21+ messages in thread
From: Martin Sebor @ 2018-02-01 0:41 UTC (permalink / raw)
To: Renlin Li, Renlin Li, Christophe Lyon
Cc: Andreas Schwab, Jeff Law, Jakub Jelinek, Joseph Myers, Gcc Patch List
On 01/31/2018 10:36 AM, Renlin Li wrote:
> Hi there,
>
> I have a patch to fix to regressions we observed in armhf native
> environment.
>
> To effectively check out of range for format string, a target type
> should be
> used. And according to the standard, int type is used for "width" and
> "precision"
> field of format string.
>
> Here target_strtol10 is rename to target_strtoi10, and fixed to use
> target_int_max () which is target dependent.
>
> The value returned by target_strtol10 is (target_int_max () + 1) when it
> exceeds the range.
> This is used to indicate its value exceeds target INT_MAX for the later
> warning.
Sorry for not responding to your original email. It's still
in my inbox, just buried under a pile of stuff.
Using LONG_MAX is not ideal but unless I missed something
(it's been a while) I think using INT_MAX for the target would
be even less optimal. Unless specified by the asterisk, width
and precision can be arbitrarily large. It seems to me that
constraining either to INT_MAX would trigger warnings on LP64
targets for safe calls like:
// INT_MAX is 2147483647
sprintf (d, "%.2147483648s", "");
I think we want to use the maximum value of an unsigned type
with the greatest precision on the host. It will still warn
for directives with precisions in excess of HOST_WIDE_INT_MAX
but short of using something like offset_int it's probably as
good as it's going to get. (It has been suggested that
the whole pass would benefit from using offset_int to track
large numbers so if/when it's enhanced to make this change
it should also make the code behave more consistently across
different hosts.)
Martin
>
> Is it Okay to commit?
>
> Regards,
> Renlin
>
> gcc/ChangeLog:
>
> 2018-01-31 Renlin Li <renlin.li@arm.com>
>
> * gimple-ssa-sprintf.c (target_strtol10): Use target integer to check
> the range. Rename it into ....
> (target_strtoi10): This.
> (parse_directive): Compare with target int max instead of LONG_MAX.
>
>
> On 20/06/17 12:00, Renlin Li wrote:
>> Hi Martin,
>>
>> I did a little investigation into this. Please correct me if I missed
>> anything.
>>
>> I build a native arm-linux-gnueabihf toolchain in armhf hardware.
>> It's ILP32. So in this situation:
>>
>> HOST_WIDE_INT is long, which is 32-bit.
>> integer type 32-bit as well, so target_int_max () == LONG_MAX
>>
>>
>> gimple-ssa-sprintf.c line 2887
>>> /* Has the likely and maximum directive output exceeded INT_MAX? */
>>> bool likelyximax = *dir.beg && res->range.likely > target_int_max ();
>>
>> likelyximax will be false as the latter expression is always false.
>> res->range.likely is truncated to LONG_MAX (in target_strtol10 function)
>>
>> I have checked in cross build environment (host x86_64), this variable
>> is true.
>>
>> Regards,
>> Renlin
>>
>>
>> On 13/06/17 09:16, Renlin Li wrote:
>>> Hi Martin,
>>>
>>> On 04/06/17 23:24, Martin Sebor wrote:
>>>> On 06/02/2017 09:38 AM, Renlin Li wrote:
>>>>> Hi Martin,
>>>>>
>>>>> After r247444, I saw the following two regressions in
>>>>> arm-linux-gnueabihf environment:
>>>>>
>>>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>>>> line 119)
>>>>> PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>>>> line 121)
>>>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>>>> line 121)
>>>>>
>>>>> The warning message related to those two lines are:
>>>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
>>>>> '%9223372036854775808i' directive width out of range
>>>>> [-Wformat-overflow=]
>>>>>
>>>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>>>>> '%.9223372036854775808i' directive precision out of range
>>>>> [-Wformat-overflow=]
>>>>>
>>>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>>>>> '%.9223372036854775808i' directive precision out of range
>>>>> [-Wformat-overflow=]
>>>>>
>>>>> Did you notice similar things from your test environment, Christophe?
>>>>
>>>> Looks like you're missing a couple of warnings. I see the following
>>>> output with both my arm-linux-gnueabihf cross compiler and my native
>>>> x86_64 GCC, both in 32-bit and 64-bit modes, as expected by the test,
>>>> so I don't see the same issue in my environment.
>>>
>>> Yes, it happens on arm-linux-gnueabihf native environment. the
>>> warnings with "INT_MAX"
>>> line are missing. I don't know if the host environment will cause the
>>> difference.
>>>
>>> Regards,
>>> Renlin
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2018-02-01 0:41 ` Martin Sebor
@ 2018-02-01 11:54 ` Renlin Li
2018-02-01 18:27 ` Martin Sebor
0 siblings, 1 reply; 21+ messages in thread
From: Renlin Li @ 2018-02-01 11:54 UTC (permalink / raw)
To: Martin Sebor, Renlin Li, Christophe Lyon
Cc: Andreas Schwab, Jeff Law, Jakub Jelinek, Joseph Myers, Gcc Patch List
Hi Martin,
On 01/02/18 00:40, Martin Sebor wrote:
> On 01/31/2018 10:36 AM, Renlin Li wrote:
>> Hi there,
>>
>> I have a patch to fix to regressions we observed in armhf native
>> environment.
>>
>> To effectively check out of range for format string, a target type
>> should be
>> used. And according to the standard, int type is used for "width" and
>> "precision"
>> field of format string.
>>
>> Here target_strtol10 is rename to target_strtoi10, and fixed to use
>> target_int_max () which is target dependent.
>>
>> The value returned by target_strtol10 is (target_int_max () + 1) when it
>> exceeds the range.
>> This is used to indicate its value exceeds target INT_MAX for the later
>> warning.
>
> Sorry for not responding to your original email. It's still
> in my inbox, just buried under a pile of stuff.
>
> Using LONG_MAX is not ideal but unless I missed something
> (it's been a while) I think using INT_MAX for the target would
> be even less optimal. Unless specified by the asterisk, width
> and precision can be arbitrarily large.
I cannot find more document about this other than here: http://en.cppreference.com/w/c/io/fprintf
> (optional) integer value or * that specifies minimum field width.
> (optional) . followed by integer number or *, or neither that specifies precision of the conversion.
It only mentions a integer value, with no type. I assume it could be of type int?
It is indeed very unclear about the range of width and precision.
> constraining either to INT_MAX would trigger warnings on LP64
> targets for safe calls like:
>
> Â // INT_MAX is 2147483647
> Â sprintf (d, "%.2147483648s", "");
>
> I think we want to use the maximum value of an unsigned type
> with the greatest precision on the host.
It will still warn
> for directives with precisions in excess of HOST_WIDE_INT_MAX
Is it here a target type should be used to test the range against? Instead of the
host where the toolchain is built?
Regards,
Renlin
> but short of using something like offset_int it's probably as
> good as it's going to get. (It has been suggested that
> the whole pass would benefit from using offset_int to track
> large numbers so if/when it's enhanced to make this change
> it should also make the code behave more consistently across
> different hosts.)
>
> Martin
>
>>
>> Is it Okay to commit?
>>
>> Regards,
>> Renlin
>>
>> gcc/ChangeLog:
>>
>> 2018-01-31 Renlin Li <renlin.li@arm.com>
>>
>> Â Â Â * gimple-ssa-sprintf.c (target_strtol10): Use target integer to check
>> Â Â Â the range. Rename it into ....
>> Â Â Â (target_strtoi10): This.
>> Â Â Â (parse_directive): Compare with target int max instead of LONG_MAX.
>>
>>
>> On 20/06/17 12:00, Renlin Li wrote:
>>> Hi Martin,
>>>
>>> I did a little investigation into this. Please correct me if I missed
>>> anything.
>>>
>>> I build a native arm-linux-gnueabihf toolchain in armhf hardware.
>>> It's ILP32. So in this situation:
>>>
>>> HOST_WIDE_INT is long, which is 32-bit.
>>> integer type 32-bit as well, so target_int_max () == LONG_MAX
>>>
>>>
>>> gimple-ssa-sprintf.c line 2887
>>>>  /* Has the likely and maximum directive output exceeded INT_MAX? */
>>>> Â bool likelyximax = *dir.beg && res->range.likely > target_int_max ();
>>>
>>> likelyximax will be false as the latter expression is always false.
>>> res->range.likely is truncated to LONG_MAX (in target_strtol10 function)
>>>
>>> I have checked in cross build environment (host x86_64), this variable
>>> is true.
>>>
>>> Regards,
>>> Renlin
>>>
>>>
>>> On 13/06/17 09:16, Renlin Li wrote:
>>>> Hi Martin,
>>>>
>>>> On 04/06/17 23:24, Martin Sebor wrote:
>>>>> On 06/02/2017 09:38 AM, Renlin Li wrote:
>>>>>> Hi Martin,
>>>>>>
>>>>>> After r247444, I saw the following two regressions in
>>>>>> arm-linux-gnueabihf environment:
>>>>>>
>>>>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>>>>> line 119)
>>>>>> PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>>>>> line 121)
>>>>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>>>>> line 121)
>>>>>>
>>>>>> The warning message related to those two lines are:
>>>>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
>>>>>> '%9223372036854775808i' directive width out of range
>>>>>> [-Wformat-overflow=]
>>>>>>
>>>>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>>>>>> '%.9223372036854775808i' directive precision out of range
>>>>>> [-Wformat-overflow=]
>>>>>>
>>>>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>>>>>> '%.9223372036854775808i' directive precision out of range
>>>>>> [-Wformat-overflow=]
>>>>>>
>>>>>> Did you notice similar things from your test environment, Christophe?
>>>>>
>>>>> Looks like you're missing a couple of warnings. I see the following
>>>>> output with both my arm-linux-gnueabihf cross compiler and my native
>>>>> x86_64 GCC, both in 32-bit and 64-bit modes, as expected by the test,
>>>>> so I don't see the same issue in my environment.
>>>>
>>>> Yes, it happens on arm-linux-gnueabihf native environment. the
>>>> warnings with "INT_MAX"
>>>> line are missing. I don't know if the host environment will cause the
>>>> difference.
>>>>
>>>> Regards,
>>>> Renlin
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)
2018-02-01 11:54 ` Renlin Li
@ 2018-02-01 18:27 ` Martin Sebor
0 siblings, 0 replies; 21+ messages in thread
From: Martin Sebor @ 2018-02-01 18:27 UTC (permalink / raw)
To: Renlin Li, Renlin Li, Christophe Lyon
Cc: Andreas Schwab, Jeff Law, Jakub Jelinek, Joseph Myers, Gcc Patch List
On 02/01/2018 04:54 AM, Renlin Li wrote:
> Hi Martin,
>
>
> On 01/02/18 00:40, Martin Sebor wrote:
>> On 01/31/2018 10:36 AM, Renlin Li wrote:
>>> Hi there,
>>>
>>> I have a patch to fix to regressions we observed in armhf native
>>> environment.
>>>
>>> To effectively check out of range for format string, a target type
>>> should be
>>> used. And according to the standard, int type is used for "width" and
>>> "precision"
>>> field of format string.
>>>
>>> Here target_strtol10 is rename to target_strtoi10, and fixed to use
>>> target_int_max () which is target dependent.
>>>
>>> The value returned by target_strtol10 is (target_int_max () + 1) when it
>>> exceeds the range.
>>> This is used to indicate its value exceeds target INT_MAX for the later
>>> warning.
>>
>> Sorry for not responding to your original email. It's still
>> in my inbox, just buried under a pile of stuff.
>>
>> Using LONG_MAX is not ideal but unless I missed something
>> (it's been a while) I think using INT_MAX for the target would
>> be even less optimal. Unless specified by the asterisk, width
>> and precision can be arbitrarily large.
>
> I cannot find more document about this other than here:
> http://en.cppreference.com/w/c/io/fprintf
The best reference is the C standard. The most recent publicly
available draft of C11 is here:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
The next best (and sometimes better) reference is POSIX:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html
>> (optional) integer value or * that specifies minimum field width.
>> (optional) . followed by integer number or *, or neither that
>> specifies precision of the conversion.
>
> It only mentions a integer value, with no type. I assume it could be of
> type int?
> It is indeed very unclear about the range of width and precision.
Because the spec doesn't say what precision the number is
to be interpreted in, it must be taken to mean a number with
an arbitrary precision.
>> constraining either to INT_MAX would trigger warnings on LP64
>> targets for safe calls like:
>>
>> // INT_MAX is 2147483647
>> sprintf (d, "%.2147483648s", "");
>>
>> I think we want to use the maximum value of an unsigned type
>> with the greatest precision on the host.
> It will still warn
>> for directives with precisions in excess of HOST_WIDE_INT_MAX
>
> Is it here a target type should be used to test the range against?
> Instead of the
> host where the toolchain is built?
I think for the purposes of runtime evaluation by a conforming
implementation, both the width and the precision need to be
interpreted "as if" with infinite precision.
But after sleeping on it, I suppose that for the purposes of
the warning, it could be helpful to point out values that exceed
even INT_MAX on the host (rather than target) because they aren't
very meaningful and are most likely mistakes (e.g., hidden by
macro expansion).
It might still be useful to have the pass store the larger number
(up to its limit, whatever that is on the host), and use it for
its internal computation and subsequent diagnostics for accuracy.
Martin
>
> Regards,
> Renlin
>
>> but short of using something like offset_int it's probably as
>> good as it's going to get. (It has been suggested that
>> the whole pass would benefit from using offset_int to track
>> large numbers so if/when it's enhanced to make this change
>> it should also make the code behave more consistently across
>> different hosts.)
>
>
>
>>
>> Martin
>>
>>>
>>> Is it Okay to commit?
>>>
>>> Regards,
>>> Renlin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2018-01-31 Renlin Li <renlin.li@arm.com>
>>>
>>> * gimple-ssa-sprintf.c (target_strtol10): Use target integer to
>>> check
>>> the range. Rename it into ....
>>> (target_strtoi10): This.
>>> (parse_directive): Compare with target int max instead of LONG_MAX.
>>>
>>>
>>> On 20/06/17 12:00, Renlin Li wrote:
>>>> Hi Martin,
>>>>
>>>> I did a little investigation into this. Please correct me if I missed
>>>> anything.
>>>>
>>>> I build a native arm-linux-gnueabihf toolchain in armhf hardware.
>>>> It's ILP32. So in this situation:
>>>>
>>>> HOST_WIDE_INT is long, which is 32-bit.
>>>> integer type 32-bit as well, so target_int_max () == LONG_MAX
>>>>
>>>>
>>>> gimple-ssa-sprintf.c line 2887
>>>>> /* Has the likely and maximum directive output exceeded INT_MAX? */
>>>>> bool likelyximax = *dir.beg && res->range.likely > target_int_max
>>>>> ();
>>>>
>>>> likelyximax will be false as the latter expression is always false.
>>>> res->range.likely is truncated to LONG_MAX (in target_strtol10
>>>> function)
>>>>
>>>> I have checked in cross build environment (host x86_64), this variable
>>>> is true.
>>>>
>>>> Regards,
>>>> Renlin
>>>>
>>>>
>>>> On 13/06/17 09:16, Renlin Li wrote:
>>>>> Hi Martin,
>>>>>
>>>>> On 04/06/17 23:24, Martin Sebor wrote:
>>>>>> On 06/02/2017 09:38 AM, Renlin Li wrote:
>>>>>>> Hi Martin,
>>>>>>>
>>>>>>> After r247444, I saw the following two regressions in
>>>>>>> arm-linux-gnueabihf environment:
>>>>>>>
>>>>>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>>>>>> line 119)
>>>>>>> PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>>>>>> line 121)
>>>>>>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for warnings,
>>>>>>> line 121)
>>>>>>>
>>>>>>> The warning message related to those two lines are:
>>>>>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
>>>>>>> '%9223372036854775808i' directive width out of range
>>>>>>> [-Wformat-overflow=]
>>>>>>>
>>>>>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>>>>>>> '%.9223372036854775808i' directive precision out of range
>>>>>>> [-Wformat-overflow=]
>>>>>>>
>>>>>>> testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
>>>>>>> '%.9223372036854775808i' directive precision out of range
>>>>>>> [-Wformat-overflow=]
>>>>>>>
>>>>>>> Did you notice similar things from your test environment,
>>>>>>> Christophe?
>>>>>>
>>>>>> Looks like you're missing a couple of warnings. I see the following
>>>>>> output with both my arm-linux-gnueabihf cross compiler and my native
>>>>>> x86_64 GCC, both in 32-bit and 64-bit modes, as expected by the test,
>>>>>> so I don't see the same issue in my environment.
>>>>>
>>>>> Yes, it happens on arm-linux-gnueabihf native environment. the
>>>>> warnings with "INT_MAX"
>>>>> line are missing. I don't know if the host environment will cause the
>>>>> difference.
>>>>>
>>>>> Regards,
>>>>> Renlin
>>
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2018-02-01 18:27 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-26 22:27 [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503) Martin Sebor
2017-04-27 3:39 ` Joseph Myers
2017-04-27 5:07 ` Jakub Jelinek
2017-04-27 22:52 ` Martin Sebor
2017-04-28 16:35 ` Jeff Law
2017-04-28 16:37 ` Jakub Jelinek
2017-04-28 17:05 ` Jeff Law
2017-04-28 18:32 ` Martin Sebor
2017-04-28 19:14 ` Jeff Law
2017-04-29 20:44 ` Andreas Schwab
2017-05-03 14:35 ` Christophe Lyon
2017-05-03 15:02 ` Martin Sebor
2017-05-03 15:24 ` Christophe Lyon
2017-06-02 15:38 ` Renlin Li
2017-06-04 22:24 ` Martin Sebor
2017-06-13 8:16 ` Renlin Li
2017-06-20 11:00 ` Renlin Li
2018-01-31 17:57 ` Renlin Li
2018-02-01 0:41 ` Martin Sebor
2018-02-01 11:54 ` Renlin Li
2018-02-01 18:27 ` Martin Sebor
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).