* [PATCH] Make strlen range computations more conservative
@ 2018-07-24  7:59 Bernd Edlinger
  2018-07-24 14:50 ` Richard Biener
                   ` (2 more replies)
  0 siblings, 3 replies; 121+ messages in thread
From: Bernd Edlinger @ 2018-07-24  7:59 UTC
  To: GCC Patches; +Cc: Jeff Law, Richard Biener, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 701 bytes --]

Hi!

This patch makes strlen range computations more conservative.

Firstly if there is a visible type cast from type A to B before passing
the value to strlen, don't expect the type layout of B to restrict the
possible return value range of strlen.

Furthermore use the outermost enclosing array instead of the
innermost one, because too aggressive optimization will likely
convert harmless errors into security-relevant errors, because
as the existing test cases demonstrate, this optimization is actively
attacking string length checks in user code, while not giving
any warnings.



Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

[-- Attachment #2: changelog-range-strlen.txt --]
[-- Type: text/plain, Size: 511 bytes --]

gcc:
2018-07-24  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* gimple-fold.c (get_range_strlen): Add a check for type casts.
	Use outermost enclosing array size instead of innermost one.
	* tree-ssa-strlen.c (maybe_set_strlen_range): Likewise.

testsuite:
2018-07-24  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* gcc.dg/strlenopt-40.c: Adjust test expectations.
	* gcc.dg/strlenopt-45.c: Likewise.
	* gcc.dg/strlenopt-48.c: Likewise.
	* gcc.dg/strlenopt-51.c: Likewise.
	* gcc.dg/strlenopt-54.c: New test.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: patch-range-strlen.diff --] [-- Type: text/x-patch; name="patch-range-strlen.diff", Size: 16526 bytes --] Index: gcc/gimple-fold.c =================================================================== --- gcc/gimple-fold.c (revision 262904) +++ gcc/gimple-fold.c (working copy) @@ -1339,19 +1339,33 @@ get_range_strlen (tree arg, tree length[2], bitmap if (TREE_CODE (arg) == ARRAY_REF) { - tree type = TREE_TYPE (TREE_OPERAND (arg, 0)); + /* Avoid arrays of pointers. */ + if (TREE_CODE (TREE_TYPE (arg)) == POINTER_TYPE) + return false; - /* Determine the "innermost" array type. */ - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); + /* Look for the outermost enclosing array. */ + while (TREE_CODE (arg) == ARRAY_REF + && TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 0))) + == ARRAY_TYPE) + arg = TREE_OPERAND (arg, 0); - /* Avoid arrays of pointers. */ - tree eltype = TREE_TYPE (type); - if (TREE_CODE (type) != ARRAY_TYPE - || !INTEGRAL_TYPE_P (eltype)) + tree base = arg; + while (TREE_CODE (base) == ARRAY_REF + || TREE_CODE (base) == ARRAY_RANGE_REF + || TREE_CODE (base) == COMPONENT_REF) + base = TREE_OPERAND (base, 0); + + /* If this looks like a type cast don't assume anything. */ + if ((TREE_CODE (base) == MEM_REF + && (! integer_zerop (TREE_OPERAND (base, 1)) + || TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0))) + != TREE_TYPE (base))) + || TREE_CODE (base) == VIEW_CONVERT_EXPR) return false; + tree type = TREE_TYPE (arg); + + /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); if (!val || integer_zerop (val)) return false; @@ -1362,9 +1376,9 @@ get_range_strlen (tree arg, tree length[2], bitmap the array could have zero length. 
*/ *minlen = ssize_int (0); - if (TREE_CODE (TREE_OPERAND (arg, 0)) == COMPONENT_REF - && type == TREE_TYPE (TREE_OPERAND (arg, 0)) - && array_at_struct_end_p (TREE_OPERAND (arg, 0))) + if (TREE_CODE (arg) == COMPONENT_REF + && type == TREE_TYPE (arg) + && array_at_struct_end_p (arg)) *flexp = true; } else if (TREE_CODE (arg) == COMPONENT_REF @@ -1371,6 +1385,20 @@ get_range_strlen (tree arg, tree length[2], bitmap && (TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 1))) == ARRAY_TYPE)) { + tree base = TREE_OPERAND (arg, 0); + while (TREE_CODE (base) == ARRAY_REF + || TREE_CODE (base) == ARRAY_RANGE_REF + || TREE_CODE (base) == COMPONENT_REF) + base = TREE_OPERAND (base, 0); + + /* If this looks like a type cast don't assume anything. */ + if ((TREE_CODE (base) == MEM_REF + && (! integer_zerop (TREE_OPERAND (base, 1)) + || TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0))) + != TREE_TYPE (base))) + || TREE_CODE (base) == VIEW_CONVERT_EXPR) + return false; + /* Use the type of the member array to determine the upper bound on the length of the array. This may be overly optimistic if the array itself isn't NUL-terminated and @@ -1386,10 +1414,6 @@ get_range_strlen (tree arg, tree length[2], bitmap tree type = TREE_TYPE (arg); - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); - /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); if (!val || integer_zerop (val)) Index: gcc/tree-ssa-strlen.c =================================================================== --- gcc/tree-ssa-strlen.c (revision 262904) +++ gcc/tree-ssa-strlen.c (working copy) @@ -1149,9 +1149,33 @@ maybe_set_strlen_range (tree lhs, tree src, tree b if (TREE_CODE (src) == ADDR_EXPR) { + src = TREE_OPERAND (src, 0); + + /* Avoid address of pointers. */ + if (TREE_CODE (TREE_TYPE (src)) == POINTER_TYPE) + goto done; + + /* Look for the outermost enclosing array. 
*/ + while (TREE_CODE (src) == ARRAY_REF + && TREE_CODE (TREE_TYPE (TREE_OPERAND (src, 0))) == ARRAY_TYPE) + src = TREE_OPERAND (src, 0); + + tree base = src; + while (TREE_CODE (base) == ARRAY_REF + || TREE_CODE (base) == ARRAY_RANGE_REF + || TREE_CODE (base) == COMPONENT_REF) + base = TREE_OPERAND (base, 0); + + /* If this looks like a type cast don't assume anything. */ + if ((TREE_CODE (base) == MEM_REF + && (! integer_zerop (TREE_OPERAND (base, 1)) + || TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0))) + != TREE_TYPE (base))) + || TREE_CODE (base) == VIEW_CONVERT_EXPR) + goto done; + /* The last array member of a struct can be bigger than its size suggests if it's treated as a poor-man's flexible array member. */ - src = TREE_OPERAND (src, 0); bool src_is_array = TREE_CODE (TREE_TYPE (src)) == ARRAY_TYPE; if (src_is_array && !array_at_struct_end_p (src)) { @@ -1183,6 +1207,7 @@ maybe_set_strlen_range (tree lhs, tree src, tree b } } +done: if (bound) { /* For strnlen, adjust MIN and MAX as necessary. If the bound Index: gcc/testsuite/gcc.dg/strlenopt-40.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-40.c (revision 262904) +++ gcc/testsuite/gcc.dg/strlenopt-40.c (working copy) @@ -105,20 +105,20 @@ void elim_global_arrays (int i) /* Verify that the expression involving the strlen call as well as whatever depends on it is eliminated from the test output. All these expressions must be trivially true. 
*/ - ELIM_TRUE (strlen (a7_3[0]) < sizeof a7_3[0]); - ELIM_TRUE (strlen (a7_3[1]) < sizeof a7_3[1]); - ELIM_TRUE (strlen (a7_3[6]) < sizeof a7_3[6]); - ELIM_TRUE (strlen (a7_3[i]) < sizeof a7_3[i]); + ELIM_TRUE (strlen (a7_3[0]) < sizeof a7_3); + ELIM_TRUE (strlen (a7_3[1]) < sizeof a7_3); + ELIM_TRUE (strlen (a7_3[6]) < sizeof a7_3); + ELIM_TRUE (strlen (a7_3[i]) < sizeof a7_3); - ELIM_TRUE (strlen (a5_7[0]) < sizeof a5_7[0]); - ELIM_TRUE (strlen (a5_7[1]) < sizeof a5_7[1]); - ELIM_TRUE (strlen (a5_7[4]) < sizeof a5_7[4]); - ELIM_TRUE (strlen (a5_7[i]) < sizeof a5_7[0]); + ELIM_TRUE (strlen (a5_7[0]) < sizeof a5_7); + ELIM_TRUE (strlen (a5_7[1]) < sizeof a5_7); + ELIM_TRUE (strlen (a5_7[4]) < sizeof a5_7); + ELIM_TRUE (strlen (a5_7[i]) < sizeof a5_7); - ELIM_TRUE (strlen (ax_3[0]) < sizeof ax_3[0]); - ELIM_TRUE (strlen (ax_3[1]) < sizeof ax_3[1]); - ELIM_TRUE (strlen (ax_3[9]) < sizeof ax_3[9]); - ELIM_TRUE (strlen (ax_3[i]) < sizeof ax_3[i]); + ELIM_TRUE (strlen (ax_3[0]) < DIFF_MAX - 1); + ELIM_TRUE (strlen (ax_3[1]) < DIFF_MAX - 1); + ELIM_TRUE (strlen (ax_3[9]) < DIFF_MAX - 1); + ELIM_TRUE (strlen (ax_3[i]) < DIFF_MAX - 1); ELIM_TRUE (strlen (a3) < sizeof a3); ELIM_TRUE (strlen (a7) < sizeof a7); @@ -134,17 +134,17 @@ void elim_pointer_to_arrays (void) ELIM_TRUE (strlen (*pa5) < 5); ELIM_TRUE (strlen (*pa3) < 3); - ELIM_TRUE (strlen ((*pa7_3)[0]) < 3); - ELIM_TRUE (strlen ((*pa7_3)[1]) < 3); - ELIM_TRUE (strlen ((*pa7_3)[6]) < 3); + ELIM_TRUE (strlen ((*pa7_3)[0]) < 21); + ELIM_TRUE (strlen ((*pa7_3)[1]) < 21); + ELIM_TRUE (strlen ((*pa7_3)[6]) < 21); - ELIM_TRUE (strlen ((*pax_3)[0]) < 3); - ELIM_TRUE (strlen ((*pax_3)[1]) < 3); - ELIM_TRUE (strlen ((*pax_3)[9]) < 3); + ELIM_TRUE (strlen ((*pax_3)[0]) < DIFF_MAX - 1); + ELIM_TRUE (strlen ((*pax_3)[1]) < DIFF_MAX - 1); + ELIM_TRUE (strlen ((*pax_3)[9]) < DIFF_MAX - 1); - ELIM_TRUE (strlen ((*pa5_7)[0]) < 7); - ELIM_TRUE (strlen ((*pa5_7)[1]) < 7); - ELIM_TRUE (strlen ((*pa5_7)[4]) < 7); + ELIM_TRUE (strlen 
((*pa5_7)[0]) < 35); + ELIM_TRUE (strlen ((*pa5_7)[1]) < 35); + ELIM_TRUE (strlen ((*pa5_7)[4]) < 35); } void elim_global_arrays_and_strings (int i) @@ -198,11 +198,11 @@ void elim_member_arrays_obj (int i) ELIM_TRUE (strlen (ma0_3_5_7[1][1][0].a5) < 5); ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a5) < 5); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a7_3[0]) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a7_3[2]) < 3); + ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a7_3[0]) < 21); + ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a7_3[2]) < 21); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a5_7[4]) < 7); + ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a5_7[0]) < 35); + ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a5_7[4]) < 35); } void elim_member_arrays_ptr (struct MemArrays0 *ma0, @@ -210,19 +210,23 @@ void elim_member_arrays_ptr (struct MemArrays0 *ma struct MemArrays7 *ma7, int i) { - ELIM_TRUE (strlen (ma0->a7_3[0]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[1]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[6]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[6]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[i]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[i]) < 3); + ELIM_TRUE (strlen (ma0->a7_3[0]) < 21); + ELIM_TRUE (strlen (ma0->a7_3[1]) < 21); + ELIM_TRUE (strlen (ma0->a7_3[6]) < 21); + ELIM_TRUE (strlen (ma0->a7_3[6]) < 21); + ELIM_TRUE (strlen (ma0->a7_3[i]) < 21); + ELIM_TRUE (strlen (ma0->a7_3[i]) < 21); - ELIM_TRUE (strlen (ma0->a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[0].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[1].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[1].a5_7[4]) < 7); - ELIM_TRUE (strlen (ma0[9].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[9].a5_7[4]) < 7); + ELIM_TRUE (strlen (ma0->a5_7[0]) < 35); + ELIM_TRUE (strlen (ma0[0].a5_7[0]) < 35); +#if 0 + /* This is tranformed into strlen ((const char *) &(ma0 + 64)->a5_7[0]) + which looks like a type cast and fails the check in get_range_strlen. 
*/ + ELIM_TRUE (strlen (ma0[1].a5_7[0]) < 35); + ELIM_TRUE (strlen (ma0[1].a5_7[4]) < 35); + ELIM_TRUE (strlen (ma0[9].a5_7[0]) < 35); + ELIM_TRUE (strlen (ma0[9].a5_7[4]) < 35); +#endif ELIM_TRUE (strlen (ma0->a3) < sizeof ma0->a3); ELIM_TRUE (strlen (ma0->a5) < sizeof ma0->a5); Index: gcc/testsuite/gcc.dg/strlenopt-45.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-45.c (revision 262904) +++ gcc/testsuite/gcc.dg/strlenopt-45.c (working copy) @@ -85,19 +85,19 @@ void elim_strnlen_arr_cst (void) ELIM (strnlen (a3_7[0], 1) < 2); ELIM (strnlen (a3_7[0], 2) < 3); ELIM (strnlen (a3_7[0], 3) < 4); - ELIM (strnlen (a3_7[0], 9) < 8); - ELIM (strnlen (a3_7[0], PTRDIFF_MAX) < 8); - ELIM (strnlen (a3_7[0], SIZE_MAX) < 8); - ELIM (strnlen (a3_7[0], -1) < 8); + ELIM (strnlen (a3_7[0], 9) < 10); + ELIM (strnlen (a3_7[0], PTRDIFF_MAX) < 21); + ELIM (strnlen (a3_7[0], SIZE_MAX) < 21); + ELIM (strnlen (a3_7[0], -1) < 21); ELIM (strnlen (a3_7[2], 0) == 0); ELIM (strnlen (a3_7[2], 1) < 2); ELIM (strnlen (a3_7[2], 2) < 3); ELIM (strnlen (a3_7[2], 3) < 4); - ELIM (strnlen (a3_7[2], 9) < 8); - ELIM (strnlen (a3_7[2], PTRDIFF_MAX) < 8); - ELIM (strnlen (a3_7[2], SIZE_MAX) < 8); - ELIM (strnlen (a3_7[2], -1) < 8); + ELIM (strnlen (a3_7[2], 9) < 10); + ELIM (strnlen (a3_7[2], PTRDIFF_MAX) < 21); + ELIM (strnlen (a3_7[2], SIZE_MAX) < 21); + ELIM (strnlen (a3_7[2], -1) < 21); ELIM (strnlen ((char*)a3_7, 0) == 0); ELIM (strnlen ((char*)a3_7, 1) < 2); @@ -106,10 +106,10 @@ void elim_strnlen_arr_cst (void) ELIM (strnlen ((char*)a3_7, 9) < 10); ELIM (strnlen ((char*)a3_7, 19) < 20); ELIM (strnlen ((char*)a3_7, 21) < 22); - ELIM (strnlen ((char*)a3_7, 23) < 22); - ELIM (strnlen ((char*)a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)a3_7, -1) < 22); + ELIM (strnlen ((char*)a3_7, 23) < 21); + ELIM (strnlen ((char*)a3_7, PTRDIFF_MAX) < 21); + ELIM (strnlen ((char*)a3_7, SIZE_MAX) < 21); + ELIM 
(strnlen ((char*)a3_7, -1) < 21); ELIM (strnlen (ax, 0) == 0); ELIM (strnlen (ax, 1) < 2); @@ -154,37 +154,39 @@ void elim_strnlen_memarr_cst (struct MemArrays *p, ELIM (strnlen (p->a3, 1) < 2); ELIM (strnlen (p->a3, 2) < 3); ELIM (strnlen (p->a3, 3) < 4); - ELIM (strnlen (p->a3, 9) < 4); - ELIM (strnlen (p->a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p->a3, SIZE_MAX) < 4); - ELIM (strnlen (p->a3, -1) < 4); + ELIM (strnlen (p->a3, 9) < 3); + ELIM (strnlen (p->a3, PTRDIFF_MAX) < 3); + ELIM (strnlen (p->a3, SIZE_MAX) < 3); + ELIM (strnlen (p->a3, -1) < 3); ELIM (strnlen (p[i].a3, 0) == 0); ELIM (strnlen (p[i].a3, 1) < 2); ELIM (strnlen (p[i].a3, 2) < 3); ELIM (strnlen (p[i].a3, 3) < 4); - ELIM (strnlen (p[i].a3, 9) < 4); - ELIM (strnlen (p[i].a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p[i].a3, SIZE_MAX) < 4); - ELIM (strnlen (p[i].a3, -1) < 4); + ELIM (strnlen (p[i].a3, 9) < 3); + ELIM (strnlen (p[i].a3, PTRDIFF_MAX) < 3); + ELIM (strnlen (p[i].a3, SIZE_MAX) < 3); + ELIM (strnlen (p[i].a3, -1) < 3); ELIM (strnlen (p->a3_7[0], 0) == 0); ELIM (strnlen (p->a3_7[0], 1) < 2); ELIM (strnlen (p->a3_7[0], 2) < 3); ELIM (strnlen (p->a3_7[0], 3) < 4); - ELIM (strnlen (p->a3_7[0], 9) < 8); - ELIM (strnlen (p->a3_7[0], PTRDIFF_MAX) < 8); - ELIM (strnlen (p->a3_7[0], SIZE_MAX) < 8); - ELIM (strnlen (p->a3_7[0], -1) < 8); + ELIM (strnlen (p->a3_7[0], 9) < 10); + ELIM (strnlen (p->a3_7[0], 21) < 22); + ELIM (strnlen (p->a3_7[0], PTRDIFF_MAX) < 21); + ELIM (strnlen (p->a3_7[0], SIZE_MAX) < 21); + ELIM (strnlen (p->a3_7[0], -1) < 21); ELIM (strnlen (p->a3_7[2], 0) == 0); ELIM (strnlen (p->a3_7[2], 1) < 2); ELIM (strnlen (p->a3_7[2], 2) < 3); ELIM (strnlen (p->a3_7[2], 3) < 4); - ELIM (strnlen (p->a3_7[2], 9) < 8); - ELIM (strnlen (p->a3_7[2], PTRDIFF_MAX) < 8); - ELIM (strnlen (p->a3_7[2], SIZE_MAX) < 8); - ELIM (strnlen (p->a3_7[2], -1) < 8); + ELIM (strnlen (p->a3_7[2], 9) < 10); + ELIM (strnlen (p->a3_7[2], 21) < 22); + ELIM (strnlen (p->a3_7[2], PTRDIFF_MAX) < 21); + ELIM (strnlen 
(p->a3_7[2], SIZE_MAX) < 21); + ELIM (strnlen (p->a3_7[2], -1) < 21); ELIM (strnlen (p->a3_7[i], 0) == 0); ELIM (strnlen (p->a3_7[i], 1) < 2); Index: gcc/testsuite/gcc.dg/strlenopt-48.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-48.c (revision 262904) +++ gcc/testsuite/gcc.dg/strlenopt-48.c (working copy) @@ -9,8 +9,8 @@ void f (void) { - extern char a[2][1]; - int n = strlen (a[1]); + extern char a[1][1]; + int n = strlen (a[0]); if (n) abort(); } @@ -17,8 +17,8 @@ void f (void) void g (void) { - extern char b[3][2][1]; - int n = strlen (b[2][1]); + extern char b[1][1][1]; + int n = strlen (b[0][0]); if (n) abort(); } @@ -25,8 +25,8 @@ void g (void) void h (void) { - extern char c[4][3][2][1]; - int n = strlen (c[3][2][1]); + extern char c[1][1][1][1]; + int n = strlen (c[0][0][0]); if (n) abort(); } Index: gcc/testsuite/gcc.dg/strlenopt-51.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-51.c (revision 262904) +++ gcc/testsuite/gcc.dg/strlenopt-51.c (working copy) @@ -101,7 +101,7 @@ void test_keep_a9_9 (int i) { #undef T #define T(I) \ - KEEP (strlen (&a9_9[i][I][0]) > (1 + I) % 9); \ + KEEP (strlen (&a9_9[i][I][0]) > (0 + I) % 9); \ KEEP (strlen (&a9_9[i][I][1]) > (1 + I) % 9); \ KEEP (strlen (&a9_9[i][I][2]) > (2 + I) % 9); \ KEEP (strlen (&a9_9[i][I][3]) > (3 + I) % 9); \ @@ -115,7 +115,7 @@ void test_keep_a9_9 (int i) } /* { dg-final { scan-tree-dump-times "strlen" 72 "gimple" } } - { dg-final { scan-tree-dump-times "strlen" 63 "optimized" } } + { dg-final { scan-tree-dump-times "strlen" 72 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 72 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } */ 
Index: gcc/testsuite/gcc.dg/strlenopt-54.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-54.c (revision 0) +++ gcc/testsuite/gcc.dg/strlenopt-54.c (working copy) @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +typedef char A[6]; +typedef char B[2][3]; + +A a; + +void test (void) +{ + B* b = (B*) a; + if (__builtin_strlen ((*b)[0]) > 2) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump-times "__builtin_strlen" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_abort" 1 "optimized" } } */ ^ permalink raw reply [flat|nested] 121+ messages in thread
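The adjusted bounds in strlenopt-40.c (3 becoming 21, 7 becoming 35) and the new strlenopt-54.c test follow from plain sizeof arithmetic. The sketch below is only an illustration in standard C, not part of the patch: the array shapes and the typedefs A and B are taken from the tests, while every function name and the b_shape object are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Array shapes borrowed from strlenopt-40.c.  */
char a7_3[7][3];
char a5_7[5][7];

/* Types borrowed from strlenopt-54.c.  */
typedef char A[6];
typedef char B[2][3];

/* Hypothetical object, used only so sizeof can name a row of B.  */
B b_shape;

/* Longest string that fits if only the innermost row is trusted.  */
size_t innermost_max_len_a7_3 (void) { return sizeof a7_3[0] - 1; }
size_t innermost_max_len_a5_7 (void) { return sizeof a5_7[0] - 1; }

/* Longest string that fits in the outermost enclosing array.  */
size_t outermost_max_len_a7_3 (void) { return sizeof a7_3 - 1; }
size_t outermost_max_len_a5_7 (void) { return sizeof a5_7 - 1; }

/* Longest string a single row of B could hold.  */
size_t row_max_len_B (void) { return sizeof b_shape[0] - 1; }

/* The object in strlenopt-54.c really is a flat char[6], so a
   five-character string fits.  The length is measured through type A,
   which is well-defined.  */
size_t flat_string_length (void)
{
  A a;
  strcpy (a, "hello");
  return strlen (a);
}

/* The stored string is longer than a row of B allows, but still within
   the bound of the real object.  */
void check_strlenopt_54_shape (void)
{
  assert (flat_string_length () > row_max_len_B ());
  assert (flat_string_length () <= sizeof (A) - 1);
}
```

Calling check_strlenopt_54_shape () from a test driver exercises the case the new test guards: a bound derived from a row of B would be too small for the string that actually lives in the object.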
* Re: [PATCH] Make strlen range computations more conservative
  2018-07-24  7:59 [PATCH] Make strlen range computations more conservative Bernd Edlinger
@ 2018-07-24 14:50 ` Richard Biener
  2018-07-25 13:03   ` Bernd Edlinger
  2018-07-24 16:14 ` Martin Sebor
  2018-07-24 21:46 ` Jeff Law
  2 siblings, 1 reply; 121+ messages in thread
From: Richard Biener @ 2018-07-24 14:50 UTC
  To: Bernd Edlinger; +Cc: GCC Patches, Jeff Law, Jakub Jelinek

On Tue, 24 Jul 2018, Bernd Edlinger wrote:

> Hi!
>
> This patch makes strlen range computations more conservative.
>
> Firstly if there is a visible type cast from type A to B before passing
> the value to strlen, don't expect the type layout of B to restrict the
> possible return value range of strlen.
>
> Furthermore use the outermost enclosing array instead of the
> innermost one, because too aggressive optimization will likely
> convert harmless errors into security-relevant errors, because
> as the existing test cases demonstrate, this optimization is actively
> attacking string length checks in user code, while not giving
> any warnings.
>
>
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

I'd like us to be explicit in what we support, not what we do not
support, thus

+         /* Avoid arrays of pointers.  */
+         if (TREE_CODE (TREE_TYPE (arg)) == POINTER_TYPE)
+           return false;

should become

         /* We handle arrays of integer types.  */
         if (TREE_CODE (TREE_TYPE (arg)) != INTEGER_TYPE)
           return false;

+         tree base = arg;
+         while (TREE_CODE (base) == ARRAY_REF
+                || TREE_CODE (base) == ARRAY_RANGE_REF
+                || TREE_CODE (base) == COMPONENT_REF)
+           base = TREE_OPERAND (base, 0);
+
+         /* If this looks like a type cast don't assume anything.  */
+         if ((TREE_CODE (base) == MEM_REF
+              && (! integer_zerop (TREE_OPERAND (base, 1))
+                  || TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0)))
+                     != TREE_TYPE (base)))
+             || TREE_CODE (base) == VIEW_CONVERT_EXPR)
            return false;

likewise - you miss to handle BIT_FIELD_REF.  So, instead

  if (!(DECL_P (base)
        || TREE_CODE (base) == STRING_CST
        || (TREE_CODE (base) == MEM_REF
            && ...

you should look at comparing TYPE_MAIN_VARIANT in your type
check, aligned/unaligned or const/non-const accesses shouldn't
be considered a "type cast".  Maybe even use
types_compatible_p.  Not sure why you enforce zero-offset MEMs?
Do we in the end only handle &decl bases of MEMs?  Given you
strip arbitrary COMPONENT_REFs the offset in a MEM isn't
so much different?

It looks like the component-ref stripping plus type-check part
could be factored out into sth like get_base_address?  I don't
have a good name or suggested semantics for it though.

Richard.

>
> Thanks
> Bernd.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
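The review above asks for two things: accept only an explicit list of base kinds, and compare main variants so that a const-qualified access is not mistaken for a type cast. The following is a toy model in plain C of how such a check could be shaped. It is not GCC code: the toy_ types, enumerators and functions are invented for illustration and only loosely mirror the tree codes named in the thread, and the MEM_REF offset question raised above is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for tree codes; only the names echo the thread.  */
enum toy_code
{
  TOY_VAR_DECL, TOY_STRING_CST, TOY_MEM_REF, TOY_ARRAY_REF,
  TOY_COMPONENT_REF, TOY_BIT_FIELD_REF, TOY_VIEW_CONVERT_EXPR
};

/* A type whose main_variant points at its unqualified version.  */
struct toy_type
{
  const struct toy_type *main_variant;
  bool is_const;
};

struct toy_node
{
  enum toy_code code;
  const struct toy_type *type;      /* Type of this reference.  */
  const struct toy_node *operand0;  /* Inner reference, NULL for a leaf.  */
  const struct toy_type *pointee;   /* TOY_MEM_REF only: type the address
                                       points to.  */
};

/* Walk inwards through array and member references to the base.  */
const struct toy_node *toy_strip_components (const struct toy_node *n)
{
  assert (n != NULL);
  while (n->code == TOY_ARRAY_REF || n->code == TOY_COMPONENT_REF)
    {
      assert (n->operand0 != NULL);
      n = n->operand0;
    }
  return n;
}

/* Say what is supported rather than what is not: declarations, string
   constants, and memory references whose accessed type has the same
   main variant as the pointed-to type.  Everything else is rejected.  */
bool toy_base_is_supported (const struct toy_node *ref)
{
  const struct toy_node *base = toy_strip_components (ref);
  switch (base->code)
    {
    case TOY_VAR_DECL:
    case TOY_STRING_CST:
      return true;
    case TOY_MEM_REF:
      return base->pointee->main_variant == base->type->main_variant;
    default:
      return false;
    }
}
```

With this shape a BIT_FIELD_REF or VIEW_CONVERT_EXPR base is rejected by the default case without being listed, which is the practical difference between an allow-list and the deny-list in the first version of the patch.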
* Re: [PATCH] Make strlen range computations more conservative
  2018-07-24 14:50 ` Richard Biener
@ 2018-07-25 13:03   ` Bernd Edlinger
  0 siblings, 0 replies; 121+ messages in thread
From: Bernd Edlinger @ 2018-07-25 13:03 UTC
  To: Richard Biener; +Cc: GCC Patches, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 5422 bytes --]

On 07/24/18 16:50, Richard Biener wrote:
> On Tue, 24 Jul 2018, Bernd Edlinger wrote:
>
>> Hi!
>>
>> This patch makes strlen range computations more conservative.
>>
>> Firstly if there is a visible type cast from type A to B before passing
>> the value to strlen, don't expect the type layout of B to restrict the
>> possible return value range of strlen.
>>
>> Furthermore use the outermost enclosing array instead of the
>> innermost one, because too aggressive optimization will likely
>> convert harmless errors into security-relevant errors, because
>> as the existing test cases demonstrate, this optimization is actively
>> attacking string length checks in user code, while not giving
>> any warnings.
>>
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
>
> I'd like us to be explicit in what we support, not what we do not
> support, thus
>
> +         /* Avoid arrays of pointers.  */
> +         if (TREE_CODE (TREE_TYPE (arg)) == POINTER_TYPE)
> +           return false;
>
> should become
>
>          /* We handle arrays of integer types.  */
>          if (TREE_CODE (TREE_TYPE (arg)) != INTEGER_TYPE)
>            return false;
>

Yes.  I think I can also check the TYPE_MODE/PRECISION.

> +         tree base = arg;
> +         while (TREE_CODE (base) == ARRAY_REF
> +                || TREE_CODE (base) == ARRAY_RANGE_REF
> +                || TREE_CODE (base) == COMPONENT_REF)
> +           base = TREE_OPERAND (base, 0);
> +
> +         /* If this looks like a type cast don't assume anything.  */
> +         if ((TREE_CODE (base) == MEM_REF
> +              && (! integer_zerop (TREE_OPERAND (base, 1))
> +                  || TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0)))
> +                     != TREE_TYPE (base)))
> +             || TREE_CODE (base) == VIEW_CONVERT_EXPR)
>             return false;
>
> likewise - you miss to handle BIT_FIELD_REF.  So, instead
>

I did not expect to see BIT_FIELD_REF in the inner tree elements,
but you never know.  The new version handles them, and bails out
if they happen to be there.

Is this handling now how you wanted it to be?

>   if (!(DECL_P (base)
>         || TREE_CODE (base) == STRING_CST
>         || (TREE_CODE (base) == MEM_REF
>             && ...
>
> you should look at comparing TYPE_MAIN_VARIANT in your type
> check, aligned/unaligned or const/non-const accesses shouldn't
> be considered a "type cast".  Maybe even use

Good point.  TYPE_MAIN_VARIANT is my friend.

> types_compatible_p.  Not sure why you enforce zero-offset MEMs?
> Do we in the end only handle &decl bases of MEMs?  Given you
> strip arbitrary COMPONENT_REFs the offset in a MEM isn't
> so much different?
>

something like:
ma0[1].a5_7[0]
gets transformed into:
(const char *) &(ma0 + 64)->a5_7[0]

ma0 + 64 is a MEM_REF[&ma0, 64]

I don't really think that happens too often, but I think other weirdo-type
casts can look quite similar.  But that happens very rarely.

> It looks like the component-ref stripping plus type-check part
> could be factored out into sth like get_base_address?  I don't
> have a good name or suggested semantics for it though.
>

Yes, done.

While playing with the now more rigorous type checking I noticed
something that is most likely a pre-existing programming error:

@@ -1310,8 +1350,8 @@ get_range_strlen (tree arg, tree length[2], bitmap
                 member.  */

              tree idx = TREE_OPERAND (op, 1);
-             arg = TREE_OPERAND (op, 0);
-             tree optype = TREE_TYPE (arg);
+             op = TREE_OPERAND (op, 0);
+             tree optype = TREE_TYPE (op);
              if (tree dom = TYPE_DOMAIN (optype))
                if (tree bound = TYPE_MAX_VALUE (dom))
                  if (TREE_CODE (bound) == INTEGER_CST

I believe this was not meant to change "arg".

This is in a block that is guarded by:

      /* We can end up with &(*iftmp_1)[0] here as well, so handle it.  */
      if (TREE_CODE (arg) == ADDR_EXPR
          && TREE_CODE (TREE_OPERAND (arg, 0)) == ARRAY_REF)

so this is entered with
arg = &ma0_3_5_7[0][0][0].a5_7[4]
op = ma0_3_5_7[0][0][0].a5_7[4]
at this point, then arg is accidentally set to
TREE_OPERAND (op, 0) which now makes
arg = ma0_3_5_7[0][0][0].a5_7

this did not pass the type check in get_inner_char_array_unless_typecast,
but more importantly this is passed to
val = c_strlen (arg, 1), which will likely return the first array initializer
instead of the fifth.

I have also added an else here:

@@ -1400,8 +1432,7 @@ get_range_strlen (tree arg, tree length[2], bitmap
             the array could have zero length.  */
          *minlen = ssize_int (0);
        }
-
-      if (VAR_P (arg))
+      else if (VAR_P (arg))
        {
          tree type = TREE_TYPE (arg);
          if (POINTER_TYPE_P (type))

because I noticed that the control flow can reach this if from the
previous if statement, but the range info has already been set at that point.

Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Not asking for OK right now, since Jeff asked me to hold this patch
for the moment, but I just wanted to keep you informed,
where I am right now.


Thanks
Bernd.

[-- Attachment #2: changelog-range-strlen-v2.txt --]
[-- Type: text/plain, Size: 617 bytes --]

gcc:
2018-07-24  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* gimple-fold.c (get_inner_char_array_unless_typecast): Helper
	function for strlen range estimations.
	(get_range_strlen): Use get_inner_char_array_unless_typecast.
	* tree-ssa-strlen.c (maybe_set_strlen_range): Likewise.
	* gimple-fold.h (get_inner_char_array_unless_typecast): Declare.

testsuite:
2018-07-24  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* gcc.dg/strlenopt-40.c: Adjust test expectations.
	* gcc.dg/strlenopt-45.c: Likewise.
	* gcc.dg/strlenopt-48.c: Likewise.
	* gcc.dg/strlenopt-51.c: Likewise.
	* gcc.dg/strlenopt-54.c: New test.
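The concern about arg being overwritten can be pictured at the source level: once the [4] index is dropped, what remains is the whole a5_7 member, whose first string is row 0 rather than row 4. The sketch below is an assumption-laden illustration in plain C and does not reproduce what c_strlen does internally; the struct, the initializer strings and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct with a member shaped like a5_7 in the tests.  */
struct toy_mem { char a5_7[5][7]; };

static const struct toy_mem m = { { "a", "bb", "ccc", "dddd", "eeeee" } };

/* Length of the row that was actually asked for, as in &m.a5_7[4].  */
size_t length_of_requested_row (void) { return strlen (m.a5_7[4]); }

/* Length seen when the index is lost and only the member is left:
   the first string stored in the member is row 0.  */
size_t length_after_dropping_index (void) { return strlen (m.a5_7[0]); }

/* The two answers differ, so a constant folded from the wrong row
   would be wrong.  */
void check_rows_differ (void)
{
  assert (length_of_requested_row () != length_after_dropping_index ());
}
```

With these made-up initializers the requested row has length 5 while the first row has length 1, so any length folded from the member as a whole would not match the row the caller asked about.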
[-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: patch-range-strlen-v2.diff --] [-- Type: text/x-patch; name="patch-range-strlen-v2.diff", Size: 23726 bytes --] Index: gcc/gimple-fold.c =================================================================== --- gcc/gimple-fold.c (revision 262933) +++ gcc/gimple-fold.c (working copy) @@ -1257,7 +1257,47 @@ gimple_fold_builtin_memset (gimple_stmt_iterator * return true; } +/* Obtain the inner char array for strlen range estimations. + Return NULL if ARG is not a char array, or if the inner reference + chain goes through a type cast. */ +tree +get_inner_char_array_unless_typecast (tree arg) +{ + /* We handle arrays of integer types. */ + if (TREE_CODE (TREE_TYPE (arg)) != ARRAY_TYPE + || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != INTEGER_TYPE + || TYPE_MODE (TREE_TYPE (TREE_TYPE (arg))) != TYPE_MODE (char_type_node) + || TYPE_PRECISION (TREE_TYPE (TREE_TYPE (arg))) + != TYPE_PRECISION (char_type_node)) + return NULL_TREE; + + /* Look for the innermost enclosing array. */ + while (TREE_CODE (arg) == ARRAY_REF + && TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 0))) == ARRAY_TYPE) + arg = TREE_OPERAND (arg, 0); + + tree base = arg; + while (TREE_CODE (base) == ARRAY_REF + || TREE_CODE (base) == ARRAY_RANGE_REF + || TREE_CODE (base) == COMPONENT_REF) + base = TREE_OPERAND (base, 0); + + /* If this looks like a type cast don't assume anything. */ + if ((TREE_CODE (base) == MEM_REF + && (! integer_zerop (TREE_OPERAND (base, 1)) + || TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0)))) + != TYPE_MAIN_VARIANT (TREE_TYPE (base)))) + || TREE_CODE (base) == VIEW_CONVERT_EXPR + /* Or other stuff that would be handled by get_inner_reference. 
*/ + || TREE_CODE (base) == BIT_FIELD_REF + || TREE_CODE (base) == REALPART_EXPR + || TREE_CODE (base) == IMAGPART_EXPR) + return NULL_TREE; + + return arg; +} + /* Obtain the minimum and maximum string length or minimum and maximum value of ARG in LENGTH[0] and LENGTH[1], respectively. If ARG is an SSA name variable, follow its use-def chains. When @@ -1310,8 +1350,8 @@ get_range_strlen (tree arg, tree length[2], bitmap member. */ tree idx = TREE_OPERAND (op, 1); - arg = TREE_OPERAND (op, 0); - tree optype = TREE_TYPE (arg); + op = TREE_OPERAND (op, 0); + tree optype = TREE_TYPE (op); if (tree dom = TYPE_DOMAIN (optype)) if (tree bound = TYPE_MAX_VALUE (dom)) if (TREE_CODE (bound) == INTEGER_CST @@ -1339,19 +1379,13 @@ get_range_strlen (tree arg, tree length[2], bitmap if (TREE_CODE (arg) == ARRAY_REF) { - tree type = TREE_TYPE (TREE_OPERAND (arg, 0)); + arg = get_inner_char_array_unless_typecast (arg); + if (!arg) + return false; - /* Determine the "innermost" array type. */ - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); + tree type = TREE_TYPE (arg); - /* Avoid arrays of pointers. */ - tree eltype = TREE_TYPE (type); - if (TREE_CODE (type) != ARRAY_TYPE - || !INTEGRAL_TYPE_P (eltype)) - return false; - + /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); if (!val || integer_zerop (val)) return false; @@ -1362,15 +1396,17 @@ get_range_strlen (tree arg, tree length[2], bitmap the array could have zero length. 
*/ *minlen = ssize_int (0); - if (TREE_CODE (TREE_OPERAND (arg, 0)) == COMPONENT_REF - && type == TREE_TYPE (TREE_OPERAND (arg, 0)) - && array_at_struct_end_p (TREE_OPERAND (arg, 0))) + if (TREE_CODE (arg) == COMPONENT_REF + && type == TREE_TYPE (arg) + && array_at_struct_end_p (arg)) *flexp = true; } - else if (TREE_CODE (arg) == COMPONENT_REF - && (TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 1))) - == ARRAY_TYPE)) + else if (TREE_CODE (arg) == COMPONENT_REF) { + arg = get_inner_char_array_unless_typecast (arg); + if (!arg) + return false; + /* Use the type of the member array to determine the upper bound on the length of the array. This may be overly optimistic if the array itself isn't NUL-terminated and @@ -1386,10 +1422,6 @@ get_range_strlen (tree arg, tree length[2], bitmap tree type = TREE_TYPE (arg); - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); - /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); if (!val || integer_zerop (val)) @@ -1400,8 +1432,7 @@ get_range_strlen (tree arg, tree length[2], bitmap the array could have zero length. */ *minlen = ssize_int (0); } - - if (VAR_P (arg)) + else if (VAR_P (arg)) { tree type = TREE_TYPE (arg); if (POINTER_TYPE_P (type)) @@ -1409,13 +1440,16 @@ get_range_strlen (tree arg, tree length[2], bitmap if (TREE_CODE (type) == ARRAY_TYPE) { + /* We handle arrays of integer types. */ + if (TREE_CODE (TREE_TYPE (type)) != INTEGER_TYPE) + return false; + + /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); - if (!val - || TREE_CODE (val) != INTEGER_CST - || integer_zerop (val)) + if (!val || integer_zerop (val)) return false; - val = wide_int_to_tree (TREE_TYPE (val), - wi::sub (wi::to_wide (val), 1)); + val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, + integer_one_node); /* Set the minimum size to zero since the string in the array could have zero length. 
*/ *minlen = ssize_int (0); Index: gcc/gimple-fold.h =================================================================== --- gcc/gimple-fold.h (revision 262933) +++ gcc/gimple-fold.h (working copy) @@ -61,6 +61,7 @@ extern bool gimple_fold_builtin_snprintf (gimple_s extern bool arith_code_with_undefined_signed_overflow (tree_code); extern gimple_seq rewrite_to_defined_overflow (gimple *); extern void replace_call_with_value (gimple_stmt_iterator *, tree); +extern tree get_inner_char_array_unless_typecast (tree); /* gimple_build, functionally matching fold_buildN, outputs stmts int the provided sequence, matching and simplifying them on-the-fly. Index: gcc/tree-ssa-strlen.c =================================================================== --- gcc/tree-ssa-strlen.c (revision 262933) +++ gcc/tree-ssa-strlen.c (working copy) @@ -1149,11 +1149,15 @@ maybe_set_strlen_range (tree lhs, tree src, tree b if (TREE_CODE (src) == ADDR_EXPR) { + src = TREE_OPERAND (src, 0); + + src = get_inner_char_array_unless_typecast (src); + + if (!src) + ; /* The last array member of a struct can be bigger than its size suggests if it's treated as a poor-man's flexible array member. 
*/ - src = TREE_OPERAND (src, 0); - bool src_is_array = TREE_CODE (TREE_TYPE (src)) == ARRAY_TYPE; - if (src_is_array && !array_at_struct_end_p (src)) + else if (!array_at_struct_end_p (src)) { tree type = TREE_TYPE (src); if (tree size = TYPE_SIZE_UNIT (type)) @@ -1170,8 +1174,6 @@ maybe_set_strlen_range (tree lhs, tree src, tree b } else { - if (TREE_CODE (src) == COMPONENT_REF && !src_is_array) - src = TREE_OPERAND (src, 1); if (DECL_P (src)) { /* Handle the unlikely case of strlen (&c) where c is some Index: gcc/testsuite/gcc.dg/strlenopt-40.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-40.c (revision 262933) +++ gcc/testsuite/gcc.dg/strlenopt-40.c (working copy) @@ -105,20 +105,20 @@ void elim_global_arrays (int i) /* Verify that the expression involving the strlen call as well as whatever depends on it is eliminated from the test output. All these expressions must be trivially true. */ - ELIM_TRUE (strlen (a7_3[0]) < sizeof a7_3[0]); - ELIM_TRUE (strlen (a7_3[1]) < sizeof a7_3[1]); - ELIM_TRUE (strlen (a7_3[6]) < sizeof a7_3[6]); - ELIM_TRUE (strlen (a7_3[i]) < sizeof a7_3[i]); + ELIM_TRUE (strlen (a7_3[0]) < sizeof a7_3); + ELIM_TRUE (strlen (a7_3[1]) < sizeof a7_3); + ELIM_TRUE (strlen (a7_3[6]) < sizeof a7_3); + ELIM_TRUE (strlen (a7_3[i]) < sizeof a7_3); - ELIM_TRUE (strlen (a5_7[0]) < sizeof a5_7[0]); - ELIM_TRUE (strlen (a5_7[1]) < sizeof a5_7[1]); - ELIM_TRUE (strlen (a5_7[4]) < sizeof a5_7[4]); - ELIM_TRUE (strlen (a5_7[i]) < sizeof a5_7[0]); + ELIM_TRUE (strlen (a5_7[0]) < sizeof a5_7); + ELIM_TRUE (strlen (a5_7[1]) < sizeof a5_7); + ELIM_TRUE (strlen (a5_7[4]) < sizeof a5_7); + ELIM_TRUE (strlen (a5_7[i]) < sizeof a5_7); - ELIM_TRUE (strlen (ax_3[0]) < sizeof ax_3[0]); - ELIM_TRUE (strlen (ax_3[1]) < sizeof ax_3[1]); - ELIM_TRUE (strlen (ax_3[9]) < sizeof ax_3[9]); - ELIM_TRUE (strlen (ax_3[i]) < sizeof ax_3[i]); + ELIM_TRUE (strlen (ax_3[0]) < DIFF_MAX - 1); + ELIM_TRUE (strlen (ax_3[1]) < 
DIFF_MAX - 1); + ELIM_TRUE (strlen (ax_3[9]) < DIFF_MAX - 1); + ELIM_TRUE (strlen (ax_3[i]) < DIFF_MAX - 1); ELIM_TRUE (strlen (a3) < sizeof a3); ELIM_TRUE (strlen (a7) < sizeof a7); @@ -134,17 +134,17 @@ void elim_pointer_to_arrays (void) ELIM_TRUE (strlen (*pa5) < 5); ELIM_TRUE (strlen (*pa3) < 3); - ELIM_TRUE (strlen ((*pa7_3)[0]) < 3); - ELIM_TRUE (strlen ((*pa7_3)[1]) < 3); - ELIM_TRUE (strlen ((*pa7_3)[6]) < 3); + ELIM_TRUE (strlen ((*pa7_3)[0]) < 21); + ELIM_TRUE (strlen ((*pa7_3)[1]) < 21); + ELIM_TRUE (strlen ((*pa7_3)[6]) < 21); - ELIM_TRUE (strlen ((*pax_3)[0]) < 3); - ELIM_TRUE (strlen ((*pax_3)[1]) < 3); - ELIM_TRUE (strlen ((*pax_3)[9]) < 3); + ELIM_TRUE (strlen ((*pax_3)[0]) < DIFF_MAX - 1); + ELIM_TRUE (strlen ((*pax_3)[1]) < DIFF_MAX - 1); + ELIM_TRUE (strlen ((*pax_3)[9]) < DIFF_MAX - 1); - ELIM_TRUE (strlen ((*pa5_7)[0]) < 7); - ELIM_TRUE (strlen ((*pa5_7)[1]) < 7); - ELIM_TRUE (strlen ((*pa5_7)[4]) < 7); + ELIM_TRUE (strlen ((*pa5_7)[0]) < 35); + ELIM_TRUE (strlen ((*pa5_7)[1]) < 35); + ELIM_TRUE (strlen ((*pa5_7)[4]) < 35); } void elim_global_arrays_and_strings (int i) @@ -198,11 +198,11 @@ void elim_member_arrays_obj (int i) ELIM_TRUE (strlen (ma0_3_5_7[1][1][0].a5) < 5); ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a5) < 5); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a7_3[0]) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a7_3[2]) < 3); + ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a7_3[0]) < 21); + ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a7_3[2]) < 21); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a5_7[4]) < 7); + ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a5_7[0]) < 35); + ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a5_7[4]) < 35); } void elim_member_arrays_ptr (struct MemArrays0 *ma0, @@ -210,19 +210,23 @@ void elim_member_arrays_ptr (struct MemArrays0 *ma struct MemArrays7 *ma7, int i) { - ELIM_TRUE (strlen (ma0->a7_3[0]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[1]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[6]) < 3); - 
ELIM_TRUE (strlen (ma0->a7_3[6]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[i]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[i]) < 3); + ELIM_TRUE (strlen (ma0->a7_3[0]) < 21); + ELIM_TRUE (strlen (ma0->a7_3[1]) < 21); + ELIM_TRUE (strlen (ma0->a7_3[6]) < 21); + ELIM_TRUE (strlen (ma0->a7_3[6]) < 21); + ELIM_TRUE (strlen (ma0->a7_3[i]) < 21); + ELIM_TRUE (strlen (ma0->a7_3[i]) < 21); - ELIM_TRUE (strlen (ma0->a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[0].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[1].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[1].a5_7[4]) < 7); - ELIM_TRUE (strlen (ma0[9].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[9].a5_7[4]) < 7); + ELIM_TRUE (strlen (ma0->a5_7[0]) < 35); + ELIM_TRUE (strlen (ma0[0].a5_7[0]) < 35); +#if 0 + /* This is tranformed into strlen ((const char *) &(ma0 + 64)->a5_7[0]) + which looks like a type cast and fails the check in get_range_strlen. */ + ELIM_TRUE (strlen (ma0[1].a5_7[0]) < 35); + ELIM_TRUE (strlen (ma0[1].a5_7[4]) < 35); + ELIM_TRUE (strlen (ma0[9].a5_7[0]) < 35); + ELIM_TRUE (strlen (ma0[9].a5_7[4]) < 35); +#endif ELIM_TRUE (strlen (ma0->a3) < sizeof ma0->a3); ELIM_TRUE (strlen (ma0->a5) < sizeof ma0->a5); Index: gcc/testsuite/gcc.dg/strlenopt-45.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-45.c (revision 262933) +++ gcc/testsuite/gcc.dg/strlenopt-45.c (working copy) @@ -43,7 +43,6 @@ extern size_t strnlen (const char *, size_t); else \ FAIL (made_in_false_branch) -extern char c; extern char a1[1]; extern char a3[3]; extern char a5[5]; @@ -52,18 +51,6 @@ extern char ax[]; void elim_strnlen_arr_cst (void) { - /* The length of a string stored in a one-element array must be zero. - The result reported by strnlen() for such an array can be non-zero - only when the bound is equal to 1 (in which case the result must - be one). 
*/ - ELIM (strnlen (&c, 0) == 0); - ELIM (strnlen (&c, 1) < 2); - ELIM (strnlen (&c, 2) == 0); - ELIM (strnlen (&c, 9) == 0); - ELIM (strnlen (&c, PTRDIFF_MAX) == 0); - ELIM (strnlen (&c, SIZE_MAX) == 0); - ELIM (strnlen (&c, -1) == 0); - ELIM (strnlen (a1, 0) == 0); ELIM (strnlen (a1, 1) < 2); ELIM (strnlen (a1, 2) == 0); @@ -85,31 +72,31 @@ void elim_strnlen_arr_cst (void) ELIM (strnlen (a3_7[0], 1) < 2); ELIM (strnlen (a3_7[0], 2) < 3); ELIM (strnlen (a3_7[0], 3) < 4); - ELIM (strnlen (a3_7[0], 9) < 8); - ELIM (strnlen (a3_7[0], PTRDIFF_MAX) < 8); - ELIM (strnlen (a3_7[0], SIZE_MAX) < 8); - ELIM (strnlen (a3_7[0], -1) < 8); + ELIM (strnlen (a3_7[0], 9) < 10); + ELIM (strnlen (a3_7[0], PTRDIFF_MAX) < 21); + ELIM (strnlen (a3_7[0], SIZE_MAX) < 21); + ELIM (strnlen (a3_7[0], -1) < 21); ELIM (strnlen (a3_7[2], 0) == 0); ELIM (strnlen (a3_7[2], 1) < 2); ELIM (strnlen (a3_7[2], 2) < 3); ELIM (strnlen (a3_7[2], 3) < 4); - ELIM (strnlen (a3_7[2], 9) < 8); - ELIM (strnlen (a3_7[2], PTRDIFF_MAX) < 8); - ELIM (strnlen (a3_7[2], SIZE_MAX) < 8); - ELIM (strnlen (a3_7[2], -1) < 8); + ELIM (strnlen (a3_7[2], 9) < 10); + ELIM (strnlen (a3_7[2], PTRDIFF_MAX) < 21); + ELIM (strnlen (a3_7[2], SIZE_MAX) < 21); + ELIM (strnlen (a3_7[2], -1) < 21); - ELIM (strnlen ((char*)a3_7, 0) == 0); - ELIM (strnlen ((char*)a3_7, 1) < 2); - ELIM (strnlen ((char*)a3_7, 2) < 3); - ELIM (strnlen ((char*)a3_7, 3) < 4); - ELIM (strnlen ((char*)a3_7, 9) < 10); - ELIM (strnlen ((char*)a3_7, 19) < 20); - ELIM (strnlen ((char*)a3_7, 21) < 22); - ELIM (strnlen ((char*)a3_7, 23) < 22); - ELIM (strnlen ((char*)a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)a3_7, -1) < 22); + ELIM (strnlen ((char*)a3_7[0], 0) == 0); + ELIM (strnlen ((char*)a3_7[0], 1) < 2); + ELIM (strnlen ((char*)a3_7[0], 2) < 3); + ELIM (strnlen ((char*)a3_7[0], 3) < 4); + ELIM (strnlen ((char*)a3_7[0], 9) < 10); + ELIM (strnlen ((char*)a3_7[0], 19) < 20); + ELIM (strnlen ((char*)a3_7[0], 
21) < 22); + ELIM (strnlen ((char*)a3_7[0], 23) < 21); + ELIM (strnlen ((char*)a3_7[0], PTRDIFF_MAX) < 21); + ELIM (strnlen ((char*)a3_7[0], SIZE_MAX) < 21); + ELIM (strnlen ((char*)a3_7[0], -1) < 21); ELIM (strnlen (ax, 0) == 0); ELIM (strnlen (ax, 1) < 2); @@ -122,7 +109,6 @@ void elim_strnlen_arr_cst (void) struct MemArrays { - char c; char a0[0]; char a1[1]; char a3[3]; @@ -133,13 +119,6 @@ struct MemArrays void elim_strnlen_memarr_cst (struct MemArrays *p, int i) { - ELIM (strnlen (&p->c, 0) == 0); - ELIM (strnlen (&p->c, 1) < 2); - ELIM (strnlen (&p->c, 9) == 0); - ELIM (strnlen (&p->c, PTRDIFF_MAX) == 0); - ELIM (strnlen (&p->c, SIZE_MAX) == 0); - ELIM (strnlen (&p->c, -1) == 0); - /* Other accesses to internal zero-length arrays are undefined. */ ELIM (strnlen (p->a0, 0) == 0); @@ -154,37 +133,39 @@ void elim_strnlen_memarr_cst (struct MemArrays *p, ELIM (strnlen (p->a3, 1) < 2); ELIM (strnlen (p->a3, 2) < 3); ELIM (strnlen (p->a3, 3) < 4); - ELIM (strnlen (p->a3, 9) < 4); - ELIM (strnlen (p->a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p->a3, SIZE_MAX) < 4); - ELIM (strnlen (p->a3, -1) < 4); + ELIM (strnlen (p->a3, 9) < 3); + ELIM (strnlen (p->a3, PTRDIFF_MAX) < 3); + ELIM (strnlen (p->a3, SIZE_MAX) < 3); + ELIM (strnlen (p->a3, -1) < 3); ELIM (strnlen (p[i].a3, 0) == 0); ELIM (strnlen (p[i].a3, 1) < 2); ELIM (strnlen (p[i].a3, 2) < 3); ELIM (strnlen (p[i].a3, 3) < 4); - ELIM (strnlen (p[i].a3, 9) < 4); - ELIM (strnlen (p[i].a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p[i].a3, SIZE_MAX) < 4); - ELIM (strnlen (p[i].a3, -1) < 4); + ELIM (strnlen (p[i].a3, 9) < 3); + ELIM (strnlen (p[i].a3, PTRDIFF_MAX) < 3); + ELIM (strnlen (p[i].a3, SIZE_MAX) < 3); + ELIM (strnlen (p[i].a3, -1) < 3); ELIM (strnlen (p->a3_7[0], 0) == 0); ELIM (strnlen (p->a3_7[0], 1) < 2); ELIM (strnlen (p->a3_7[0], 2) < 3); ELIM (strnlen (p->a3_7[0], 3) < 4); - ELIM (strnlen (p->a3_7[0], 9) < 8); - ELIM (strnlen (p->a3_7[0], PTRDIFF_MAX) < 8); - ELIM (strnlen (p->a3_7[0], SIZE_MAX) < 8); - ELIM 
(strnlen (p->a3_7[0], -1) < 8); + ELIM (strnlen (p->a3_7[0], 9) < 10); + ELIM (strnlen (p->a3_7[0], 21) < 22); + ELIM (strnlen (p->a3_7[0], PTRDIFF_MAX) < 21); + ELIM (strnlen (p->a3_7[0], SIZE_MAX) < 21); + ELIM (strnlen (p->a3_7[0], -1) < 21); ELIM (strnlen (p->a3_7[2], 0) == 0); ELIM (strnlen (p->a3_7[2], 1) < 2); ELIM (strnlen (p->a3_7[2], 2) < 3); ELIM (strnlen (p->a3_7[2], 3) < 4); - ELIM (strnlen (p->a3_7[2], 9) < 8); - ELIM (strnlen (p->a3_7[2], PTRDIFF_MAX) < 8); - ELIM (strnlen (p->a3_7[2], SIZE_MAX) < 8); - ELIM (strnlen (p->a3_7[2], -1) < 8); + ELIM (strnlen (p->a3_7[2], 9) < 10); + ELIM (strnlen (p->a3_7[2], 21) < 22); + ELIM (strnlen (p->a3_7[2], PTRDIFF_MAX) < 21); + ELIM (strnlen (p->a3_7[2], SIZE_MAX) < 21); + ELIM (strnlen (p->a3_7[2], -1) < 21); ELIM (strnlen (p->a3_7[i], 0) == 0); ELIM (strnlen (p->a3_7[i], 1) < 2); @@ -203,17 +184,17 @@ void elim_strnlen_memarr_cst (struct MemArrays *p, ELIM (strnlen (p->a3_7[i], 19) < 20); #endif - ELIM (strnlen ((char*)p->a3_7, 0) == 0); - ELIM (strnlen ((char*)p->a3_7, 1) < 2); - ELIM (strnlen ((char*)p->a3_7, 2) < 3); - ELIM (strnlen ((char*)p->a3_7, 3) < 4); - ELIM (strnlen ((char*)p->a3_7, 9) < 10); - ELIM (strnlen ((char*)p->a3_7, 19) < 20); - ELIM (strnlen ((char*)p->a3_7, 21) < 22); - ELIM (strnlen ((char*)p->a3_7, 23) < 22); - ELIM (strnlen ((char*)p->a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)p->a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)p->a3_7, -1) < 22); + ELIM (strnlen ((char*)p->a3_7[0], 0) == 0); + ELIM (strnlen ((char*)p->a3_7[0], 1) < 2); + ELIM (strnlen ((char*)p->a3_7[0], 2) < 3); + ELIM (strnlen ((char*)p->a3_7[0], 3) < 4); + ELIM (strnlen ((char*)p->a3_7[0], 9) < 10); + ELIM (strnlen ((char*)p->a3_7[0], 19) < 20); + ELIM (strnlen ((char*)p->a3_7[0], 21) < 22); + ELIM (strnlen ((char*)p->a3_7[0], 23) < 22); + ELIM (strnlen ((char*)p->a3_7[0], PTRDIFF_MAX) < 22); + ELIM (strnlen ((char*)p->a3_7[0], SIZE_MAX) < 22); + ELIM (strnlen ((char*)p->a3_7[0], -1) < 22); ELIM (strnlen 
(p->ax, 0) == 0); ELIM (strnlen (p->ax, 1) < 2); @@ -290,9 +271,6 @@ void elim_strnlen_range (char *s) void keep_strnlen_arr_cst (void) { - KEEP (strnlen (&c, 1) == 0); - KEEP (strnlen (&c, 1) == 1); - KEEP (strnlen (a1, 1) == 0); KEEP (strnlen (a1, 1) == 1); @@ -301,7 +279,6 @@ void keep_strnlen_arr_cst (void) struct FlexArrays { - char c; char a0[0]; /* Access to internal zero-length arrays are undefined. */ char a1[1]; }; @@ -308,9 +285,6 @@ struct FlexArrays void keep_strnlen_memarr_cst (struct FlexArrays *p) { - KEEP (strnlen (&p->c, 1) == 0); - KEEP (strnlen (&p->c, 1) == 1); - #if 0 /* Accesses to internal zero-length arrays are undefined so avoid exercising them. */ @@ -331,5 +305,5 @@ void keep_strnlen_memarr_cst (struct FlexArrays *p /* { dg-final { scan-tree-dump-times "call_in_true_branch_not_eliminated_" 0 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 13 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 13 "optimized" } } */ + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 9 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 9 "optimized" } } */ Index: gcc/testsuite/gcc.dg/strlenopt-48.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-48.c (revision 262933) +++ gcc/testsuite/gcc.dg/strlenopt-48.c (working copy) @@ -9,8 +9,8 @@ void f (void) { - extern char a[2][1]; - int n = strlen (a[1]); + extern char a[1][1]; + int n = strlen (a[0]); if (n) abort(); } @@ -17,8 +17,8 @@ void f (void) void g (void) { - extern char b[3][2][1]; - int n = strlen (b[2][1]); + extern char b[1][1][1]; + int n = strlen (b[0][0]); if (n) abort(); } @@ -25,8 +25,8 @@ void g (void) void h (void) { - extern char c[4][3][2][1]; - int n = strlen (c[3][2][1]); + extern char c[1][1][1][1]; + int n = 
strlen (c[0][0][0]); if (n) abort(); } Index: gcc/testsuite/gcc.dg/strlenopt-51.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-51.c (revision 262933) +++ gcc/testsuite/gcc.dg/strlenopt-51.c (working copy) @@ -101,7 +101,7 @@ void test_keep_a9_9 (int i) { #undef T #define T(I) \ - KEEP (strlen (&a9_9[i][I][0]) > (1 + I) % 9); \ + KEEP (strlen (&a9_9[i][I][0]) > (0 + I) % 9); \ KEEP (strlen (&a9_9[i][I][1]) > (1 + I) % 9); \ KEEP (strlen (&a9_9[i][I][2]) > (2 + I) % 9); \ KEEP (strlen (&a9_9[i][I][3]) > (3 + I) % 9); \ @@ -115,7 +115,7 @@ void test_keep_a9_9 (int i) } /* { dg-final { scan-tree-dump-times "strlen" 72 "gimple" } } - { dg-final { scan-tree-dump-times "strlen" 63 "optimized" } } + { dg-final { scan-tree-dump-times "strlen" 72 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 72 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } */ Index: gcc/testsuite/gcc.dg/strlenopt-54.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-54.c (revision 0) +++ gcc/testsuite/gcc.dg/strlenopt-54.c (working copy) @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +typedef char A[6]; +typedef char B[2][3]; + +A a; + +void test (void) +{ + B* b = (B*) a; + if (__builtin_strlen ((*b)[0]) > 2) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump-times "__builtin_strlen" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_abort" 1 "optimized" } } */ ^ permalink raw reply [flat|nested] 121+ messages in thread
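The adjusted expectations in strlenopt-40.c come down to sizeof arithmetic. The sketch below is not part of the patch. It only computes, for the a7_3 and a5_7 shapes used in the tests, the largest strlen result each rule would permit. The helper names are made up for illustration.

```c
#include <stddef.h>

/* Same shapes as the arrays in gcc.dg/strlenopt-40.c.  */
char a7_3[7][3];
char a5_7[5][7];

/* Bound taken from the innermost array (one row): the old
   expectation, e.g. strlen (a7_3[i]) < 3.  */
size_t max_len_innermost_a7_3 (void) { return sizeof a7_3[0] - 1; }
size_t max_len_innermost_a5_7 (void) { return sizeof a5_7[0] - 1; }

/* Bound taken from the outermost enclosing array: the expectation
   after the patch, e.g. strlen (a7_3[i]) < 21.  */
size_t max_len_outermost_a7_3 (void) { return sizeof a7_3 - 1; }
size_t max_len_outermost_a5_7 (void) { return sizeof a5_7 - 1; }
```

The outermost bound does not depend on the row index, which is why the patched tests use the same constant for a7_3[0] and a7_3[6].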
* Re: [PATCH] Make strlen range computations more conservative
  2018-07-24 7:59 [PATCH] Make strlen range computations more conservative Bernd Edlinger
  2018-07-24 14:50 ` Richard Biener
@ 2018-07-24 16:14 ` Martin Sebor
  2018-07-24 21:46 ` Jeff Law
  2 siblings, 0 replies; 121+ messages in thread
From: Martin Sebor @ 2018-07-24 16:14 UTC (permalink / raw)
  To: Bernd Edlinger, GCC Patches; +Cc: Jeff Law, Richard Biener, Jakub Jelinek

On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
> Hi!
>
> This patch makes strlen range computations more conservative.
>
> Firstly if there is a visible type cast from type A to B before passing
> then value to strlen, don't expect the type layout of B to restrict the
> possible return value range of strlen.
>
> Furthermore use the outermost enclosing array instead of the
> innermost one, because too aggressive optimization will likely
> convert harmless errors into security-relevant errors, because
> as the existing test cases demonstrate, this optimization is actively
> attacking string length checks in user code, while and not giving
> any warnings.

I strongly object to this change. As you know, I am actively working
in this area -- I asked you to hold off on submitting patches for it
until the review of bug 86532 has completed. It's not just unhelpful
but disrespectful of you to ignore my request and to try to make
changes you know I will likely have a strong opinion on in spite of
it, and without as much as involving me in the proposal.

As the author of this code and of many security improvements in GCC
I also find your characterization above of "actively attacking" user
code insulting.

If security is your main concern then helping detect the invalid code
you are trying to accommodate with this change would be the right
thing to do. One of the reasons for the tight bound is to build
a better foundation for the detection of buffer overflow in string
functions. Relaxing the bound could make the detection more difficult.

So again, I strongly object to both this change and to your conduct.

Martin
* Re: [PATCH] Make strlen range computations more conservative
  2018-07-24 7:59 [PATCH] Make strlen range computations more conservative Bernd Edlinger
  2018-07-24 14:50 ` Richard Biener
  2018-07-24 16:14 ` Martin Sebor
@ 2018-07-24 21:46 ` Jeff Law
  2018-07-24 23:18 ` Bernd Edlinger
  2018-07-25 7:08 ` Richard Biener
  2 siblings, 2 replies; 121+ messages in thread
From: Jeff Law @ 2018-07-24 21:46 UTC (permalink / raw)
  To: Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek

On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
> Hi!
>
> This patch makes strlen range computations more conservative.
>
> Firstly if there is a visible type cast from type A to B before passing
> then value to strlen, don't expect the type layout of B to restrict the
> possible return value range of strlen.
Why do you think this is the right thing to do? i.e., is there language
in the standards that makes you think the code as it stands today is
incorrect from a conformance standpoint? Is there a significant body of
code that is affected in an adverse way by the current code? If so,
what code?

>
> Furthermore use the outermost enclosing array instead of the
> innermost one, because too aggressive optimization will likely
> convert harmless errors into security-relevant errors, because
> as the existing test cases demonstrate, this optimization is actively
> attacking string length checks in user code, while and not giving
> any warnings.
Same questions here.

I'll also note that Martin is *very* aware of the desire to avoid
introducing security relevant errors. In fact his main focus is to help
identify coding errors that have a security impact. So please don't
characterize his work as "actively attacking string length checks in
user code".

Ultimately we want highly accurate string lengths to help improve the
quality of the warnings we generate for potentially dangerous code.
These changes seem to take us in the opposite direction.

So ISTM that you really need a stronger justification using the
standards compliance and/or real world code that is made less safe by
keeping string lengths as accurate as possible.

>
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
I'd like to ask we hold on this until I return from PTO (Aug 1) so that
we can discuss the best thing to do here for each class of change.

I think you, Martin, Richi and myself should hash through the technical
issues raised by the patch. Obviously others can chime in, but I think
the 4 of us probably need to drive the discussion.

Thanks,
Jeff
* Re: [PATCH] Make strlen range computations more conservative
  2018-07-24 21:46 ` Jeff Law
@ 2018-07-24 23:18 ` Bernd Edlinger
  2018-07-25 4:52 ` Jeff Law
  ` (4 more replies)
  2018-07-25 7:08 ` Richard Biener
  1 sibling, 5 replies; 121+ messages in thread
From: Bernd Edlinger @ 2018-07-24 23:18 UTC (permalink / raw)
  To: Jeff Law, GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Martin Sebor

On 07/24/18 23:46, Jeff Law wrote:
> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>> Hi!
>>
>> This patch makes strlen range computations more conservative.
>>
>> Firstly if there is a visible type cast from type A to B before passing
>> then value to strlen, don't expect the type layout of B to restrict the
>> possible return value range of strlen.
> Why do you think this is the right thing to do? i.e., is there language
> in the standards that makes you think the code as it stands today is
> incorrect from a conformance standpoint? Is there a significant body of
> code that is affected in an adverse way by the current code? If so,
> what code?
>
>

I think if you have an object, of an effective type A say char[100], then
you can cast the address of A to B, say typedef char (*B)[2] for instance
and then to const char *, say for use in strlen. I may be wrong, but I think
that we should at least try not to pick up char[2] from B, but instead
use A for strlen ranges, or leave this range open. Currently the range
info for strlen is [0..1] in this case, even if we see the type cast
in the generic tree.

One other example I have found in one of the test cases:

char c;

if (strlen(&c) != 0) abort();

this is now completely elided, but why? Is there a code base where
that is used? I doubt it, but why do we care to eliminate something
stupid like that? If we would emit a warning for that I'm fine with it,
but if we silently remove code like that I don't think that it
will improve anything. So I ask, where is the code base which
gets an improvement from that optimization?
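The two cases described above can be written out as a small sketch. It is not taken from the patch, and the names buf, B, len_via_cast, len_plain and len_of_single_char are made up for illustration. Whether the compiler may derive a [0..1] range for the call through the cast is exactly what this thread is debating, so the sketch makes no claim about what any particular GCC version returns from len_via_cast.

```c
#include <string.h>

/* An object whose effective type is char[100].  */
char buf[100] = "hello";

/* A pointer-to-array type whose layout suggests at most 2 bytes.  */
typedef char (*B)[2];

/* strlen reached through a visible cast to B.  If the range is
   derived from char[2], the result is assumed to lie in [0, 1].  */
size_t len_via_cast (void)
{
  B b = (B) buf;
  return strlen ((const char *) *b);
}

/* The same bytes, without the cast.  */
size_t len_plain (void)
{
  return strlen (buf);
}

/* The single-char case: the only valid string stored in c is
   the empty one.  */
size_t len_of_single_char (void)
{
  char c = '\0';
  return strlen (&c);
}
```

Only len_plain and len_of_single_char have an uncontested result. The cast variant is included to show the construct the message refers to.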
>
>>
>> Furthermore use the outermost enclosing array instead of the
>> innermost one, because too aggressive optimization will likely
>> convert harmless errors into security-relevant errors, because
>> as the existing test cases demonstrate, this optimization is actively
>> attacking string length checks in user code, while and not giving
>> any warnings.
> Same questions here.
>
> I'll also note that Martin is *very* aware of the desire to avoid
> introducing security relevant errors. In fact his main focus is to help
> identify coding errors that have a security impact. So please don't
> characterize his work as "actively attacking string length checks in
> user code".
>

I do fully respect Martin's valuable contributions over the years,
and I did not intend to say anything about the quality of his work,
for GCC, it is just breathtaking!

What I meant is just, what this particular optimization can do.

> Ultimately we want highly accurate string lengths to help improve the
> quality of the warnings we generate for potentially dangerous code.
> These changes seem to take us in the opposite direction.
>

No, I don't think so, we have full control on the direction, when
I do what Richi requested on his response, we will have one function
where the string length estimation is based upon, instead of several
open coded tree walks.

> So ISTM that you really need a stronger justification using the
> standards compliance and/or real world code that is made less safe by
> keeping string lengths as accurate as possible.
>
>

This work concentrates mostly on avoiding to interfere with code that
actually deserves warnings, but which is not being warned about.

>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
> I'd like to ask we hold on this until I return from PTO (Aug 1) so that
> we can discuss the best thing to do here for each class of change.
>

Okay.

> I think you, Martin, Richi and myself should hash through the technical
> issues raised by the patch. Obviously others can chime in, but I think
> the 4 of us probably need to drive the discussion.
>

Yes, sure. I will try to help when I can.

Currently I thought Martin is working on the string constant folding,
(therefore I thought this range patch would not collide with his patch)
and there are plenty of change requests, plus I think he has some more
patches on hold. I would like to see the review comments resolved,
and maybe also get to see the follow up patches, maybe as a patch
series, so we can get a clearer picture?


Thanks
Bernd.

> Thanks,
> Jeff
>
* Re: [PATCH] Make strlen range computations more conservative
  2018-07-24 23:18 ` Bernd Edlinger
@ 2018-07-25 4:52 ` Jeff Law
  2018-07-25 7:23 ` Richard Biener
  ` (3 subsequent siblings)
  4 siblings, 0 replies; 121+ messages in thread
From: Jeff Law @ 2018-07-25 4:52 UTC (permalink / raw)
  To: Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Martin Sebor

On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
> On 07/24/18 23:46, Jeff Law wrote:
>> I'd like to ask we hold on this until I return from PTO (Aug 1) so that
>> we can discuss the best thing to do here for each class of change.
>>
>
> Okay.
>
>> I think you, Martin, Richi and myself should hash through the technical
>> issues raised by the patch. Obviously others can chime in, but I think
>> the 4 of us probably need to drive the discussion.
>>
>
> Yes, sure. I will try to help when I can.
Thanks for understanding. I'll be back on Aug 1, slogging my way
through the week's worth of patches...

Jeff
* Re: [PATCH] Make strlen range computations more conservative
  2018-07-24 23:18 ` Bernd Edlinger
  2018-07-25 4:52 ` Jeff Law
@ 2018-07-25 7:23 ` Richard Biener
  2018-07-25 19:37 ` Martin Sebor
  2018-08-03 7:19 ` Jeff Law
  2018-07-25 17:31 ` Martin Sebor
  ` (2 subsequent siblings)
  4 siblings, 2 replies; 121+ messages in thread
From: Richard Biener @ 2018-07-25 7:23 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: Jeff Law, GCC Patches, Jakub Jelinek, Martin Sebor

On Tue, 24 Jul 2018, Bernd Edlinger wrote:

> On 07/24/18 23:46, Jeff Law wrote:
> > On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
> >> Hi!
> >>
> >> This patch makes strlen range computations more conservative.
> >>
> >> Firstly if there is a visible type cast from type A to B before passing
> >> then value to strlen, don't expect the type layout of B to restrict the
> >> possible return value range of strlen.
> > Why do you think this is the right thing to do? i.e., is there language
> > in the standards that makes you think the code as it stands today is
> > incorrect from a conformance standpoint? Is there a significant body of
> > code that is affected in an adverse way by the current code? If so,
> > what code?
> >
> >
>
> I think if you have an object, of an effective type A say char[100], then
> you can cast the address of A to B, say typedef char (*B)[2] for instance
> and then to const char *, say for use in strlen. I may be wrong, but I think
> that we should at least try not to pick up char[2] from B, but instead
> use A for strlen ranges, or leave this range open. Currently the range
> info for strlen is [0..1] in this case, even if we see the type cast
> in the generic tree.

You raise a valid point - namely that the middle-end allows
any object (including storage with a declared type) to change
its dynamic type (even of a piece of it). So unless you can
prove that the dynamic type of the thing you are looking at
matches your idea of that type you may not derive any string
lengths (or ranges) from it.

BUT - for the string_constant and c_strlen functions we are,
in all cases we return something interesting, able to look
at an initializer which then determines that type. Hopefully.
I think the strlen() folding code when it sets SSA ranges
now looks at types ...?

Consider

struct X { int i; char c[4]; int j;};
struct Y { char c[16]; };

void foo (struct X *p, struct Y *q)
{
  memcpy (p, q, sizeof (struct Y));
  if (strlen ((char *)(struct Y *)p + 4) < 7)
    abort ();
}

here the GIMPLE IL looks like

  const char * _1;

  <bb 2> [local count: 1073741825]:
  _5 = MEM[(char * {ref-all})q_4(D)];
  MEM[(char * {ref-all})p_6(D)] = _5;
  _1 = p_6(D) + 4;
  _2 = __builtin_strlen (_1);

and I guess Martin would argue that since p is of type struct X
+ 4 gets you to c[4] and thus strlen of that cannot be larger
than 3. But of course the middle-end doesn't work like that
and luckily we do not try to draw such conclusions or we
are somehow lucky that for the testcase as written above we do not
(I'm not sure whether Martin's changes in this area would derive
such conclusions in principle).

NOTE - we do not know the dynamic type here since we do not know
the dynamic type of the memory pointed-to by q! We can only
derive that at q+4 there must be some object that we can
validly call strlen on (where Martin again thinks strlen
imposes constraints that memchr does not - something I do not agree
with from a QOI perspective)

Richard.

> One other example I have found in one of the test cases:
>
> char c;
>
> if (strlen(&c) != 0) abort();
>
> this is now completely elided, but why? Is there a code base where
> that is used? I doubt it, but why do we care to eliminate something
> stupid like that? If we would emit a warning for that I'm fine with it,
> but if we silently remove code like that I don't think that it
> will improve anything. So I ask, where is the code base which
> gets an improvement from that optimization?
>
>
>
> >
> >>
> >> Furthermore use the outermost enclosing array instead of the
> >> innermost one, because too aggressive optimization will likely
> >> convert harmless errors into security-relevant errors, because
> >> as the existing test cases demonstrate, this optimization is actively
> >> attacking string length checks in user code, while and not giving
> >> any warnings.
> > Same questions here.
> >
> > I'll also note that Martin is *very* aware of the desire to avoid
> > introducing security relevant errors. In fact his main focus is to help
> > identify coding errors that have a security impact. So please don't
> > characterize his work as "actively attacking string length checks in
> > user code".
> >
>
> I do fully respect Martin's valuable contributions over the years,
> and I did not intend to say anything about the quality of his work,
> for GCC, it is just breathtaking!
>
> What I meant is just, what this particular optimization can do.
>
> > Ultimately we want highly accurate string lengths to help improve the
> > quality of the warnings we generate for potentially dangerous code.
> > These changes seem to take us in the opposite direction.
> >
>
> No, I don't think so, we have full control on the direction, when
> I do what Richi requested on his response, we will have one function
> where the string length estimation is based upon, instead of several
> open coded tree walks.
>
> > So ISTM that you really need a stronger justification using the
> > standards compliance and/or real world code that is made less safe by
> > keeping string lengths as accurate as possible.
> >
> >
>
> This work concentrates mostly on avoiding to interfere with code that
> actually deserves warnings, but which is not being warned about.
>
> >>
> >>
> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> >> Is it OK for trunk?
> > I'd like to ask we hold on this until I return from PTO (Aug 1) so that
> > we can discuss the best thing to do here for each class of change.
> >
>
> Okay.
>
> > I think you, Martin, Richi and myself should hash through the technical
> > issues raised by the patch. Obviously others can chime in, but I think
> > the 4 of us probably need to drive the discussion.
> >
>
> Yes, sure. I will try to help when I can.
>
> Currently I thought Martin is working on the string constant folding,
> (therefore I thought this range patch would not collide with his patch)
> and there are plenty of change requests, plus I think he has some more
> patches on hold. I would like to see the review comments resolved,
> and maybe also get to see the follow up patches, maybe as a patch
> series, so we can get a clearer picture?
>
>
> Thanks
> Bernd.
>
> > Thanks,
> > Jeff
> >
>

--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
* Re: [PATCH] Make strlen range computations more conservative 2018-07-25 7:23 ` Richard Biener @ 2018-07-25 19:37 ` Martin Sebor 2018-07-26 8:55 ` Richard Biener 2018-08-03 7:29 ` Jeff Law 2018-08-03 7:19 ` Jeff Law 1 sibling, 2 replies; 121+ messages in thread From: Martin Sebor @ 2018-07-25 19:37 UTC (permalink / raw) To: Richard Biener, Bernd Edlinger; +Cc: Jeff Law, GCC Patches, Jakub Jelinek > BUT - for the string_constant and c_strlen functions we are, > in all cases we return something interesting, able to look > at an initializer which then determines that type. Hopefully. > I think the strlen() folding code when it sets SSA ranges > now looks at types ...? > > Consider > > struct X { int i; char c[4]; int j;}; > struct Y { char c[16]; }; > > void foo (struct X *p, struct Y *q) > { > memcpy (p, q, sizeof (struct Y)); > if (strlen ((char *)(struct Y *)p + 4) < 7) > abort (); > } > > here the GIMPLE IL looks like > > const char * _1; > > <bb 2> [local count: 1073741825]: > _5 = MEM[(char * {ref-all})q_4(D)]; > MEM[(char * {ref-all})p_6(D)] = _5; > _1 = p_6(D) + 4; > _2 = __builtin_strlen (_1); > > and I guess Martin would argue that since p is of type struct X > + 4 gets you to c[4] and thus strlen of that cannot be larger > than 3. But of course the middle-end doesn't work like that > and luckily we do not try to draw such conclusions or we > are somehow lucky that for the testcase as written above we do not > (I'm not sure whether Martins changes in this area would derive > such conclusions in principle). Only if the strlen argument were p->c. > NOTE - we do not know the dynamic type here since we do not know > the dynamic type of the memory pointed-to by q! We can only > derive that at q+4 there must be some object that we can > validly call strlen on (where Martin again thinks strlen > imposes constrains that memchr does not - sth I do not agree > with from a QOI perspective) The dynamic type is a murky area. 
As you said, above we don't know whether *p is an allocated object or not. Strictly speaking, we would need to treat it as such. It would basically mean throwing out all type information and treating objects simply as blobs of bytes. But that's not what GCC or other compilers do either. For instance, in the modified foo below, GCC eliminates the test because it assumes that *p and *q don't overlap. It does that because they are members of structs of unrelated types, accesses to which cannot alias. I.e., not just the type of the access matters (here int and char) but so does the type of the enclosing object. If it were otherwise and only the type of the access mattered, then eliminating the test below wouldn't be valid (objects can have their stored value accessed by either an lvalue of a compatible type or char).

   void foo (struct X *p, struct Y *q)
   {
     int j = p->j;
     q->c[__builtin_offsetof (struct X, j)] = 0;
     if (j != p->j)
       __builtin_abort ();
   }

Clarifying (and adjusting if necessary) this area is among the goals of the C object model proposal and the ongoing study group. We have been talking about some of these cases there and trying to come up with ways to let code do what it needs to do without compromising existing language rules, which was the consensus position within WG14 when the study group was formed: i.e., to clarify or reaffirm existing rules and, in cases of ambiguity (or where the standard is unintentionally overly permissive), favor tighter rules over looser ones.

Martin

^ permalink raw reply [flat|nested] 121+ messages in thread
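For readers who want to try the aliasing example above, here is a minimal self-contained sketch. It is only an illustration under stated assumptions: the function name check_j_unchanged is made up, the struct definitions are taken from earlier in the thread, and the caller is assumed to pass two distinct objects so that no rule is violated either way. Whether a given GCC version actually removes the comparison is not something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct X { int i; char c[4]; int j; };
struct Y { char c[16]; };

/* The store below must land inside Y::c for the example to make sense.  */
_Static_assert (offsetof (struct X, j) < sizeof (struct Y),
                "offset of X::j must index into Y::c");

/* Hypothetical name.  Mirrors the "modified foo": load p->j, store a zero
   byte through q at the offset of X::j, then compare p->j again.
   Returns 1 when the value is unchanged and aborts otherwise.  */
int check_j_unchanged (struct X *p, struct Y *q)
{
  assert (p != NULL && q != NULL);
  int j = p->j;
  q->c[offsetof (struct X, j)] = 0;
  if (j != p->j)
    abort ();
  return 1;
}
```

With distinct objects the comparison can never fail, so the function returns 1 whether or not the compiler folds the test away.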
* Re: [PATCH] Make strlen range computations more conservative 2018-07-25 19:37 ` Martin Sebor @ 2018-07-26 8:55 ` Richard Biener 2018-08-07 2:24 ` Martin Sebor 2018-08-03 7:29 ` Jeff Law 1 sibling, 1 reply; 121+ messages in thread From: Richard Biener @ 2018-07-26 8:55 UTC (permalink / raw) To: Martin Sebor; +Cc: Bernd Edlinger, Jeff Law, GCC Patches, Jakub Jelinek On Wed, 25 Jul 2018, Martin Sebor wrote: > > BUT - for the string_constant and c_strlen functions we are, > > in all cases we return something interesting, able to look > > at an initializer which then determines that type. Hopefully. > > I think the strlen() folding code when it sets SSA ranges > > now looks at types ...? > > > > Consider > > > > struct X { int i; char c[4]; int j;}; > > struct Y { char c[16]; }; > > > > void foo (struct X *p, struct Y *q) > > { > > memcpy (p, q, sizeof (struct Y)); > > if (strlen ((char *)(struct Y *)p + 4) < 7) > > abort (); > > } > > > > here the GIMPLE IL looks like > > > > const char * _1; > > > > <bb 2> [local count: 1073741825]: > > _5 = MEM[(char * {ref-all})q_4(D)]; > > MEM[(char * {ref-all})p_6(D)] = _5; > > _1 = p_6(D) + 4; > > _2 = __builtin_strlen (_1); > > > > and I guess Martin would argue that since p is of type struct X > > + 4 gets you to c[4] and thus strlen of that cannot be larger > > than 3. But of course the middle-end doesn't work like that > > and luckily we do not try to draw such conclusions or we > > are somehow lucky that for the testcase as written above we do not > > (I'm not sure whether Martins changes in this area would derive > > such conclusions in principle). > > Only if the strlen argument were p->c. > > > NOTE - we do not know the dynamic type here since we do not know > > the dynamic type of the memory pointed-to by q! 
We can only > > derive that at q+4 there must be some object that we can > > validly call strlen on (where Martin again thinks strlen > > imposes constraints that memchr does not - sth I do not agree > > with from a QOI perspective) > > The dynamic type is a murky area. It's well-specified in the middle-end. A store changes the dynamic type of the stored-to object. If that type is compatible with the surrounding object's dynamic type, that one is not affected; if not, the surrounding object's dynamic type becomes unspecified. There is TYPE_TYPELESS_STORAGE to somewhat control "compatibility" of subobjects. > As you said, above we don't > know whether *p is an allocated object or not. Strictly speaking, > we would need to treat it as such. It would basically mean > throwing out all type information and treating objects simply > as blobs of bytes. But that's not what GCC or other compilers do > either. It is what GCC does unless it sees a store to the memory. Basically, pointers carry no type information; only (visible!) stores (and loads to some extent) provide information about the dynamic types of objects (allocated or declared - GCC makes no distinction there). > For instance, in the modified foo below, GCC eliminates > the test because it assumes that *p and *q don't overlap. It > does that because they are members of structs of unrelated types > access to which cannot alias. I.e., not just the type of > the access matters (here int and char) but so does the type of > the enclosing object. If it were otherwise and only the type > of the access mattered then eliminating the test below wouldn't > be valid (objects can have their stored value accessed by either > an lvalue of a compatible type or char). > > void foo (struct X *p, struct Y *q) > { > int j = p->j; > q->c[__builtin_offsetof (struct X, j)] = 0; > if (j != p->j) > __builtin_abort (); > } Here GCC sees both a load and a store, from which it derives the information.
And yes, it looks at the full access structure, which contains a dereference of both p and q. That, and the fact that the store to q->c[] (which for GCC implies a store to *q!) changes the dynamic type, is what matters here. > Clarifying (and adjusting if necessary) this area is among > the goals of the C object model proposal and the ongoing study > group. We have been talking about some of these cases there > and trying to come up with ways to let code do what it needs > to do without compromising existing language rules, which was > the consensus position within WG14 when the study group was > formed: i.e., to clarify or reaffirm existing rules and, in > cases of ambiguity (or where the standard is unintentionally > overly permissive), favor tighter rules over looser ones. There is also the C++ object model and the Ada object model and ... GCC already has an object model in its middle-end and that is not going to change. And obviously it was modeled after the requirements of the languages the middle-end supports. The latest change was made necessary by C++ (placement new and storage re-use specifically). Richard. ^ permalink raw reply [flat|nested] 121+ messages in thread
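The "blobs of bytes" view discussed above can be sketched in a few lines. This is an illustration only, not code from GCC or from the patch: the helper name strlen_at_offset is hypothetical, and it sidesteps the effective-type question by copying the bytes of a struct Y into plain unsigned char storage and scanning them with a bounded memchr. It shows that, viewed purely as bytes, the string that starts 4 bytes into the copied storage can be longer than the 3 characters a char[4] member could hold.

```c
#include <assert.h>
#include <string.h>

struct Y { char c[16]; };

/* Hypothetical helper.  Copy *q into raw byte storage and return the
   length of the string that starts OFFSET bytes in.  The scan is bounded
   by the size of the storage, so it never reads past the copied bytes;
   if no NUL is found, the number of remaining bytes is returned.  */
size_t strlen_at_offset (const struct Y *q, size_t offset)
{
  unsigned char bytes[sizeof (struct Y)];
  assert (q != NULL && offset < sizeof bytes);
  memcpy (bytes, q, sizeof bytes);
  const unsigned char *start = bytes + offset;
  const unsigned char *nul = memchr (start, 0, sizeof bytes - offset);
  return nul ? (size_t) (nul - start) : sizeof bytes - offset;
}
```

For a struct Y holding an 11-character string, the length measured from offset 4 is 7, which is exactly the case where a test like `strlen (...) < 7` must not be folded to true.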
* Re: [PATCH] Make strlen range computations more conservative 2018-07-26 8:55 ` Richard Biener @ 2018-08-07 2:24 ` Martin Sebor 2018-08-07 8:51 ` Richard Biener 0 siblings, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-08-07 2:24 UTC (permalink / raw) To: Richard Biener; +Cc: Bernd Edlinger, Jeff Law, GCC Patches, Jakub Jelinek On 07/26/2018 02:55 AM, Richard Biener wrote: > On Wed, 25 Jul 2018, Martin Sebor wrote: > >>> BUT - for the string_constant and c_strlen functions we are, >>> in all cases we return something interesting, able to look >>> at an initializer which then determines that type. Hopefully. >>> I think the strlen() folding code when it sets SSA ranges >>> now looks at types ...? >>> >>> Consider >>> >>> struct X { int i; char c[4]; int j;}; >>> struct Y { char c[16]; }; >>> >>> void foo (struct X *p, struct Y *q) >>> { >>> memcpy (p, q, sizeof (struct Y)); >>> if (strlen ((char *)(struct Y *)p + 4) < 7) >>> abort (); >>> } >>> >>> here the GIMPLE IL looks like >>> >>> const char * _1; >>> >>> <bb 2> [local count: 1073741825]: >>> _5 = MEM[(char * {ref-all})q_4(D)]; >>> MEM[(char * {ref-all})p_6(D)] = _5; >>> _1 = p_6(D) + 4; >>> _2 = __builtin_strlen (_1); >>> >>> and I guess Martin would argue that since p is of type struct X >>> + 4 gets you to c[4] and thus strlen of that cannot be larger >>> than 3. But of course the middle-end doesn't work like that >>> and luckily we do not try to draw such conclusions or we >>> are somehow lucky that for the testcase as written above we do not >>> (I'm not sure whether Martins changes in this area would derive >>> such conclusions in principle). >> >> Only if the strlen argument were p->c. >> >>> NOTE - we do not know the dynamic type here since we do not know >>> the dynamic type of the memory pointed-to by q! 
We can only >>> derive that at q+4 there must be some object that we can >>> validly call strlen on (where Martin again thinks strlen >>> imposes constrains that memchr does not - sth I do not agree >>> with from a QOI perspective) >> >> The dynamic type is a murky area. > > It's well-specified in the middle-end. A store changes the > dynamic type of the stored-to object. If that type is > compatible with the surrounding objects dynamic type that one > is not affected, if not then the surrounding objects dynamic > type becomes unspecified. There is TYPE_TYPELESS_STORAGE > to somewhat control "compatibility" of subobjects.

I never responded to this. Using a dynamic (effective) type as you describe it would invalidate the aggressive loop optimization in the following:

   void foo (struct X *p)
   {
     struct Y y = { "12345678" };
     memcpy (p, &y, sizeof (struct Y));

     // *p effective type is now struct Y

     int n = 0;
     while (p->c[n])
       ++n;

     if (n < 7)
       abort ();
   }

GCC unconditionally aborts, just as it does with strlen(p->c). Why is that not wrong (in either case)?

Because the code is invalid either way, for two reasons:

1) it accesses an object of (effective) type struct Y via
   an lvalue of type struct X (specifically via (*p).c)
2) it relies on p->c

The loop optimization relies on the exact same requirement as the strlen one. Either they are both valid or neither is.

Martin

^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-07 2:24 ` Martin Sebor @ 2018-08-07 8:51 ` Richard Biener 2018-08-07 14:37 ` Martin Sebor 0 siblings, 1 reply; 121+ messages in thread From: Richard Biener @ 2018-08-07 8:51 UTC (permalink / raw) To: Martin Sebor; +Cc: Bernd Edlinger, Jeff Law, GCC Patches, Jakub Jelinek On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote: >On 07/26/2018 02:55 AM, Richard Biener wrote: >> On Wed, 25 Jul 2018, Martin Sebor wrote: >> >>>> BUT - for the string_constant and c_strlen functions we are, >>>> in all cases we return something interesting, able to look >>>> at an initializer which then determines that type. Hopefully. >>>> I think the strlen() folding code when it sets SSA ranges >>>> now looks at types ...? >>>> >>>> Consider >>>> >>>> struct X { int i; char c[4]; int j;}; >>>> struct Y { char c[16]; }; >>>> >>>> void foo (struct X *p, struct Y *q) >>>> { >>>> memcpy (p, q, sizeof (struct Y)); >>>> if (strlen ((char *)(struct Y *)p + 4) < 7) >>>> abort (); >>>> } >>>> >>>> here the GIMPLE IL looks like >>>> >>>> const char * _1; >>>> >>>> <bb 2> [local count: 1073741825]: >>>> _5 = MEM[(char * {ref-all})q_4(D)]; >>>> MEM[(char * {ref-all})p_6(D)] = _5; >>>> _1 = p_6(D) + 4; >>>> _2 = __builtin_strlen (_1); >>>> >>>> and I guess Martin would argue that since p is of type struct X >>>> + 4 gets you to c[4] and thus strlen of that cannot be larger >>>> than 3. But of course the middle-end doesn't work like that >>>> and luckily we do not try to draw such conclusions or we >>>> are somehow lucky that for the testcase as written above we do not >>>> (I'm not sure whether Martins changes in this area would derive >>>> such conclusions in principle). >>> >>> Only if the strlen argument were p->c. >>> >>>> NOTE - we do not know the dynamic type here since we do not know >>>> the dynamic type of the memory pointed-to by q! 
We can only >>>> derive that at q+4 there must be some object that we can >>>> validly call strlen on (where Martin again thinks strlen >>>> imposes constrains that memchr does not - sth I do not agree >>>> with from a QOI perspective) >>> >>> The dynamic type is a murky area. >> >> It's well-specified in the middle-end. A store changes the >> dynamic type of the stored-to object. If that type is >> compatible with the surrounding objects dynamic type that one >> is not affected, if not then the surrounding objects dynamic >> type becomes unspecified. There is TYPE_TYPELESS_STORAGE >> to somewhat control "compatibility" of subobjects. > >I never responded to this. Using a dynamic (effective) type as >you describe it would invalidate the aggressive loop optimization >in the following: > > void foo (struct X *p) > { > struct Y y = { "12345678" }; > memcpy (p, &y, sizeof (struct Y)); > > // *p effective type is now struct Y > > int n = 0; > while (p->c[n]) > ++n; > > if (n < 7) > abort (); > } > >GCC unconditionally aborts, just as it does with strlen(p->c). >Why is that not wrong (in either case)? > >Because the code is invalid either way, for two reasons: No, because the storage has only 4 non-null characters starting at offset 4? >1) it accesses an object of (effective) type struct Y via > an lvalue of type struct X (specifically via (*p).c) >2) it relies on p->c > >The loop optimization relies on the exact same requirement >as the strlen one. Either they are both valid or neither is. > >Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-07 8:51 ` Richard Biener @ 2018-08-07 14:37 ` Martin Sebor 2018-08-07 17:44 ` Richard Biener 0 siblings, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-08-07 14:37 UTC (permalink / raw) To: Richard Biener; +Cc: Bernd Edlinger, Jeff Law, GCC Patches, Jakub Jelinek On 08/07/2018 02:51 AM, Richard Biener wrote: > On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote: >> On 07/26/2018 02:55 AM, Richard Biener wrote: >>> On Wed, 25 Jul 2018, Martin Sebor wrote: >>> >>>>> BUT - for the string_constant and c_strlen functions we are, >>>>> in all cases we return something interesting, able to look >>>>> at an initializer which then determines that type. Hopefully. >>>>> I think the strlen() folding code when it sets SSA ranges >>>>> now looks at types ...? >>>>> >>>>> Consider >>>>> >>>>> struct X { int i; char c[4]; int j;}; >>>>> struct Y { char c[16]; }; >>>>> >>>>> void foo (struct X *p, struct Y *q) >>>>> { >>>>> memcpy (p, q, sizeof (struct Y)); >>>>> if (strlen ((char *)(struct Y *)p + 4) < 7) >>>>> abort (); >>>>> } >>>>> >>>>> here the GIMPLE IL looks like >>>>> >>>>> const char * _1; >>>>> >>>>> <bb 2> [local count: 1073741825]: >>>>> _5 = MEM[(char * {ref-all})q_4(D)]; >>>>> MEM[(char * {ref-all})p_6(D)] = _5; >>>>> _1 = p_6(D) + 4; >>>>> _2 = __builtin_strlen (_1); >>>>> >>>>> and I guess Martin would argue that since p is of type struct X >>>>> + 4 gets you to c[4] and thus strlen of that cannot be larger >>>>> than 3. But of course the middle-end doesn't work like that >>>>> and luckily we do not try to draw such conclusions or we >>>>> are somehow lucky that for the testcase as written above we do not >>>>> (I'm not sure whether Martins changes in this area would derive >>>>> such conclusions in principle). >>>> >>>> Only if the strlen argument were p->c. 
>>>> >>>>> NOTE - we do not know the dynamic type here since we do not know >>>>> the dynamic type of the memory pointed-to by q! We can only >>>>> derive that at q+4 there must be some object that we can >>>>> validly call strlen on (where Martin again thinks strlen >>>>> imposes constrains that memchr does not - sth I do not agree >>>>> with from a QOI perspective) >>>> >>>> The dynamic type is a murky area. >>> >>> It's well-specified in the middle-end. A store changes the >>> dynamic type of the stored-to object. If that type is >>> compatible with the surrounding objects dynamic type that one >>> is not affected, if not then the surrounding objects dynamic >>> type becomes unspecified. There is TYPE_TYPELESS_STORAGE >>> to somewhat control "compatibility" of subobjects. >> >> I never responded to this. Using a dynamic (effective) type as >> you describe it would invalidate the aggressive loop optimization >> in the following: >> >> void foo (struct X *p) >> { >> struct Y y = { "12345678" }; >> memcpy (p, &y, sizeof (struct Y)); >> >> // *p effective type is now struct Y >> >> int n = 0; >> while (p->c[n]) >> ++n; >> >> if (n < 7) >> abort (); >> } >> >> GCC unconditionally aborts, just as it does with strlen(p->c). >> Why is that not wrong (in either case)? >> >> Because the code is invalid either way, for two reasons: > > No, because the storage has only 4 non-null characters starting at offset 4?

No, for the reasons below. I made a mistake of making the initializer string too short. If we make it longer it still aborts. Say with this

   struct Y y = { "123456789012345" };

we end up with this DCE:

   struct Y y;

   <bb 2> :
   MEM[(char * {ref-all})p_6(D)] = 0x353433323130393837363534333231;
   __builtin_abort ();

With -fdump-tree-cddce1-details (and a patch to show the upper bound) we see:

   Found loop 1 to be finite: upper bound found: 3.

With -fno-aggressive-loop-optimizations the abort becomes conditional because the array bound isn't considered.
I would expect you to know this since you implemented the feature. Martin > >> 1) it accesses an object of (effective) type struct Y via >> an lvalue of type struct X (specifically via (*p).c) >> 2) it relies on p->c >> >> The loop optimization relies on the exact same requirement >> as the strlen one. Either they are both valid or neither is. >> >> Martin > ^ permalink raw reply [flat|nested] 121+ messages in thread
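As a side note on the dump quoted above, the constant 0x353433323130393837363534333231 is just the 16-byte object image of "123456789012345" (15 digits plus the terminating NUL) read as one little-endian integer. The sketch below shows how that value can be reproduced; the function name le_image_hex is made up for illustration, and the rendering (most significant byte first, leading zero bytes dropped) is an assumption chosen to match how the dump prints the number.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper.  Render the SIZE bytes at SRC as a single
   little-endian integer in hexadecimal: the byte at the highest address
   is the most significant one, and leading zero bytes are skipped.
   OUT must have room for 2 * SIZE + 1 characters.  An all-zero object
   yields the empty string.  */
void le_image_hex (const void *src, size_t size, char *out)
{
  assert (src != NULL && out != NULL);
  const unsigned char *bytes = src;
  size_t pos = 0;
  int seen_nonzero = 0;
  for (size_t i = size; i-- > 0; )
    {
      if (!seen_nonzero && bytes[i] == 0)
        continue;
      seen_nonzero = 1;
      pos += (size_t) sprintf (out + pos, "%02x", (unsigned) bytes[i]);
    }
  out[pos] = '\0';
}
```

Applied to a 16-byte array initialized with "123456789012345", this yields the digit string seen in the MEM store: the trailing NUL becomes the (dropped) top byte and '1' (0x31) ends up as the lowest byte.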
* Re: [PATCH] Make strlen range computations more conservative 2018-08-07 14:37 ` Martin Sebor @ 2018-08-07 17:44 ` Richard Biener 2018-08-08 2:33 ` Martin Sebor 0 siblings, 1 reply; 121+ messages in thread From: Richard Biener @ 2018-08-07 17:44 UTC (permalink / raw) To: Martin Sebor; +Cc: Bernd Edlinger, Jeff Law, GCC Patches, Jakub Jelinek On August 7, 2018 4:37:00 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote: >On 08/07/2018 02:51 AM, Richard Biener wrote: >> On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor ><msebor@gmail.com> wrote: >>> On 07/26/2018 02:55 AM, Richard Biener wrote: >>>> On Wed, 25 Jul 2018, Martin Sebor wrote: >>>> >>>>>> BUT - for the string_constant and c_strlen functions we are, >>>>>> in all cases we return something interesting, able to look >>>>>> at an initializer which then determines that type. Hopefully. >>>>>> I think the strlen() folding code when it sets SSA ranges >>>>>> now looks at types ...? >>>>>> >>>>>> Consider >>>>>> >>>>>> struct X { int i; char c[4]; int j;}; >>>>>> struct Y { char c[16]; }; >>>>>> >>>>>> void foo (struct X *p, struct Y *q) >>>>>> { >>>>>> memcpy (p, q, sizeof (struct Y)); >>>>>> if (strlen ((char *)(struct Y *)p + 4) < 7) >>>>>> abort (); >>>>>> } >>>>>> >>>>>> here the GIMPLE IL looks like >>>>>> >>>>>> const char * _1; >>>>>> >>>>>> <bb 2> [local count: 1073741825]: >>>>>> _5 = MEM[(char * {ref-all})q_4(D)]; >>>>>> MEM[(char * {ref-all})p_6(D)] = _5; >>>>>> _1 = p_6(D) + 4; >>>>>> _2 = __builtin_strlen (_1); >>>>>> >>>>>> and I guess Martin would argue that since p is of type struct X >>>>>> + 4 gets you to c[4] and thus strlen of that cannot be larger >>>>>> than 3. But of course the middle-end doesn't work like that >>>>>> and luckily we do not try to draw such conclusions or we >>>>>> are somehow lucky that for the testcase as written above we do >not >>>>>> (I'm not sure whether Martins changes in this area would derive >>>>>> such conclusions in principle). 
>>>>> >>>>> Only if the strlen argument were p->c. >>>>> >>>>>> NOTE - we do not know the dynamic type here since we do not know >>>>>> the dynamic type of the memory pointed-to by q! We can only >>>>>> derive that at q+4 there must be some object that we can >>>>>> validly call strlen on (where Martin again thinks strlen >>>>>> imposes constrains that memchr does not - sth I do not agree >>>>>> with from a QOI perspective) >>>>> >>>>> The dynamic type is a murky area. >>>> >>>> It's well-specified in the middle-end. A store changes the >>>> dynamic type of the stored-to object. If that type is >>>> compatible with the surrounding objects dynamic type that one >>>> is not affected, if not then the surrounding objects dynamic >>>> type becomes unspecified. There is TYPE_TYPELESS_STORAGE >>>> to somewhat control "compatibility" of subobjects. >>> >>> I never responded to this. Using a dynamic (effective) type as >>> you describe it would invalidate the aggressive loop optimization >>> in the following: >>> >>> void foo (struct X *p) >>> { >>> struct Y y = { "12345678" }; >>> memcpy (p, &y, sizeof (struct Y)); >>> >>> // *p effective type is now struct Y >>> >>> int n = 0; >>> while (p->c[n]) >>> ++n; >>> >>> if (n < 7) >>> abort (); >>> } >>> >>> GCC unconditionally aborts, just as it does with strlen(p->c). >>> Why is that not wrong (in either case)? >>> >>> Because the code is invalid either way, for two reasons: >> >> No, because the storage has only 4 non-null characters starting at >offset 4? > >No, for the reasons below. I made a mistake of making >the initializer string too short. If we make it longer it >still aborts. Say with this > > struct Y y = { "123456789012345" }; > >we end up with this DCE: > > struct Y y; > > <bb 2> : > MEM[(char * {ref-all})p_6(D)] = 0x353433323130393837363534333231; > __builtin_abort (); > >With -fdump-tree-cddce1-details (and a patch to show the upper >bound) we see: > > Found loop 1 to be finite: upper bound found: 3. 
> >With -fno-aggressive-loop-optimizations the abort becomes >conditional because the array bound isn't considered. I would >expect you to know this since you implemented the feature. Honza added the array indexing part and it may very well be too aggressive. I have to take a closer look after vacation to tell. Can you open a PR and CC me there? Richard. > >Martin >> >>> 1) it accesses an object of (effective) type struct Y via >>> an lvalue of type struct X (specifically via (*p).c) >>> 2) it relies on p->c >>> >>> The loop optimization relies on the exact same requirement >>> as the strlen one. Either they are both valid or neither is. >>> >>> Martin >> ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-07 17:44 ` Richard Biener @ 2018-08-08 2:33 ` Martin Sebor 2018-08-17 10:31 ` Richard Biener 0 siblings, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-08-08 2:33 UTC (permalink / raw) To: Richard Biener; +Cc: Bernd Edlinger, Jeff Law, GCC Patches, Jakub Jelinek On 08/07/2018 11:44 AM, Richard Biener wrote: > On August 7, 2018 4:37:00 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote: >> On 08/07/2018 02:51 AM, Richard Biener wrote: >>> On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor >> <msebor@gmail.com> wrote: >>>> On 07/26/2018 02:55 AM, Richard Biener wrote: >>>>> On Wed, 25 Jul 2018, Martin Sebor wrote: >>>>> >>>>>>> BUT - for the string_constant and c_strlen functions we are, >>>>>>> in all cases we return something interesting, able to look >>>>>>> at an initializer which then determines that type. Hopefully. >>>>>>> I think the strlen() folding code when it sets SSA ranges >>>>>>> now looks at types ...? >>>>>>> >>>>>>> Consider >>>>>>> >>>>>>> struct X { int i; char c[4]; int j;}; >>>>>>> struct Y { char c[16]; }; >>>>>>> >>>>>>> void foo (struct X *p, struct Y *q) >>>>>>> { >>>>>>> memcpy (p, q, sizeof (struct Y)); >>>>>>> if (strlen ((char *)(struct Y *)p + 4) < 7) >>>>>>> abort (); >>>>>>> } >>>>>>> >>>>>>> here the GIMPLE IL looks like >>>>>>> >>>>>>> const char * _1; >>>>>>> >>>>>>> <bb 2> [local count: 1073741825]: >>>>>>> _5 = MEM[(char * {ref-all})q_4(D)]; >>>>>>> MEM[(char * {ref-all})p_6(D)] = _5; >>>>>>> _1 = p_6(D) + 4; >>>>>>> _2 = __builtin_strlen (_1); >>>>>>> >>>>>>> and I guess Martin would argue that since p is of type struct X >>>>>>> + 4 gets you to c[4] and thus strlen of that cannot be larger >>>>>>> than 3. 
But of course the middle-end doesn't work like that >>>>>>> and luckily we do not try to draw such conclusions or we >>>>>>> are somehow lucky that for the testcase as written above we do >> not >>>>>>> (I'm not sure whether Martins changes in this area would derive >>>>>>> such conclusions in principle). >>>>>> >>>>>> Only if the strlen argument were p->c. >>>>>> >>>>>>> NOTE - we do not know the dynamic type here since we do not know >>>>>>> the dynamic type of the memory pointed-to by q! We can only >>>>>>> derive that at q+4 there must be some object that we can >>>>>>> validly call strlen on (where Martin again thinks strlen >>>>>>> imposes constrains that memchr does not - sth I do not agree >>>>>>> with from a QOI perspective) >>>>>> >>>>>> The dynamic type is a murky area. >>>>> >>>>> It's well-specified in the middle-end. A store changes the >>>>> dynamic type of the stored-to object. If that type is >>>>> compatible with the surrounding objects dynamic type that one >>>>> is not affected, if not then the surrounding objects dynamic >>>>> type becomes unspecified. There is TYPE_TYPELESS_STORAGE >>>>> to somewhat control "compatibility" of subobjects. >>>> >>>> I never responded to this. Using a dynamic (effective) type as >>>> you describe it would invalidate the aggressive loop optimization >>>> in the following: >>>> >>>> void foo (struct X *p) >>>> { >>>> struct Y y = { "12345678" }; >>>> memcpy (p, &y, sizeof (struct Y)); >>>> >>>> // *p effective type is now struct Y >>>> >>>> int n = 0; >>>> while (p->c[n]) >>>> ++n; >>>> >>>> if (n < 7) >>>> abort (); >>>> } >>>> >>>> GCC unconditionally aborts, just as it does with strlen(p->c). >>>> Why is that not wrong (in either case)? >>>> >>>> Because the code is invalid either way, for two reasons: >>> >>> No, because the storage has only 4 non-null characters starting at >> offset 4? >> >> No, for the reasons below. I made a mistake of making >> the initializer string too short. 
If we make it longer it >> still aborts. Say with this >> >> struct Y y = { "123456789012345" }; >> >> we end up with this DCE: >> >> struct Y y; >> >> <bb 2> : >> MEM[(char * {ref-all})p_6(D)] = 0x353433323130393837363534333231; >> __builtin_abort (); >> >> With -fdump-tree-cddce1-details (and a patch to show the upper >> bound) we see: >> >> Found loop 1 to be finite: upper bound found: 3. >> >> With -fno-aggressive-loop-optimizations the abort becomes >> conditional because the array bound isn't considered. I would >> expect you to know this since you implemented the feature. > > Honza added the array indexing part and it may very well be too aggressive. I have to take a closer look after vacation to tell. Can you open a PR and CC me there? I opened bug 86884. Martin > > Richard. > >> >> Martin >>> >>>> 1) it accesses an object of (effective) type struct Y via >>>> an lvalue of type struct X (specifically via (*p).c) >>>> 2) it relies on p->c >>>> >>>> The loop optimization relies on the exact same requirement >>>> as the strlen one. Either they are both valid or neither is. >>>> >>>> Martin >>> > ^ permalink raw reply [flat|nested] 121+ messages in thread
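The reported "upper bound found: 3" can be related back to the declaration char c[4]: the loop condition may read c[0] through c[3], so in any execution without an out-of-bounds read the increment runs at most three times, and n < 7 then always holds. The sketch below is one way to write the scan so that the bound is explicit in the source rather than inferred by the optimizer. It is an illustration only; bounded_count is a hypothetical name, and the function assumes *p really is a struct X.

```c
#include <assert.h>
#include <stddef.h>

struct X { int i; char c[4]; int j; };

/* Hypothetical helper.  Count the leading non-zero characters of p->c
   without ever indexing past the end of the array.  Unlike the unbounded
   loop, this can return 4 when the array holds no NUL at all.  */
size_t bounded_count (const struct X *p)
{
  assert (p != NULL);
  size_t n = 0;
  while (n < sizeof p->c && p->c[n])
    ++n;
  return n;
}
```

The design choice here is simply to carry the array size in the loop condition, so the result is well defined for every possible content of c and does not depend on -faggressive-loop-optimizations.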
* Re: [PATCH] Make strlen range computations more conservative 2018-08-08 2:33 ` Martin Sebor @ 2018-08-17 10:31 ` Richard Biener 2018-08-17 15:49 ` Martin Sebor 0 siblings, 1 reply; 121+ messages in thread From: Richard Biener @ 2018-08-17 10:31 UTC (permalink / raw) To: Martin Sebor; +Cc: Bernd Edlinger, Jeff Law, GCC Patches, Jakub Jelinek On Tue, 7 Aug 2018, Martin Sebor wrote: > On 08/07/2018 11:44 AM, Richard Biener wrote: > > On August 7, 2018 4:37:00 PM GMT+02:00, Martin Sebor <msebor@gmail.com> > > wrote: > > > On 08/07/2018 02:51 AM, Richard Biener wrote: > > > > On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor > > > <msebor@gmail.com> wrote: > > > > > On 07/26/2018 02:55 AM, Richard Biener wrote: > > > > > > On Wed, 25 Jul 2018, Martin Sebor wrote: > > > > > > > > > > > > > > BUT - for the string_constant and c_strlen functions we are, > > > > > > > > in all cases we return something interesting, able to look > > > > > > > > at an initializer which then determines that type. Hopefully. > > > > > > > > I think the strlen() folding code when it sets SSA ranges > > > > > > > > now looks at types ...? 
> > > > > > > > > > > > > > > > Consider > > > > > > > > > > > > > > > > struct X { int i; char c[4]; int j;}; > > > > > > > > struct Y { char c[16]; }; > > > > > > > > > > > > > > > > void foo (struct X *p, struct Y *q) > > > > > > > > { > > > > > > > > memcpy (p, q, sizeof (struct Y)); > > > > > > > > if (strlen ((char *)(struct Y *)p + 4) < 7) > > > > > > > > abort (); > > > > > > > > } > > > > > > > > > > > > > > > > here the GIMPLE IL looks like > > > > > > > > > > > > > > > > const char * _1; > > > > > > > > > > > > > > > > <bb 2> [local count: 1073741825]: > > > > > > > > _5 = MEM[(char * {ref-all})q_4(D)]; > > > > > > > > MEM[(char * {ref-all})p_6(D)] = _5; > > > > > > > > _1 = p_6(D) + 4; > > > > > > > > _2 = __builtin_strlen (_1); > > > > > > > > > > > > > > > > and I guess Martin would argue that since p is of type struct X > > > > > > > > + 4 gets you to c[4] and thus strlen of that cannot be larger > > > > > > > > than 3. But of course the middle-end doesn't work like that > > > > > > > > and luckily we do not try to draw such conclusions or we > > > > > > > > are somehow lucky that for the testcase as written above we do > > > not > > > > > > > > (I'm not sure whether Martins changes in this area would derive > > > > > > > > such conclusions in principle). > > > > > > > > > > > > > > Only if the strlen argument were p->c. > > > > > > > > > > > > > > > NOTE - we do not know the dynamic type here since we do not know > > > > > > > > the dynamic type of the memory pointed-to by q! We can only > > > > > > > > derive that at q+4 there must be some object that we can > > > > > > > > validly call strlen on (where Martin again thinks strlen > > > > > > > > imposes constrains that memchr does not - sth I do not agree > > > > > > > > with from a QOI perspective) > > > > > > > > > > > > > > The dynamic type is a murky area. > > > > > > > > > > > > It's well-specified in the middle-end. A store changes the > > > > > > dynamic type of the stored-to object. 
If that type is > > > > > > compatible with the surrounding objects dynamic type that one > > > > > > is not affected, if not then the surrounding objects dynamic > > > > > > type becomes unspecified. There is TYPE_TYPELESS_STORAGE > > > > > > to somewhat control "compatibility" of subobjects. > > > > > > > > > > I never responded to this. Using a dynamic (effective) type as > > > > > you describe it would invalidate the aggressive loop optimization > > > > > in the following: > > > > > > > > > > void foo (struct X *p) > > > > > { > > > > > struct Y y = { "12345678" }; > > > > > memcpy (p, &y, sizeof (struct Y)); > > > > > > > > > > // *p effective type is now struct Y > > > > > > > > > > int n = 0; > > > > > while (p->c[n]) > > > > > ++n; > > > > > > > > > > if (n < 7) > > > > > abort (); > > > > > } > > > > > > > > > > GCC unconditionally aborts, just as it does with strlen(p->c). > > > > > Why is that not wrong (in either case)? > > > > > > > > > > Because the code is invalid either way, for two reasons: > > > > > > > > No, because the storage has only 4 non-null characters starting at > > > offset 4? > > > > > > No, for the reasons below. I made a mistake of making > > > the initializer string too short. If we make it longer it > > > still aborts. Say with this > > > > > > struct Y y = { "123456789012345" }; > > > > > > we end up with this DCE: > > > > > > struct Y y; > > > > > > <bb 2> : > > > MEM[(char * {ref-all})p_6(D)] = 0x353433323130393837363534333231; > > > __builtin_abort (); > > > > > > With -fdump-tree-cddce1-details (and a patch to show the upper > > > bound) we see: > > > > > > Found loop 1 to be finite: upper bound found: 3. > > > > > > With -fno-aggressive-loop-optimizations the abort becomes > > > conditional because the array bound isn't considered. I would > > > expect you to know this since you implemented the feature. > > > > Honza added the array indexing part and it may very well be too aggressive. 
> > I have to take a closer look after vacation to tell. Can you open a PR and > > CC me there? > > I opened bug 86884.

Now that I have returned from vacation: the testcase is simply bogus. The access p->c[n] requires an effective type of X.

Richard.

> Martin > > > > Richard. > > > > > > > > Martin > > > > > > > > > 1) it accesses an object of (effective) type struct Y via > > > > > an lvalue of type struct X (specifically via (*p).c) > > > > > 2) it relies on p->c > > > > > > > > > > The loop optimization relies on the exact same requirement > > > > > as the strlen one. Either they are both valid or neither is. > > > > > > > > > > Martin > > > > > > > >

--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
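For reference, the upper bound of 3 reported in the quoted loop dump follows from the declared size of the member array c. The following is only a minimal sketch, assuming the struct X definition quoted earlier in the thread; the names X_C_OFFSET and X_C_MAX_STRLEN are hypothetical and do not appear in GCC or in the patch.

```c
#include <stddef.h>

/* Same layout as the struct X quoted in the thread.  */
struct X { int i; char c[4]; int j; };

enum
{
  /* Byte offset of the member array c.  It equals sizeof (int), which is
     4 on the x86_64 target used in the thread, hence the "p + 4".  */
  X_C_OFFSET = offsetof (struct X, c),

  /* Longest string that fits in c together with its terminating NUL.
     The sizeof operand is not evaluated, so no object is accessed.  */
  X_C_MAX_STRLEN = sizeof (((struct X *) 0)->c) - 1
};
```

This bound only holds when the access really goes through an object whose effective type is struct X, which is exactly the point under dispute.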
* Re: [PATCH] Make strlen range computations more conservative 2018-08-17 10:31 ` Richard Biener @ 2018-08-17 15:49 ` Martin Sebor 2018-08-19 15:55 ` Bernd Edlinger 0 siblings, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-08-17 15:49 UTC (permalink / raw) To: Richard Biener; +Cc: Bernd Edlinger, Jeff Law, GCC Patches, Jakub Jelinek On 08/17/2018 04:31 AM, Richard Biener wrote: > On Tue, 7 Aug 2018, Martin Sebor wrote: > >> On 08/07/2018 11:44 AM, Richard Biener wrote: >>> On August 7, 2018 4:37:00 PM GMT+02:00, Martin Sebor <msebor@gmail.com> >>> wrote: >>>> On 08/07/2018 02:51 AM, Richard Biener wrote: >>>>> On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor >>>> <msebor@gmail.com> wrote: >>>>>> On 07/26/2018 02:55 AM, Richard Biener wrote: >>>>>>> On Wed, 25 Jul 2018, Martin Sebor wrote: >>>>>>> >>>>>>>>> BUT - for the string_constant and c_strlen functions we are, >>>>>>>>> in all cases we return something interesting, able to look >>>>>>>>> at an initializer which then determines that type. Hopefully. >>>>>>>>> I think the strlen() folding code when it sets SSA ranges >>>>>>>>> now looks at types ...? >>>>>>>>> >>>>>>>>> Consider >>>>>>>>> >>>>>>>>> struct X { int i; char c[4]; int j;}; >>>>>>>>> struct Y { char c[16]; }; >>>>>>>>> >>>>>>>>> void foo (struct X *p, struct Y *q) >>>>>>>>> { >>>>>>>>> memcpy (p, q, sizeof (struct Y)); >>>>>>>>> if (strlen ((char *)(struct Y *)p + 4) < 7) >>>>>>>>> abort (); >>>>>>>>> } >>>>>>>>> >>>>>>>>> here the GIMPLE IL looks like >>>>>>>>> >>>>>>>>> const char * _1; >>>>>>>>> >>>>>>>>> <bb 2> [local count: 1073741825]: >>>>>>>>> _5 = MEM[(char * {ref-all})q_4(D)]; >>>>>>>>> MEM[(char * {ref-all})p_6(D)] = _5; >>>>>>>>> _1 = p_6(D) + 4; >>>>>>>>> _2 = __builtin_strlen (_1); >>>>>>>>> >>>>>>>>> and I guess Martin would argue that since p is of type struct X >>>>>>>>> + 4 gets you to c[4] and thus strlen of that cannot be larger >>>>>>>>> than 3. 
But of course the middle-end doesn't work like that >>>>>>>>> and luckily we do not try to draw such conclusions or we >>>>>>>>> are somehow lucky that for the testcase as written above we do >>>> not >>>>>>>>> (I'm not sure whether Martins changes in this area would derive >>>>>>>>> such conclusions in principle). >>>>>>>> >>>>>>>> Only if the strlen argument were p->c. >>>>>>>> >>>>>>>>> NOTE - we do not know the dynamic type here since we do not know >>>>>>>>> the dynamic type of the memory pointed-to by q! We can only >>>>>>>>> derive that at q+4 there must be some object that we can >>>>>>>>> validly call strlen on (where Martin again thinks strlen >>>>>>>>> imposes constrains that memchr does not - sth I do not agree >>>>>>>>> with from a QOI perspective) >>>>>>>> >>>>>>>> The dynamic type is a murky area. >>>>>>> >>>>>>> It's well-specified in the middle-end. A store changes the >>>>>>> dynamic type of the stored-to object. If that type is >>>>>>> compatible with the surrounding objects dynamic type that one >>>>>>> is not affected, if not then the surrounding objects dynamic >>>>>>> type becomes unspecified. There is TYPE_TYPELESS_STORAGE >>>>>>> to somewhat control "compatibility" of subobjects. >>>>>> >>>>>> I never responded to this. Using a dynamic (effective) type as >>>>>> you describe it would invalidate the aggressive loop optimization >>>>>> in the following: >>>>>> >>>>>> void foo (struct X *p) >>>>>> { >>>>>> struct Y y = { "12345678" }; >>>>>> memcpy (p, &y, sizeof (struct Y)); >>>>>> >>>>>> // *p effective type is now struct Y >>>>>> >>>>>> int n = 0; >>>>>> while (p->c[n]) >>>>>> ++n; >>>>>> >>>>>> if (n < 7) >>>>>> abort (); >>>>>> } >>>>>> >>>>>> GCC unconditionally aborts, just as it does with strlen(p->c). >>>>>> Why is that not wrong (in either case)? >>>>>> >>>>>> Because the code is invalid either way, for two reasons: >>>>> >>>>> No, because the storage has only 4 non-null characters starting at >>>> offset 4? 
>>>> >>>> No, for the reasons below. I made a mistake of making >>>> the initializer string too short. If we make it longer it >>>> still aborts. Say with this >>>> >>>> struct Y y = { "123456789012345" }; >>>> >>>> we end up with this DCE: >>>> >>>> struct Y y; >>>> >>>> <bb 2> : >>>> MEM[(char * {ref-all})p_6(D)] = 0x353433323130393837363534333231; >>>> __builtin_abort (); >>>> >>>> With -fdump-tree-cddce1-details (and a patch to show the upper >>>> bound) we see: >>>> >>>> Found loop 1 to be finite: upper bound found: 3. >>>> >>>> With -fno-aggressive-loop-optimizations the abort becomes >>>> conditional because the array bound isn't considered. I would >>>> expect you to know this since you implemented the feature. >>> >>> Honza added the array indexing part and it may very well be too aggressive. >>> I have to take a closer look after vacation to tell. Can you open a PR and >>> CC me there? >> >> I opened bug 86884. > > Now that I returned from vacation the testcase is simply bogus. The > access p->c[n] requires an effective type of X.

I agree. And so does strlen(p->c). It was your example (with strlen) to illustrate a valid use case. Saying that the loop is invalid when p->c doesn't contain zero:

    int n = 0;
    while (p->c[n])
      ++n;

means that the equivalent loop below is also invalid:

    int n = 0;
    for (const char *q = p->c; *q; ++q)
      ++n;

The three (the two loops and the strlen(p->c) call) are all equivalent and semantically identical. Since one of them is invalid, the other two must be as well.

It's not wrong to emit different code for equivalent invalid constructs, but as a matter of QoI, it makes no sense to _guarantee_ different semantics for constructs that, at the C/C++ language level, are semantically equivalent, just because they use subtly different syntax.

Put another way, a programmer who uses strlen(p->c) in this context and who is assured that it's valid code should be free to replace that call with a loop and have it continue to work without change.
Martin
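The equivalence claimed in the message above can be sketched as follows for the uncontroversial case, where the object really has type struct X and c holds a NUL-terminated string. This is only an illustration under that assumption; the function names len_strlen, len_index, len_pointer and all_agree are hypothetical and are not part of the patch or of GCC.

```c
#include <stddef.h>
#include <string.h>

/* Same layout as the struct X quoted in the thread.  */
struct X { int i; char c[4]; int j; };

/* Length of the string in p->c via the library call.  */
static size_t
len_strlen (const struct X *p)
{
  return strlen (p->c);
}

/* Length via an index loop over the member array.  */
static size_t
len_index (const struct X *p)
{
  size_t n = 0;
  while (p->c[n])
    ++n;
  return n;
}

/* Length via a pointer loop that starts at the member array.  */
static size_t
len_pointer (const struct X *p)
{
  size_t n = 0;
  for (const char *q = p->c; *q; ++q)
    ++n;
  return n;
}

/* Return 1 when all three forms compute the same length for *p.
   Only meaningful when p->c is NUL-terminated within its 4 bytes.  */
static int
all_agree (const struct X *p)
{
  size_t n = len_strlen (p);
  return n == len_index (p) && n == len_pointer (p);
}
```

For valid input the three forms necessarily agree. The disagreement in the thread is only about what the compiler may assume, or must guarantee, when c is not NUL-terminated within its declared bound.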
* Re: [PATCH] Make strlen range computations more conservative 2018-08-17 15:49 ` Martin Sebor @ 2018-08-19 15:55 ` Bernd Edlinger 2018-08-20 10:24 ` Richard Biener 2018-08-21 22:43 ` Jeff Law 0 siblings, 2 replies; 121+ messages in thread From: Bernd Edlinger @ 2018-08-19 15:55 UTC (permalink / raw) To: Martin Sebor, Richard Biener; +Cc: Jeff Law, GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1186 bytes --]

Hi,

I rebased my range computation patch to current trunk, and updated it according to what was discussed here.

get_range_strlen already has a parameter that is used to differentiate between ranges for warnings and ranges for code-gen. It is called "strict" in the 4-parameter overload and "fuzzy" in the internally used 7-parameter overload. So I added an "optimistic" parameter to my get_inner_char_array_unless_typecast helper function. That's it.

Therefore at this time, there is only one warning regression in one test case, and one xfailed warning test case is fixed. So that is par on the warning regression side.

The failing test case is gcc/testsuite/gcc.dg/pr83373.c, which now uses -fassume-zero-terminated-char-arrays to enable the (unsafe) feedback from string-length information to VRP and thereby suppress the warning.

The 5 test cases that were designed to check the optimized tree dump have to use the new -fassume-zero-terminated-char-arrays option, but that is what we agreed upon.

The patch is not dependent on any other patches.

Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?

Thanks
Bernd.

[-- Attachment #2: changelog-range-strlen-v4.txt --]
[-- Type: text/plain, Size: 1311 bytes --]

gcc:
2018-08-19 Bernd Edlinger <bernd.edlinger@hotmail.de>

* common.opt: Add new optimization option -fassume-zero-terminated-char-arrays.
* opts.c (default_options): Enable -fassume-zero-terminated-char-arrays with -Ofast.
* gimple-fold.c (get_inner_char_array_unless_typecast): Helper function for strlen range estimations.
(get_range_strlen): Use get_inner_char_array_unless_typecast. * gimple-fold.h (get_inner_char_array_unless_typecast): Declare. * tree-ssa-strlen.c (maybe_set_strlen_range): Likewise. (adjust_last_stmt): Avoid folding away undefined behaviour. (get_min_string_length): Avoid not NUL terminated string literals. * doc/invoke.texi: Document -fassume-zero-terminated-char-arrays. * tree-ssa-dse.c (compute_trims): Avoid folding away undefined behaviour. testsuite: 2018-08-19 Bernd Edlinger <bernd.edlinger@hotmail.de> * gcc.dg/pr83373.c: Add -fassume-zero-terminated-char-arrays. * gcc.dg/Wstringop-overflow-6.c: Remove xfail. * gcc.dg/strlenopt-36.c: Adjust test expectations. * gcc.dg/strlenopt-40.c: Likewise. * gcc.dg/strlenopt-45.c: Likewise. * gcc.dg/strlenopt-48.c: Likewise. * gcc.dg/strlenopt-51.c: Likewise. * gcc.dg/strlenopt-57.c: New test. * gcc.dg/strlenopt-58.c: New test. * gcc.dg/strlenopt-59.c: New test. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: patch-range-strlen-v4.diff --] [-- Type: text/x-patch; name="patch-range-strlen-v4.diff", Size: 27024 bytes --] diff -Npur gcc/common.opt gcc/common.opt --- gcc/common.opt 2018-08-18 06:59:47.000000000 +0200 +++ gcc/common.opt 2018-08-18 23:31:11.241191079 +0200 @@ -1025,6 +1025,10 @@ fsanitize-undefined-trap-on-error Common Driver Report Var(flag_sanitize_undefined_trap_on_error) Init(0) Use trap instead of a library function for undefined behavior sanitization. +fassume-zero-terminated-char-arrays +Common Var(flag_assume_zero_terminated_char_arrays) Optimization Init(0) +Optimize under the assumption that char arrays must always be zero terminated. + fasynchronous-unwind-tables Common Report Var(flag_asynchronous_unwind_tables) Optimization Generate unwind tables that are exact at each instruction boundary. 
diff -Npur gcc/doc/invoke.texi gcc/doc/invoke.texi --- gcc/doc/invoke.texi 2018-08-18 06:59:32.000000000 +0200 +++ gcc/doc/invoke.texi 2018-08-18 23:31:11.248190982 +0200 @@ -388,7 +388,8 @@ Objective-C and Objective-C++ Dialects}. -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol --fassociative-math -fauto-profile -fauto-profile[=@var{path}] @gol +-fassociative-math -fassume-zero-terminated-char-arrays @gol +-fauto-profile -fauto-profile[=@var{path}] @gol -fauto-inc-dec -fbranch-probabilities @gol -fbranch-target-load-optimize -fbranch-target-load-optimize2 @gol -fbtr-bb-exclusive -fcaller-saves @gol @@ -9977,6 +9978,17 @@ is automatically enabled when both @opti The default is @option{-fno-associative-math}. +@item -fassume-zero-terminated-char-arrays +@opindex fassume-zero-terminated-char-arrays + +Optimize under the assumption that char arrays must always be zero +terminated. This may have an effect on code that uses strlen to +check the string length, for instance in assertions. Under certain +conditions such checks can be optimized away. This option is enabled +by default at optimization level @option{-Ofast}. + +The default is @option{-fno-assume-zero-terminated-char-arrays}. + @item -freciprocal-math @opindex freciprocal-math diff -Npur gcc/gimple-fold.c gcc/gimple-fold.c --- gcc/gimple-fold.c 2018-08-17 05:00:53.000000000 +0200 +++ gcc/gimple-fold.c 2018-08-19 08:06:47.597624090 +0200 @@ -1257,6 +1257,45 @@ gimple_fold_builtin_memset (gimple_stmt_ return true; } +/* Obtain the inner char array for strlen range estimations. + Return NULL if ARG is not a char array, or if the inner reference + chain goes through a type cast. + OPTIMISTIC is true when the result is used for warnings only. 
*/ + +tree +get_inner_char_array_unless_typecast (tree arg, bool optimistic) +{ + if (!flag_assume_zero_terminated_char_arrays && !optimistic) + return NULL_TREE; + + /* We handle arrays of integer types. */ + if (TREE_CODE (TREE_TYPE (arg)) != ARRAY_TYPE + || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != INTEGER_TYPE + || TYPE_MODE (TREE_TYPE (TREE_TYPE (arg))) != TYPE_MODE (char_type_node) + || TYPE_PRECISION (TREE_TYPE (TREE_TYPE (arg))) + != TYPE_PRECISION (char_type_node)) + return NULL_TREE; + + tree base = arg; + while (TREE_CODE (base) == ARRAY_REF + || TREE_CODE (base) == ARRAY_RANGE_REF + || TREE_CODE (base) == COMPONENT_REF) + base = TREE_OPERAND (base, 0); + + /* If this looks like a type cast don't assume anything. */ + if ((TREE_CODE (base) == MEM_REF + && (! integer_zerop (TREE_OPERAND (base, 1)) + || TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0)))) + != TYPE_MAIN_VARIANT (TREE_TYPE (base)))) + || TREE_CODE (base) == VIEW_CONVERT_EXPR + /* Or other stuff that would be handled by get_inner_reference. */ + || TREE_CODE (base) == BIT_FIELD_REF + || TREE_CODE (base) == REALPART_EXPR + || TREE_CODE (base) == IMAGPART_EXPR) + return NULL_TREE; + + return arg; +} /* Obtain the minimum and maximum string length or minimum and maximum value of ARG in LENGTH[0] and LENGTH[1], respectively. @@ -1272,6 +1311,7 @@ gimple_fold_builtin_memset (gimple_stmt_ PHIs and COND_EXPRs optimistically, if we can determine string length minimum and maximum, it will use the minimum from the ones where it can be determined. + TYPE == 2 and FUZZY != 0 cannot be used together. Set *FLEXP to true if the range of the string lengths has been obtained from the upper bound of an array at the end of a struct. Such an array may hold a string that's longer than its upper bound @@ -1312,8 +1352,8 @@ get_range_strlen (tree arg, tree length[ member. 
*/ tree idx = TREE_OPERAND (op, 1); - arg = TREE_OPERAND (op, 0); - tree optype = TREE_TYPE (arg); + op = TREE_OPERAND (op, 0); + tree optype = TREE_TYPE (op); if (tree dom = TYPE_DOMAIN (optype)) if (tree bound = TYPE_MAX_VALUE (dom)) if (TREE_CODE (bound) == INTEGER_CST @@ -1339,23 +1379,22 @@ get_range_strlen (tree arg, tree length[ return get_range_strlen (TREE_OPERAND (arg, 0), length, visited, type, fuzzy, flexp, eltsize); + if (eltsize != 1) + return false; + if (TREE_CODE (arg) == ARRAY_REF) { - tree type = TREE_TYPE (TREE_OPERAND (arg, 0)); - - /* Determine the "innermost" array type. */ - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); - - /* Avoid arrays of pointers. */ - tree eltype = TREE_TYPE (type); - if (TREE_CODE (type) != ARRAY_TYPE - || !INTEGRAL_TYPE_P (eltype)) + arg = get_inner_char_array_unless_typecast (arg, fuzzy == 2); + if (!arg) return false; + tree type = TREE_TYPE (arg); + + /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); - if (!val || integer_zerop (val)) + if (!val + || TREE_CODE (val) != INTEGER_CST + || integer_zerop (val)) return false; val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, @@ -1364,15 +1403,17 @@ get_range_strlen (tree arg, tree length[ the array could have zero length. */ *minlen = ssize_int (0); - if (TREE_CODE (TREE_OPERAND (arg, 0)) == COMPONENT_REF - && type == TREE_TYPE (TREE_OPERAND (arg, 0)) - && array_at_struct_end_p (TREE_OPERAND (arg, 0))) + if (TREE_CODE (arg) == COMPONENT_REF + && type == TREE_TYPE (arg) + && array_at_struct_end_p (arg)) *flexp = true; } - else if (TREE_CODE (arg) == COMPONENT_REF - && (TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 1))) - == ARRAY_TYPE)) + else if (TREE_CODE (arg) == COMPONENT_REF) { + arg = get_inner_char_array_unless_typecast (arg, fuzzy == 2); + if (!arg) + return false; + /* Use the type of the member array to determine the upper bound on the length of the array. 
This may be overly optimistic if the array itself isn't NUL-terminated and @@ -1388,22 +1429,21 @@ get_range_strlen (tree arg, tree length[ tree type = TREE_TYPE (arg); - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); - /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); - if (!val || integer_zerop (val)) + if (!val + || TREE_CODE (val) != INTEGER_CST + || integer_zerop (val)) return false; + val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, integer_one_node); /* Set the minimum size to zero since the string in the array could have zero length. */ *minlen = ssize_int (0); } - - if (VAR_P (arg)) + else if (VAR_P (arg) + && (flag_assume_zero_terminated_char_arrays || fuzzy == 2)) { tree type = TREE_TYPE (arg); if (POINTER_TYPE_P (type)) @@ -1411,13 +1451,23 @@ get_range_strlen (tree arg, tree length[ if (TREE_CODE (type) == ARRAY_TYPE) { + /* We handle arrays of integer types. */ + if (TREE_CODE (TREE_TYPE (type)) != INTEGER_TYPE + || TYPE_MODE (TREE_TYPE (type)) + != TYPE_MODE (char_type_node) + || TYPE_PRECISION (TREE_TYPE (type)) + != TYPE_PRECISION (char_type_node)) + return false; + + /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop (val)) return false; - val = wide_int_to_tree (TREE_TYPE (val), - wi::sub (wi::to_wide (val), 1)); + + val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, + integer_one_node); /* Set the minimum size to zero since the string in the array could have zero length. */ *minlen = ssize_int (0); @@ -1550,6 +1600,7 @@ get_range_strlen (tree arg, tree length[ if we can determine string length minimum and maximum; it will use the minimum from the ones where it can be determined. STRICT false should be only used for warning code. + STRICT is by default false. ELTSIZE is 1 for normal single byte character strings, and 2 or 4 for wide characer strings. 
ELTSIZE is by default 1. */ diff -Npur gcc/gimple-fold.h gcc/gimple-fold.h --- gcc/gimple-fold.h 2018-08-17 05:00:53.000000000 +0200 +++ gcc/gimple-fold.h 2018-08-18 23:32:00.962501815 +0200 @@ -61,6 +61,7 @@ extern bool gimple_fold_builtin_snprintf extern bool arith_code_with_undefined_signed_overflow (tree_code); extern gimple_seq rewrite_to_defined_overflow (gimple *); extern void replace_call_with_value (gimple_stmt_iterator *, tree); +extern tree get_inner_char_array_unless_typecast (tree, bool); /* gimple_build, functionally matching fold_buildN, outputs stmts int the provided sequence, matching and simplifying them on-the-fly. diff -Npur gcc/opts.c gcc/opts.c --- gcc/opts.c 2018-08-17 05:02:32.000000000 +0200 +++ gcc/opts.c 2018-08-18 23:31:11.245191023 +0200 @@ -547,6 +547,7 @@ static const struct default_options defa /* -Ofast adds optimizations to -O3. */ { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 }, + { OPT_LEVELS_FAST, OPT_fassume_zero_terminated_char_arrays, NULL, 1 }, { OPT_LEVELS_NONE, 0, NULL, 0 } }; diff -Npur gcc/tree-ssa-dse.c gcc/tree-ssa-dse.c --- gcc/tree-ssa-dse.c 2018-07-18 21:21:34.000000000 +0200 +++ gcc/tree-ssa-dse.c 2018-08-19 14:29:32.344498771 +0200 @@ -248,6 +248,12 @@ compute_trims (ao_ref *ref, sbitmap live residual handling in mem* and str* functions is usually reasonably efficient. */ *trim_tail = last_orig - last_live; + /* Don't fold away an out of bounds access, as this defeats proper + warnings. */ + if (*trim_tail + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (ref->base)), + last_orig) <= 0) + *trim_tail = 0; } else *trim_tail = 0; diff -Npur gcc/tree-ssa-strlen.c gcc/tree-ssa-strlen.c --- gcc/tree-ssa-strlen.c 2018-08-10 15:15:29.000000000 +0200 +++ gcc/tree-ssa-strlen.c 2018-08-19 10:56:24.331470762 +0200 @@ -1107,6 +1107,13 @@ adjust_last_stmt (strinfo *si, gimple *s to store the extra '\0' in that case. 
*/ if ((tree_to_uhwi (len) & 3) == 0) return; + + /* Don't fold away an out of bounds access, as this defeats proper + warnings. */ + tree dst = gimple_call_arg (last.stmt, 0); + tree size = compute_objsize (dst, 0); + if (size && tree_int_cst_lt (size, len)) + return; } else if (TREE_CODE (len) == SSA_NAME) { @@ -1149,11 +1156,15 @@ maybe_set_strlen_range (tree lhs, tree s if (TREE_CODE (src) == ADDR_EXPR) { + src = TREE_OPERAND (src, 0); + + src = get_inner_char_array_unless_typecast (src, false); + + if (!src) + ; /* The last array member of a struct can be bigger than its size suggests if it's treated as a poor-man's flexible array member. */ - src = TREE_OPERAND (src, 0); - bool src_is_array = TREE_CODE (TREE_TYPE (src)) == ARRAY_TYPE; - if (src_is_array && !array_at_struct_end_p (src)) + else if (!array_at_struct_end_p (src)) { tree type = TREE_TYPE (src); if (tree size = TYPE_SIZE_UNIT (type)) @@ -1170,8 +1181,6 @@ maybe_set_strlen_range (tree lhs, tree s } else { - if (TREE_CODE (src) == COMPONENT_REF && !src_is_array) - src = TREE_OPERAND (src, 1); if (DECL_P (src)) { /* Handle the unlikely case of strlen (&c) where c is some @@ -3185,7 +3194,9 @@ get_min_string_length (tree rhs, bool *f && TREE_READONLY (rhs)) rhs = DECL_INITIAL (rhs); - if (rhs && TREE_CODE (rhs) == STRING_CST) + if (rhs && TREE_CODE (rhs) == STRING_CST + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)), + TREE_STRING_LENGTH (rhs)) >= 0) { *full_string_p = true; return strlen (TREE_STRING_POINTER (rhs)); diff -Npur gcc/testsuite/gcc.dg/pr83373.c gcc/testsuite/gcc.dg/pr83373.c --- gcc/testsuite/gcc.dg/pr83373.c 2018-08-08 20:46:22.000000000 +0200 +++ gcc/testsuite/gcc.dg/pr83373.c 2018-08-19 09:28:37.938079391 +0200 @@ -1,6 +1,6 @@ /* PR middle-end/83373 - False positive reported by -Wstringop-overflow { dg-do compile } - { dg-options "-O2 -Wstringop-overflow" } */ + { dg-options "-O2 -Wstringop-overflow -fassume-zero-terminated-char-arrays" } */ typedef __SIZE_TYPE__ size_t; diff 
-Npur gcc/testsuite/gcc.dg/strlenopt-36.c gcc/testsuite/gcc.dg/strlenopt-36.c --- gcc/testsuite/gcc.dg/strlenopt-36.c 2018-08-08 20:46:22.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-36.c 2018-08-19 09:21:09.425298094 +0200 @@ -1,7 +1,7 @@ /* PR tree-optimization/78450 - strlen(s) return value can be assumed to be less than the size of s { dg-do compile } - { dg-options "-O2 -fdump-tree-optimized" } */ + { dg-options "-O2 -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #include "strlenopt.h" diff -Npur gcc/testsuite/gcc.dg/strlenopt-40.c gcc/testsuite/gcc.dg/strlenopt-40.c --- gcc/testsuite/gcc.dg/strlenopt-40.c 2018-08-08 20:46:22.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-40.c 2018-08-18 23:31:11.249190968 +0200 @@ -1,7 +1,7 @@ /* PR tree-optimization/83671 - fix for false positive reported by -Wstringop-overflow does not work with inlining { dg-do compile } - { dg-options "-O1 -fdump-tree-optimized" } */ + { dg-options "-O1 -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #include "strlenopt.h" @@ -219,10 +219,15 @@ void elim_member_arrays_ptr (struct MemA ELIM_TRUE (strlen (ma0->a5_7[0]) < 7); ELIM_TRUE (strlen (ma0[0].a5_7[0]) < 7); +#if 0 + /* This is transformed into strlen ((const char *) &(ma0 + 64)->a5_7[0]) + which looks like a type cast and fails the check in + get_inner_char_array_unless_typecast. 
*/ ELIM_TRUE (strlen (ma0[1].a5_7[0]) < 7); ELIM_TRUE (strlen (ma0[1].a5_7[4]) < 7); ELIM_TRUE (strlen (ma0[9].a5_7[0]) < 7); ELIM_TRUE (strlen (ma0[9].a5_7[4]) < 7); +#endif ELIM_TRUE (strlen (ma0->a3) < sizeof ma0->a3); ELIM_TRUE (strlen (ma0->a5) < sizeof ma0->a5); diff -Npur gcc/testsuite/gcc.dg/strlenopt-45.c gcc/testsuite/gcc.dg/strlenopt-45.c --- gcc/testsuite/gcc.dg/strlenopt-45.c 2018-08-08 20:46:22.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-45.c 2018-08-18 23:31:11.249190968 +0200 @@ -2,7 +2,7 @@ Test to verify that strnlen built-in expansion works correctly in the absence of tree strlen optimization. { dg-do compile } - { dg-options "-O2 -Wall -fdump-tree-optimized" } */ + { dg-options "-O2 -Wall -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #include "strlenopt.h" @@ -43,7 +43,6 @@ extern size_t strnlen (const char *, siz else \ FAIL (made_in_false_branch) -extern char c; extern char a1[1]; extern char a3[3]; extern char a5[5]; @@ -52,18 +51,6 @@ extern char ax[]; void elim_strnlen_arr_cst (void) { - /* The length of a string stored in a one-element array must be zero. - The result reported by strnlen() for such an array can be non-zero - only when the bound is equal to 1 (in which case the result must - be one). 
*/ - ELIM (strnlen (&c, 0) == 0); - ELIM (strnlen (&c, 1) < 2); - ELIM (strnlen (&c, 2) == 0); - ELIM (strnlen (&c, 9) == 0); - ELIM (strnlen (&c, PTRDIFF_MAX) == 0); - ELIM (strnlen (&c, SIZE_MAX) == 0); - ELIM (strnlen (&c, -1) == 0); - ELIM (strnlen (a1, 0) == 0); ELIM (strnlen (a1, 1) < 2); ELIM (strnlen (a1, 2) == 0); @@ -99,17 +86,18 @@ void elim_strnlen_arr_cst (void) ELIM (strnlen (a3_7[2], SIZE_MAX) < 8); ELIM (strnlen (a3_7[2], -1) < 8); - ELIM (strnlen ((char*)a3_7, 0) == 0); - ELIM (strnlen ((char*)a3_7, 1) < 2); - ELIM (strnlen ((char*)a3_7, 2) < 3); - ELIM (strnlen ((char*)a3_7, 3) < 4); - ELIM (strnlen ((char*)a3_7, 9) < 10); - ELIM (strnlen ((char*)a3_7, 19) < 20); - ELIM (strnlen ((char*)a3_7, 21) < 22); - ELIM (strnlen ((char*)a3_7, 23) < 22); - ELIM (strnlen ((char*)a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)a3_7, -1) < 22); + ELIM (strnlen ((char*)a3_7[0], 0) == 0); + ELIM (strnlen ((char*)a3_7[0], 1) < 2); + ELIM (strnlen ((char*)a3_7[0], 2) < 3); + ELIM (strnlen ((char*)a3_7[0], 3) < 4); + ELIM (strnlen ((char*)a3_7[0], 7) < 8); + ELIM (strnlen ((char*)a3_7[0], 9) < 7); + ELIM (strnlen ((char*)a3_7[0], 19) < 7); + ELIM (strnlen ((char*)a3_7[0], 21) < 7); + ELIM (strnlen ((char*)a3_7[0], 23) < 7); + ELIM (strnlen ((char*)a3_7[0], PTRDIFF_MAX) < 7); + ELIM (strnlen ((char*)a3_7[0], SIZE_MAX) < 7); + ELIM (strnlen ((char*)a3_7[0], -1) < 7); ELIM (strnlen (ax, 0) == 0); ELIM (strnlen (ax, 1) < 2); @@ -122,7 +110,6 @@ void elim_strnlen_arr_cst (void) struct MemArrays { - char c; char a0[0]; char a1[1]; char a3[3]; @@ -133,13 +120,6 @@ struct MemArrays void elim_strnlen_memarr_cst (struct MemArrays *p, int i) { - ELIM (strnlen (&p->c, 0) == 0); - ELIM (strnlen (&p->c, 1) < 2); - ELIM (strnlen (&p->c, 9) == 0); - ELIM (strnlen (&p->c, PTRDIFF_MAX) == 0); - ELIM (strnlen (&p->c, SIZE_MAX) == 0); - ELIM (strnlen (&p->c, -1) == 0); - /* Other accesses to internal zero-length arrays are undefined. 
*/ ELIM (strnlen (p->a0, 0) == 0); @@ -154,19 +134,19 @@ void elim_strnlen_memarr_cst (struct Mem ELIM (strnlen (p->a3, 1) < 2); ELIM (strnlen (p->a3, 2) < 3); ELIM (strnlen (p->a3, 3) < 4); - ELIM (strnlen (p->a3, 9) < 4); - ELIM (strnlen (p->a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p->a3, SIZE_MAX) < 4); - ELIM (strnlen (p->a3, -1) < 4); + ELIM (strnlen (p->a3, 9) < 3); + ELIM (strnlen (p->a3, PTRDIFF_MAX) < 3); + ELIM (strnlen (p->a3, SIZE_MAX) < 3); + ELIM (strnlen (p->a3, -1) < 3); ELIM (strnlen (p[i].a3, 0) == 0); ELIM (strnlen (p[i].a3, 1) < 2); ELIM (strnlen (p[i].a3, 2) < 3); ELIM (strnlen (p[i].a3, 3) < 4); - ELIM (strnlen (p[i].a3, 9) < 4); - ELIM (strnlen (p[i].a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p[i].a3, SIZE_MAX) < 4); - ELIM (strnlen (p[i].a3, -1) < 4); + ELIM (strnlen (p[i].a3, 9) < 3); + ELIM (strnlen (p[i].a3, PTRDIFF_MAX) < 3); + ELIM (strnlen (p[i].a3, SIZE_MAX) < 3); + ELIM (strnlen (p[i].a3, -1) < 3); ELIM (strnlen (p->a3_7[0], 0) == 0); ELIM (strnlen (p->a3_7[0], 1) < 2); @@ -203,17 +183,18 @@ void elim_strnlen_memarr_cst (struct Mem ELIM (strnlen (p->a3_7[i], 19) < 20); #endif - ELIM (strnlen ((char*)p->a3_7, 0) == 0); - ELIM (strnlen ((char*)p->a3_7, 1) < 2); - ELIM (strnlen ((char*)p->a3_7, 2) < 3); - ELIM (strnlen ((char*)p->a3_7, 3) < 4); - ELIM (strnlen ((char*)p->a3_7, 9) < 10); - ELIM (strnlen ((char*)p->a3_7, 19) < 20); - ELIM (strnlen ((char*)p->a3_7, 21) < 22); - ELIM (strnlen ((char*)p->a3_7, 23) < 22); - ELIM (strnlen ((char*)p->a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)p->a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)p->a3_7, -1) < 22); + ELIM (strnlen ((char*)p->a3_7[0], 0) == 0); + ELIM (strnlen ((char*)p->a3_7[0], 1) < 2); + ELIM (strnlen ((char*)p->a3_7[0], 2) < 3); + ELIM (strnlen ((char*)p->a3_7[0], 3) < 4); + ELIM (strnlen ((char*)p->a3_7[0], 7) < 8); + ELIM (strnlen ((char*)p->a3_7[0], 9) < 7); + ELIM (strnlen ((char*)p->a3_7[0], 19) < 7); + ELIM (strnlen ((char*)p->a3_7[0], 21) < 7); + ELIM (strnlen 
((char*)p->a3_7[0], 23) < 7); + ELIM (strnlen ((char*)p->a3_7[0], PTRDIFF_MAX) < 7); + ELIM (strnlen ((char*)p->a3_7[0], SIZE_MAX) < 7); + ELIM (strnlen ((char*)p->a3_7[0], -1) < 7); ELIM (strnlen (p->ax, 0) == 0); ELIM (strnlen (p->ax, 1) < 2); @@ -290,9 +271,6 @@ void elim_strnlen_range (char *s) void keep_strnlen_arr_cst (void) { - KEEP (strnlen (&c, 1) == 0); - KEEP (strnlen (&c, 1) == 1); - KEEP (strnlen (a1, 1) == 0); KEEP (strnlen (a1, 1) == 1); @@ -301,16 +279,12 @@ void keep_strnlen_arr_cst (void) struct FlexArrays { - char c; char a0[0]; /* Access to internal zero-length arrays are undefined. */ char a1[1]; }; void keep_strnlen_memarr_cst (struct FlexArrays *p) { - KEEP (strnlen (&p->c, 1) == 0); - KEEP (strnlen (&p->c, 1) == 1); - #if 0 /* Accesses to internal zero-length arrays are undefined so avoid exercising them. */ @@ -331,5 +305,5 @@ void keep_strnlen_memarr_cst (struct Fle /* { dg-final { scan-tree-dump-times "call_in_true_branch_not_eliminated_" 0 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 13 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 13 "optimized" } } */ + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 9 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 9 "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-48.c gcc/testsuite/gcc.dg/strlenopt-48.c --- gcc/testsuite/gcc.dg/strlenopt-48.c 2018-08-08 20:46:22.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-48.c 2018-08-19 09:22:35.473103694 +0200 @@ -3,7 +3,7 @@ Verify that strlen() calls with one-character array elements of multidimensional arrays are still folded. 
{ dg-do compile } - { dg-options "-O2 -Wall -fdump-tree-optimized" } */ + { dg-options "-O2 -Wall -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #include "strlenopt.h" diff -Npur gcc/testsuite/gcc.dg/strlenopt-51.c gcc/testsuite/gcc.dg/strlenopt-51.c --- gcc/testsuite/gcc.dg/strlenopt-51.c 2018-08-08 20:46:22.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-51.c 2018-08-18 23:31:11.249190968 +0200 @@ -101,7 +101,7 @@ void test_keep_a9_9 (int i) { #undef T #define T(I) \ - KEEP (strlen (&a9_9[i][I][0]) > (1 + I) % 9); \ + KEEP (strlen (&a9_9[i][I][0]) > (0 + I) % 9); \ KEEP (strlen (&a9_9[i][I][1]) > (1 + I) % 9); \ KEEP (strlen (&a9_9[i][I][2]) > (2 + I) % 9); \ KEEP (strlen (&a9_9[i][I][3]) > (3 + I) % 9); \ @@ -115,7 +115,7 @@ void test_keep_a9_9 (int i) } /* { dg-final { scan-tree-dump-times "strlen" 72 "gimple" } } - { dg-final { scan-tree-dump-times "strlen" 63 "optimized" } } + { dg-final { scan-tree-dump-times "strlen" 72 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 72 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-57.c gcc/testsuite/gcc.dg/strlenopt-57.c --- gcc/testsuite/gcc.dg/strlenopt-57.c 1970-01-01 01:00:00.000000000 +0100 +++ gcc/testsuite/gcc.dg/strlenopt-57.c 2018-08-18 23:31:11.250190954 +0200 @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +#define assert(x) do { if (!(x)) __builtin_abort (); } while (0) +extern int system (const char *); +static int fun (char *p) +{ + char buf[16]; + + assert (__builtin_strlen (p) < 4); + + __builtin_sprintf (buf, "echo %s - %s", p, p); + return system (buf); +} + +void test (void) +{ + char b[2] = "ab"; + fun (b); +} + +/* { dg-final { 
scan-tree-dump-times "__builtin_strlen" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_abort" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_sprintf" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "system" 1 "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-58.c gcc/testsuite/gcc.dg/strlenopt-58.c --- gcc/testsuite/gcc.dg/strlenopt-58.c 1970-01-01 01:00:00.000000000 +0100 +++ gcc/testsuite/gcc.dg/strlenopt-58.c 2018-08-19 08:00:34.462811696 +0200 @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized" } */ + +typedef char A[6]; +typedef char B[2][3]; + +A a; + +void test (void) +{ + B* b = (B*) a; + if (__builtin_strlen ((*b)[0]) > 2) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump-times "__builtin_strlen" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_abort" 1 "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-59.c gcc/testsuite/gcc.dg/strlenopt-59.c --- gcc/testsuite/gcc.dg/strlenopt-59.c 1970-01-01 01:00:00.000000000 +0100 +++ gcc/testsuite/gcc.dg/strlenopt-59.c 2018-08-19 08:00:47.832625824 +0200 @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized" } */ + +typedef char B[2][3]; + +B b; + +void test (void) +{ + if (__builtin_strlen (b[0]) > 2) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump-not "__builtin_strlen" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "__builtin_abort" "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/Wstringop-overflow-6.c gcc/testsuite/gcc.dg/Wstringop-overflow-6.c --- gcc/testsuite/gcc.dg/Wstringop-overflow-6.c 2018-06-15 21:13:15.000000000 +0200 +++ gcc/testsuite/gcc.dg/Wstringop-overflow-6.c 2018-08-19 14:07:49.267391775 +0200 @@ -25,7 +25,7 @@ void test_strcpy_strcat_1 (void) void test_strcpy_strcat_2 (void) { - strcpy (a2, "12"), strcat (a2, "3"); /* { dg-warning "\\\[-Wstringop-overflow=]" "bug 86121" { xfail *-*-* } } */ + 
strcpy (a2, "12"), strcat (a2, "3"); /* { dg-warning "\\\[-Wstringop-overflow=]" } */ } void test_strcpy_strcat_3 (void) ^ permalink raw reply [flat|nested] 121+ messages in thread
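The scenario behind strlenopt-57.c can be sketched in plain C. An array such as char b[2] initialized from "ab" has no room for the terminating NUL, so calling strlen on it reads past the object. The following is a minimal sketch, assuming a hypothetical helper is_terminated (not part of the patch or of GCC), of a check that never relies on the terminator being present:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only: return 1 if the N-byte
   buffer BUF contains a terminating NUL, 0 otherwise.  Unlike strlen,
   memchr never reads beyond the N bytes it is given.  */
int
is_terminated (const char *buf, size_t n)
{
  return memchr (buf, '\0', n) != NULL;
}

/* Two bytes, both occupied by characters: no terminator, which is the
   same situation as char b[2] = "ab" in the test case.  */
const char unterminated[2] = { 'a', 'b' };

/* Three bytes: 'a', 'b' and the implicit NUL.  */
const char terminated[3] = "ab";
```

With these definitions, is_terminated (unterminated, sizeof unterminated) yields 0 and is_terminated (terminated, sizeof terminated) yields 1, without touching memory outside either array.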
* Re: [PATCH] Make strlen range computations more conservative 2018-08-19 15:55 ` Bernd Edlinger @ 2018-08-20 10:24 ` Richard Biener 2018-08-20 17:23 ` Bernd Edlinger 2018-08-21 22:43 ` Jeff Law 1 sibling, 1 reply; 121+ messages in thread From: Richard Biener @ 2018-08-20 10:24 UTC (permalink / raw) To: Bernd Edlinger; +Cc: Martin Sebor, Jeff Law, GCC Patches, Jakub Jelinek On Sun, 19 Aug 2018, Bernd Edlinger wrote: > Hi, > > > I rebased my range computation patch to current trunk, > and updated it according to what was discussed here. > > That means get_range_strlen has already a parameter > that is used to differentiate between ranges for warnings > and ranges for code-gen. > > That is called "strict", in the 4-parameter overload > and "fuzzy" in the internally used 7-parameter overload. > > So I added an "optimistic" parameter to my > get_inner_char_array_unless_typecast helper function. > That's it. > > Therefore at this time, there is only one warning regression > in one test case and one xfailed warning test case fixed. > > So that is par on the warning regression side. > > The failed test case is gcc/testsuite/gcc.dg/pr83373.c which > uses -fassume-zero-terminated-char-arrays, to enable the > (unsafe) feedback from string-length information to VRP to > suppress the warning. > > The 5 test cases that were designed to check the optimized > tree dump have to use the new -fassume-zero-terminated-char-arrays > option, but that is what we agreed upon. > > The patch is not dependent on any other patches. > > > Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > Is it OK for trunk? + tree base = arg; + while (TREE_CODE (base) == ARRAY_REF + || TREE_CODE (base) == ARRAY_RANGE_REF + || TREE_CODE (base) == COMPONENT_REF) + base = TREE_OPERAND (base, 0); + + /* If this looks like a type cast don't assume anything. */ + if ((TREE_CODE (base) == MEM_REF + && (! 
integer_zerop (TREE_OPERAND (base, 1)) + || TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0)))) + != TYPE_MAIN_VARIANT (TREE_TYPE (base)))) I'm not convinced you are testing anything useful here. TREE_OPERAND (base, 1) might be a pointer which means it's type doesn't have any semantics so you are testing the access type against "random". If you'd restrict this to ADDR_EXPRs and look at the objects declared type then you'd still miss type-changes from a dynamic type that is different from what is declared. So my conclusion is if you really want to not want to return arg for things that look like a type cast then you have to unconditionally return NULL_TREE. + || TREE_CODE (base) == VIEW_CONVERT_EXPR + /* Or other stuff that would be handled by get_inner_reference. */ simply use || handled_component_p (base) for the above and the rest to be sure to handle everything that is not stripped above. + || TREE_CODE (base) == BIT_FIELD_REF + || TREE_CODE (base) == REALPART_EXPR + || TREE_CODE (base) == IMAGPART_EXPR) + return NULL_TREE; Btw, you are always returning the passed arg or NULL_TREE so formulating this as a predicate function makes uses easier. Not sure why it is called "inner" char array? There do seem to be independently useful fixes in the patch that I'd approve immediately. Btw, I don't think we want sth like flag_assume_zero_terminated_char_arrays or even make it default at -Ofast. Richard. ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative
  2018-08-20 10:24 ` Richard Biener
@ 2018-08-20 17:23 ` Bernd Edlinger
  2018-08-21 8:46 ` Richard Biener
  0 siblings, 1 reply; 121+ messages in thread
From: Bernd Edlinger @ 2018-08-20 17:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: Martin Sebor, Jeff Law, GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 4353 bytes --]

On 08/20/18 12:23, Richard Biener wrote:
> On Sun, 19 Aug 2018, Bernd Edlinger wrote:
>
>> Hi,
>>
>>
>> I rebased my range computation patch to current trunk,
>> and updated it according to what was discussed here.
>>
>> That means get_range_strlen has already a parameter
>> that is used to differentiate between ranges for warnings
>> and ranges for code-gen.
>>
>> That is called "strict", in the 4-parameter overload
>> and "fuzzy" in the internally used 7-parameter overload.
>>
>> So I added an "optimistic" parameter to my
>> get_inner_char_array_unless_typecast helper function.
>> That's it.
>>
>> Therefore at this time, there is only one warning regression
>> in one test case and one xfailed warning test case fixed.
>>
>> So that is par on the warning regression side.
>>
>> The failed test case is gcc/testsuite/gcc.dg/pr83373.c which
>> uses -fassume-zero-terminated-char-arrays, to enable the
>> (unsafe) feedback from string-length information to VRP to
>> suppress the warning.
>>
>> The 5 test cases that were designed to check the optimized
>> tree dump have to use the new -fassume-zero-terminated-char-arrays
>> option, but that is what we agreed upon.
>>
>> The patch is not dependent on any other patches.
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
>
> +  tree base = arg;
> +  while (TREE_CODE (base) == ARRAY_REF
> +        || TREE_CODE (base) == ARRAY_RANGE_REF
> +        || TREE_CODE (base) == COMPONENT_REF)
> +    base = TREE_OPERAND (base, 0);
> +
> +  /* If this looks like a type cast don't assume anything.  */
> +  if ((TREE_CODE (base) == MEM_REF
> +       && (! integer_zerop (TREE_OPERAND (base, 1))
> +          || TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (base,
> 0))))
> +             != TYPE_MAIN_VARIANT (TREE_TYPE (base))))
>
> I'm not convinced you are testing anything useful here.
> TREE_OPERAND (base, 1) might be a pointer which means it's type
> doesn't have any semantics so you are testing the access type
> against "random".  If you'd restrict this to ADDR_EXPRs and
> look at the objects declared type then you'd still miss
> type-changes from a dynamic type that is different from what
> is declared.
>

This whole function is only used for warnings or if the test case
asks for unsafe optimization via -fassume-zero-terminated-char-arrays.

So yes, it is understood, but it has proven to be an oracle with
99.9% likelihood to give the right answer.

> So my conclusion is if you really want to not want to return
> arg for things that look like a type cast then you have to
> unconditionally return NULL_TREE.

Yes. :-)

>
> +      || TREE_CODE (base) == VIEW_CONVERT_EXPR
> +      /* Or other stuff that would be handled by get_inner_reference.  */
>
> simply use || handled_component_p (base) for the above and the rest
> to be sure to handle everything that is not stripped above.
>
> +      || TREE_CODE (base) == BIT_FIELD_REF
> +      || TREE_CODE (base) == REALPART_EXPR
> +      || TREE_CODE (base) == IMAGPART_EXPR)
> +    return NULL_TREE;
>

Yes, good point.

> Btw, you are always returning the passed arg or NULL_TREE so
> formulating this as a predicate function makes uses easier.
> Not sure why it is called "inner" char array?
>

Yes, in a previous version of this patch, this function actually
walked towards the innermost array, and returned that, but I dropped
that part, as it caused too many test cases to regress.

So agreed, I think I will convert that to a _p function and think
of a better name.

> There do seem to be independently useful fixes in the patch that
> I'd approve immediately.
>

Yes, I found some peanuts on my way.

For instance this fix for PR middle-end/86121 survives bootstrap on
its own, and fixes one xfail.

Is it OK for trunk?

> Btw, I don't think we want sth like
> flag_assume_zero_terminated_char_arrays or even make it default at
> -Ofast.
>

Yes, I agree.  Is there a consensus about this?

If yes, I go ahead and remove that option again.

BTW: I needed this option in a few test cases that insist on checking
that the optimizer eliminates stuff, based on the VRP info (6 +/-1 or so).

But we can as well remove those test cases.

Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-pr86121.diff --]
[-- Type: text/x-patch; name="patch-pr86121.diff", Size: 1509 bytes --]

2018-08-20  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        PR middle-end/86121
        * tree-ssa-strlen.c (adjust_last_stmt): Avoid folding away undefined
        behaviour.

testsuite:
2018-08-20  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        PR middle-end/86121
        * gcc.dg/Wstringop-overflow-6.c: Remove xfail.

diff -ur gcc/testsuite/gcc.dg/Wstringop-overflow-6.c gcc/testsuite/gcc.dg/Wstringop-overflow-6.c
--- gcc/testsuite/gcc.dg/Wstringop-overflow-6.c 2018-06-12 20:05:13.000000000 +0200
+++ gcc/testsuite/gcc.dg/Wstringop-overflow-6.c 2018-08-20 14:53:55.605350343 +0200
@@ -25,7 +25,7 @@
 
 void test_strcpy_strcat_2 (void)
 {
-  strcpy (a2, "12"), strcat (a2, "3");   /* { dg-warning "\\\[-Wstringop-overflow=]" "bug 86121" { xfail *-*-* } } */
+  strcpy (a2, "12"), strcat (a2, "3");   /* { dg-warning "\\\[-Wstringop-overflow=]" } */
 }
 
 void test_strcpy_strcat_3 (void)
diff -ur gcc/tree-ssa-strlen.c gcc/tree-ssa-strlen.c
--- gcc/tree-ssa-strlen.c 2018-08-02 01:39:35.000000000 +0200
+++ gcc/tree-ssa-strlen.c 2018-08-20 12:41:23.352955874 +0200
@@ -1107,6 +1107,13 @@
          to store the extra '\0' in that case.  */
       if ((tree_to_uhwi (len) & 3) == 0)
         return;
+
+      /* Don't fold away an out of bounds access, as this defeats proper
+         warnings.  */
+      tree dst = gimple_call_arg (last.stmt, 0);
+      tree size = compute_objsize (dst, 0);
+      if (size && tree_int_cst_lt (size, len))
+        return;
     }
   else if (TREE_CODE (len) == SSA_NAME)
     {

^ permalink raw reply [flat|nested] 121+ messages in thread
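The guard added to adjust_last_stmt in the PR 86121 patch can be modeled outside the compiler roughly as follows. This is only a sketch under stated assumptions, not GCC code: keep_for_diagnostics and its parameters are hypothetical names, size_known stands in for compute_objsize returning a non-null tree, and objsize < len mirrors tree_int_cst_lt (size, len).

```c
#include <stddef.h>

/* Hypothetical model of the guard; not GCC code.  SIZE_KNOWN says
   whether the destination object size could be determined, OBJSIZE is
   that size in bytes, and LEN is the number of bytes the earlier call
   writes.  A nonzero result means "leave the statement alone", so that
   the out-of-bounds access is still there to be diagnosed.  */
int
keep_for_diagnostics (int size_known, size_t objsize, size_t len)
{
  return size_known && objsize < len;
}
```

For example, assuming a two-byte destination (an assumption here, since the declaration of a2 is not shown in the patch) and an earlier call that writes three bytes, keep_for_diagnostics (1, 2, 3) is nonzero, whereas a three-byte destination gives 0 and the existing folding may proceed.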
* Re: [PATCH] Make strlen range computations more conservative 2018-08-20 17:23 ` Bernd Edlinger @ 2018-08-21 8:46 ` Richard Biener 2018-08-21 22:25 ` Jeff Law 0 siblings, 1 reply; 121+ messages in thread From: Richard Biener @ 2018-08-21 8:46 UTC (permalink / raw) To: Bernd Edlinger; +Cc: Martin Sebor, Jeff Law, GCC Patches, Jakub Jelinek On Mon, 20 Aug 2018, Bernd Edlinger wrote: > > > On 08/20/18 12:23, Richard Biener wrote: > > On Sun, 19 Aug 2018, Bernd Edlinger wrote: > > > >> Hi, > >> > >> > >> I rebased my range computation patch to current trunk, > >> and updated it according to what was discussed here. > >> > >> That means get_range_strlen has already a parameter > >> that is used to differentiate between ranges for warnings > >> and ranges for code-gen. > >> > >> That is called "strict", in the 4-parameter overload > >> and "fuzzy" in the internally used 7-parameter overload. > >> > >> So I added an "optimistic" parameter to my > >> get_inner_char_array_unless_typecast helper function. > >> That's it. > >> > >> Therefore at this time, there is only one warning regression > >> in one test case and one xfailed warning test case fixed. > >> > >> So that is par on the warning regression side. > >> > >> The failed test case is gcc/testsuite/gcc.dg/pr83373.c which > >> uses -fassume-zero-terminated-char-arrays, to enable the > >> (unsafe) feedback from string-length information to VRP to > >> suppress the warning. > >> > >> The 5 test cases that were designed to check the optimized > >> tree dump have to use the new -fassume-zero-terminated-char-arrays > >> option, but that is what we agreed upon. > >> > >> The patch is not dependent on any other patches. > >> > >> > >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > >> Is it OK for trunk? 
> > > > + tree base = arg; > > + while (TREE_CODE (base) == ARRAY_REF > > + || TREE_CODE (base) == ARRAY_RANGE_REF > > + || TREE_CODE (base) == COMPONENT_REF) > > + base = TREE_OPERAND (base, 0); > > + > > + /* If this looks like a type cast don't assume anything. */ > > + if ((TREE_CODE (base) == MEM_REF > > + && (! integer_zerop (TREE_OPERAND (base, 1)) > > + || TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, > > 0)))) > > + != TYPE_MAIN_VARIANT (TREE_TYPE (base)))) > > > > I'm not convinced you are testing anything useful here. > > TREE_OPERAND (base, 1) might be a pointer which means it's type > > doesn't have any semantics so you are testing the access type > > against "random". If you'd restrict this to ADDR_EXPRs and > > look at the objects declared type then you'd still miss > > type-changes from a dynamic type that is different from what > > is declared. > > > > This whole function is only used for warnings or if the test case > asks for unsafe optimization via -ffassume-zero-terminated-char-arrays. > > So yes, it is understood, but it has proven to be an oracle with > 99.9% likelihood to give the right answer. > > > So my conclusion is if you really want to not want to return > > arg for things that look like a type cast then you have to > > unconditionally return NULL_TREE. > > Yes. :-) > > > > > + || TREE_CODE (base) == VIEW_CONVERT_EXPR > > + /* Or other stuff that would be handled by get_inner_reference. */ > > > > simply use || handled_component_p (base) for the above and the rest > > to be sure to handle everything that is not stripped above. > > > > + || TREE_CODE (base) == BIT_FIELD_REF > > + || TREE_CODE (base) == REALPART_EXPR > > + || TREE_CODE (base) == IMAGPART_EXPR) > > + return NULL_TREE; > > > > Yes, good point. > > > Btw, you are always returning the passed arg or NULL_TREE so > > formulating this as a predicate function makes uses easier. > > Not sure why it is called "inner" char array? 
> > > ; > > Yes, in a previous version of this patch, this function actually > walked towards the innermost array, and returned that, but I dropped > that part, as it caused too many test cases regress. > > So agreed, I think I will convert that to a _p function and think > of a better name. > > > > There do seem to be independently useful fixes in the patch that > > I'd approve immediately. > > > > Yes, I found some peanuts on my way. > > For instance this fix for PR middle-end/86121 survives bootstrap on > it's own, and fixes one xfail. > > Is it OK for trunk? Yes, that's OK for trunk. > > Btw, I don't think we want sth like > > flag_assume_zero_terminated_char_arrays or even make it default at > > -Ofast. > > > > Yes, I agree. Is there a consensus about this? Well, it's my own opinion of course. Show me a benchmark that improves with -fassume-zero-terminated-char-arrays. Certainly for security reasons it sounds a dangerous thing (and the documentation needs a more thorough description of what it really means). > If yes, I go ahead and remove that option again. > > BTW: I needed this option in a few test cases, that insist in checking the > optimizer to eliminate stuff, based on the VRP info. (6 +/-1 or so). Any example? > But we can as well remove those test cases. > > Bernd. ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-21 8:46 ` Richard Biener @ 2018-08-21 22:25 ` Jeff Law 2018-08-22 4:05 ` Bernd Edlinger 0 siblings, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-08-21 22:25 UTC (permalink / raw) To: Richard Biener, Bernd Edlinger; +Cc: Martin Sebor, GCC Patches, Jakub Jelinek On 08/21/2018 02:43 AM, Richard Biener wrote: > On Mon, 20 Aug 2018, Bernd Edlinger wrote: [ snip. ] >> Yes, I found some peanuts on my way. >> >> For instance this fix for PR middle-end/86121 survives bootstrap on >> it's own, and fixes one xfail. >> >> Is it OK for trunk? > > Yes, that's OK for trunk. Agreed. Seems like a nice independent bugfix and I don't think it adversely affects anything else under current discussion. In fact, not folding here makes it easier to warn about incorrect code elsewhere. > >>> Btw, I don't think we want sth like >>> flag_assume_zero_terminated_char_arrays or even make it default at >>> -Ofast. >>> >> >> Yes, I agree. Is there a consensus about this? > > Well, it's my own opinion of course. Show me a benchmark that > improves with -fassume-zero-terminated-char-arrays. Certainly > for security reasons it sounds a dangerous thing (and the documentation > needs a more thorough description of what it really means). I certainly don't want to see a flag. We've already got way too many; adding another for marginal behavior just seems wrong. > >> If yes, I go ahead and remove that option again. >> >> BTW: I needed this option in a few test cases, that insist in checking the >> optimizer to eliminate stuff, based on the VRP info. (6 +/-1 or so). > > Any example? > >> But we can as well remove those test cases. Bernd, if there are specific tests that you want to see removed, we should discuss them. I think we can all agree that if a test depends on C semantics rather than GIMPLE semantics for optimization/codegen, then removing it seems like the right thing to do as the test is just wrong for GCC. 
If the test is meant to issue a warning or avoid a false positive, then we should defer -- I'd like to not lose the warnings if at all possible. A test which verifies correct optimization seems like it should be discussed. I'd be more inclined to xfail those rather than remove them completely -- particularly for a test which isn't a good indicator of the real world code typically seen by gcc. Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative
  2018-08-21 22:25 ` Jeff Law
@ 2018-08-22 4:05 ` Bernd Edlinger
  2018-08-22 16:05 ` Martin Sebor
  0 siblings, 1 reply; 121+ messages in thread
From: Bernd Edlinger @ 2018-08-22 4:05 UTC (permalink / raw)
  To: Jeff Law, Richard Biener; +Cc: Martin Sebor, GCC Patches, Jakub Jelinek

On 08/22/18 00:25, Jeff Law wrote:
> On 08/21/2018 02:43 AM, Richard Biener wrote:
>> On Mon, 20 Aug 2018, Bernd Edlinger wrote:
> [ snip. ]
>
>>> Yes, I found some peanuts on my way.
>>>
>>> For instance this fix for PR middle-end/86121 survives bootstrap on
>>> it's own, and fixes one xfail.
>>>
>>> Is it OK for trunk?
>>
>> Yes, that's OK for trunk.
> Agreed.  Seems like a nice independent bugfix and I don't think it
> adversely affects anything else under current discussion.  In fact, not
> folding here makes it easier to warn about incorrect code elsewhere.
>
>>
>>>> Btw, I don't think we want sth like
>>>> flag_assume_zero_terminated_char_arrays or even make it default at
>>>> -Ofast.
>>>>
>>>
>>> Yes, I agree.  Is there a consensus about this?
>>
>> Well, it's my own opinion of course.  Show me a benchmark that
>> improves with -fassume-zero-terminated-char-arrays.  Certainly
>> for security reasons it sounds a dangerous thing (and the documentation
>> needs a more thorough description of what it really means).
> I certainly don't want to see a flag.  We've already got way too many;
> adding another for marginal behavior just seems wrong.
>
>>
>>> If yes, I go ahead and remove that option again.
>>>
>>> BTW: I needed this option in a few test cases, that insist in checking the
>>> optimizer to eliminate stuff, based on the VRP info. (6 +/-1 or so).
>>
>> Any example?
>>
>>> But we can as well remove those test cases.
> Bernd, if there are specific tests that you want to see removed, we
> should discuss them.
>

The test cases are:
gcc.dg/strlenopt-36.c
gcc.dg/strlenopt-40.c
gcc.dg/strlenopt-45.c
gcc.dg/strlenopt-48.c
gcc.dg/strlenopt-51.c

I see no way to fix those, as they test the information flow
from get_range_strlen to VRP info, which has to go away.

For developing this patch it was fine to tweak the test cases
with the compiler flag, but I'd prefer to get rid of them.

There is one test that tests a warning, gcc.dg/pr83373.c.
I used the flag there, but could as well have simply xfailed that:

   size_t len = __builtin_strlen (src);
   if (len < size)
     __builtin_memcpy (dst, src, len + 1);
   else
     {
       __builtin_memcpy (dst, src, size - 1); /* { dg-bogus "\\\[-Wstringop-oveflow]" } */
       dst[size - 1] = '\0';
     }

I have not fully debugged that, but believe the test case works because
unsafe range info is used to eliminate the memcpy call before it is
diagnosed.

> I think we can all agree that if a test depends on C semantics rather
> than GIMPLE semantics for optimization/codegen, then removing it seems
> like the right thing to do as the test is just wrong for GCC.
>
> If the test is meant to issue a warning or avoid a false positive, then
> we should defer -- I'd like to not lose the warnings if at all possible.
>

Yes, this patch is looking pretty healthy right now.
Except for the one regression above, there are no further regressions
on the warnings.  Just the code generation does change.

There are however still more correctness issues:

There is still an information flow from
get_range_strlen -> sprintf return code -> VRP.

And generally, even if using the range info from the sprintf function
is fine according to the POSIX spec, I still have a bad feeling about it,
because "sprintf" is a rather complex piece of software that can have
bugs or implementation-specific details.  I am thinking of non-POSIX
environments like Linux or eCos, which come with their own implementation
of sprintf; those may have subtle differences, or simply bugs, but this
optimization takes away their right to return an error code.

> A test which verifies correct optimization seems like it should be
> discussed.  I'd be more inclined to xfail those rather than remove them
> completely -- particularly for a test which isn't a good indicator of
> the real world code typically seen by gcc.
>

xfail would work for me, but I doubt we will ever be able to fix
a test case that does so many different things at the same time,
and checks just if all was folded or nothing.

Bernd.

^ permalink raw reply [flat|nested] 121+ messages in thread
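The dependency between string length and the sprintf return value mentioned above can be seen from user code. The following is a minimal sketch, assuming a hypothetical helper formatted_length and reusing the format string from strlenopt-57.c; it relies only on the C99 rule that snprintf with a zero size writes nothing and reports the number of characters the full output would need.

```c
#include <stdio.h>

/* Hypothetical helper, for illustration only: number of characters
   (excluding the terminating NUL) that the format from strlenopt-57.c
   produces for P.  The result grows with twice the length of P, which
   is why a range for strlen (p) implies a range for this value.  */
int
formatted_length (const char *p)
{
  return snprintf (NULL, 0, "echo %s - %s", p, p);
}
```

The fixed text "echo " and " - " contributes 8 characters, so a three-character argument needs 14 characters plus the NUL and fits a 16-byte buffer, while a four-character argument needs 16 plus the NUL and does not. That is the role of the strlen (p) < 4 assertion in the test case.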
* Re: [PATCH] Make strlen range computations more conservative
  2018-08-22 4:05 ` Bernd Edlinger
@ 2018-08-22 16:05 ` Martin Sebor
  2018-08-22 17:22 ` Bernd Edlinger
  0 siblings, 1 reply; 121+ messages in thread
From: Martin Sebor @ 2018-08-22 16:05 UTC (permalink / raw)
  To: Bernd Edlinger, Jeff Law, Richard Biener; +Cc: GCC Patches, Jakub Jelinek

On 08/21/2018 10:05 PM, Bernd Edlinger wrote:
> On 08/22/18 00:25, Jeff Law wrote:
>> On 08/21/2018 02:43 AM, Richard Biener wrote:
>>> On Mon, 20 Aug 2018, Bernd Edlinger wrote:
>> [ snip. ]
>>
>>>> Yes, I found some peanuts on my way.
>>>>
>>>> For instance this fix for PR middle-end/86121 survives bootstrap on
>>>> it's own, and fixes one xfail.
>>>>
>>>> Is it OK for trunk?
>>>
>>> Yes, that's OK for trunk.
>> Agreed.  Seems like a nice independent bugfix and I don't think it
>> adversely affects anything else under current discussion.  In fact, not
>> folding here makes it easier to warn about incorrect code elsewhere.
>>
>>>
>>>>> Btw, I don't think we want sth like
>>>>> flag_assume_zero_terminated_char_arrays or even make it default at
>>>>> -Ofast.
>>>>>
>>>>
>>>> Yes, I agree.  Is there a consensus about this?
>>>
>>> Well, it's my own opinion of course.  Show me a benchmark that
>>> improves with -fassume-zero-terminated-char-arrays.  Certainly
>>> for security reasons it sounds a dangerous thing (and the documentation
>>> needs a more thorough description of what it really means).
>> I certainly don't want to see a flag.  We've already got way too many;
>> adding another for marginal behavior just seems wrong.
>>
>>>
>>>> If yes, I go ahead and remove that option again.
>>>>
>>>> BTW: I needed this option in a few test cases, that insist in checking the
>>>> optimizer to eliminate stuff, based on the VRP info. (6 +/-1 or so).
>>>
>>> Any example?
>>>
>>>> But we can as well remove those test cases.
>> Bernd, if there are specific tests that you want to see removed, we
>> should discuss them.
>>
>
> The test cases are:
> gcc.dg/strlenopt-36.c

There are plenty of valid test cases in this test.  For example:

   extern char a7[7];
   if (strlen (a7) >= 7)   // fold to false
     abort ();

Even if we wanted to accommodate common definitions the array
declarations could be changed to static and the tests would
be useful:

   static char a7[7];

There is no valid program where the if condition could be true.

> gcc.dg/strlenopt-40.c

There are even more completely uncontroversial test cases here,
such as:

   if (strlen (i < 0 ? "123" : "4321") > 4)   // fold to false
     abort ();

> gcc.dg/strlenopt-45.c

Even more here.

   extern char c;
   if (strnlen (&c, 0) > 0)   // fold to false
     abort ();
   if (strnlen (&c, 9) > 0)   // likewise
     abort ();

> gcc.dg/strlenopt-48.c
> gcc.dg/strlenopt-51.c

All the test cases here include constant character arrays of
known length.  I see nothing controversial about any of them.

>
> I see no way how to fix those, as they test the information flow
> from the get_range_string to VRP info, which has to go away.
>
> For the developing this patch it was fine to tweak the test cases
> with the compiler flag, but I'd prefer to get rid of them.
>
> There is one test that tests a warning, gcc.dg/pr83373.c.
> I used the flag there, but could as well have simply xfailed that:

"Simply xfailing" tests for warnings would be inappropriate:
it would cause regressions and make the reporters unhappy.

Martin

^ permalink raw reply [flat|nested] 121+ messages in thread
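The bounds in the examples above can be checked with ordinary, well-defined C. This is a minimal sketch with hypothetical function names (a7_length, literal_length) and an arbitrary initializer chosen for illustration: a seven-byte array that holds a terminated string can be at most six characters long, and the conditional picks between literals of length 3 and 4, so neither quoted condition can ever be true in a valid program.

```c
#include <stddef.h>
#include <string.h>

/* A seven-byte array holding a properly terminated string; the
   initializer is an arbitrary choice for this illustration.  */
static char a7[7] = "abcdef";

/* Length of the string in a7: at most 6 whenever a terminator is
   present inside the array.  */
size_t
a7_length (void)
{
  return strlen (a7);
}

/* Length of one of two literals: always 3 or 4, never more than 4.  */
size_t
literal_length (int i)
{
  return strlen (i < 0 ? "123" : "4321");
}
```

Here a7_length () is 6, and literal_length returns 3 for negative arguments and 4 otherwise, which is consistent with folding strlen (a7) >= 7 and strlen (...) > 4 to false.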
* Re: [PATCH] Make strlen range computations more conservative 2018-08-22 16:05 ` Martin Sebor @ 2018-08-22 17:22 ` Bernd Edlinger 2018-08-22 22:34 ` Jeff Law 0 siblings, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-08-22 17:22 UTC (permalink / raw) To: Martin Sebor, Jeff Law, Richard Biener; +Cc: GCC Patches, Jakub Jelinek On 08/22/18 18:05, Martin Sebor wrote: > On 08/21/2018 10:05 PM, Bernd Edlinger wrote: >> On 08/22/18 00:25, Jeff Law wrote: >>> On 08/21/2018 02:43 AM, Richard Biener wrote: >>>> On Mon, 20 Aug 2018, Bernd Edlinger wrote: >>> [ snip. ] >>> >>>>> Yes, I found some peanuts on my way. >>>>> >>>>> For instance this fix for PR middle-end/86121 survives bootstrap on >>>>> it's own, and fixes one xfail. >>>>> >>>>> Is it OK for trunk? >>>> >>>> Yes, that's OK for trunk. >>> Agreed. Seems like a nice independent bugfix and I don't think it >>> adversely affects anything else under current discussion. In fact, not >>> folding here makes it easier to warn about incorrect code elsewhere. >>> >>>> >>>>>> Btw, I don't think we want sth like >>>>>> flag_assume_zero_terminated_char_arrays or even make it default at >>>>>> -Ofast. >>>>>> >>>>> >>>>> Yes, I agree. Is there a consensus about this? >>>> >>>> Well, it's my own opinion of course. Show me a benchmark that >>>> improves with -fassume-zero-terminated-char-arrays. Certainly >>>> for security reasons it sounds a dangerous thing (and the documentation >>>> needs a more thorough description of what it really means). >>> I certainly don't want to see a flag. We've already got way too many; >>> adding another for marginal behavior just seems wrong. >>> >>>> >>>>> If yes, I go ahead and remove that option again. >>>>> >>>>> BTW: I needed this option in a few test cases, that insist in checking the >>>>> optimizer to eliminate stuff, based on the VRP info. (6 +/-1 or so). >>>> >>>> Any example? >>>> >>>>> But we can as well remove those test cases. 
>>> Bernd, if there are specific tests that you want to see removed, we >>> should discuss them. >>> >> >> The test cases are: >> gcc.dg/strlenopt-36.c > > There are plenty of valid test cases in this test. For example: > > extern char a7[7]; > if (strlen (a7) >= 7) // fold to false > abort (); > > Even if we wanted to accommodate common definitions the array > declarations could be changed to static and the tests would > be useful: > > static char a7[7]; > > There is no valid program where the if condition could be true. > >> gcc.dg/strlenopt-40.c > > There are even more completely uncontroversial test cases here, > such as: > > if (strlen (i < 0 ? "123" : "4321") > 4) // fold to false > abort (); > I see, the trouble is that the test case mixes valid cases with cases that depend on type info in GIMPLE. >> gcc.dg/strlenopt-45.c > > Even more here. > > extern char c; > if (strnlen (&c, 0) > 0) // fold to false > abort (); > if (strnlen (&c, 9) > 0) // likewise > abort (); > >> gcc.dg/strlenopt-48.c >> gcc.dg/strlenopt-51.c > > All the test cases here include constant character arrays of > known length. I see nothing controversial about any of them. > Ah, sorry, a mistake in my changelog entry. The strlenopt-51.c test case does not need the flag. 
I just changed this: diff -Npur gcc/testsuite/gcc.dg/strlenopt-51.c gcc/testsuite/gcc.dg/strlenopt-51.c --- gcc/testsuite/gcc.dg/strlenopt-51.c 2018-08-19 17:11:34.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-51.c 2018-08-22 09:04:53.768302320 +0200 @@ -101,7 +101,7 @@ void test_keep_a9_9 (int i) { #undef T #define T(I) \ - KEEP (strlen (&a9_9[i][I][0]) > (1 + I) % 9); \ + KEEP (strlen (&a9_9[i][I][0]) > (0 + I) % 9); \ KEEP (strlen (&a9_9[i][I][1]) > (1 + I) % 9); \ KEEP (strlen (&a9_9[i][I][2]) > (2 + I) % 9); \ KEEP (strlen (&a9_9[i][I][3]) > (3 + I) % 9); \ @@ -115,7 +115,7 @@ void test_keep_a9_9 (int i) } /* { dg-final { scan-tree-dump-times "strlen" 72 "gimple" } } - { dg-final { scan-tree-dump-times "strlen" 63 "optimized" } } + { dg-final { scan-tree-dump-times "strlen" 72 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 72 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } */ ... which looks like the patch fixed something here, although I cannot tell what. >> >> I see no way how to fix those, as they test the information flow >> from the get_range_string to VRP info, which has to go away. >> >> For the developing this patch it was fine to tweak the test cases >> with the compiler flag, but I'd prefer to get rid of them. >> >> There is one test that tests a warning, gcc.dg/pr83373.c. >> I used the flag there, but could as well have simply xfailed that: > > "Simply xfailing" tests for warnings would be inappropriate: > it would cause regressions and make the reporters unhappy. > Okay, but that is just one single test case. Bernd. ^ permalink raw reply [flat|nested] 121+ messages in thread
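For readers following the discussion of which sub-tests are uncontroversial, the cases Martin quotes can be restated as ordinary run-time checks. The following is only an illustrative sketch, not part of the testsuite: the helper names `pick`, `folds_hold` and `run_checks` are hypothetical, and the initial contents of `a7` and `c` are assumptions chosen so that the program is valid.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen */
#include <assert.h>
#include <string.h>

/* A seven-byte array: any valid string stored in it has length <= 6.  */
static char a7[7] = "abc";

/* A single char object: the only valid string stored in it is "".  */
static char c = '\0';

/* Hypothetical helper: selects one of the two literals, as in the
   strlenopt-40.c example quoted above.  */
static const char *pick (int i)
{
  return i < 0 ? "123" : "4321";
}

/* Returns 1 if every condition the quoted tests expect to fold to
   false is indeed false at run time, 0 otherwise.  */
static int folds_hold (int i)
{
  if (strlen (a7) >= sizeof a7)   /* strlenopt-36.c style */
    return 0;
  if (strlen (pick (i)) > 4)      /* strlenopt-40.c style */
    return 0;
  if (strnlen (&c, 0) > 0)        /* strlenopt-45.c style */
    return 0;
  if (strnlen (&c, 9) > 0)        /* likewise */
    return 0;
  return 1;
}

/* Exercises both branches of the conditional literal.  */
static void run_checks (void)
{
  assert (folds_hold (-1));
  assert (folds_hold (1));
}
```

In each of these cases the bound follows from the declared object itself rather than from type information reached through a cast, which appears to be why they are described as uncontroversial.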
* Re: [PATCH] Make strlen range computations more conservative 2018-08-22 17:22 ` Bernd Edlinger @ 2018-08-22 22:34 ` Jeff Law 2018-08-22 22:57 ` Bernd Edlinger 2018-08-22 22:57 ` Martin Sebor 0 siblings, 2 replies; 121+ messages in thread From: Jeff Law @ 2018-08-22 22:34 UTC (permalink / raw) To: Bernd Edlinger, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek On 08/22/2018 11:22 AM, Bernd Edlinger wrote: > On 08/22/18 18:05, Martin Sebor wrote: >> On 08/21/2018 10:05 PM, Bernd Edlinger wrote: >>> On 08/22/18 00:25, Jeff Law wrote: >>>> On 08/21/2018 02:43 AM, Richard Biener wrote: >>>>> On Mon, 20 Aug 2018, Bernd Edlinger wrote: >>>> [ snip. ] >>>> >>>>>> Yes, I found some peanuts on my way. >>>>>> >>>>>> For instance this fix for PR middle-end/86121 survives bootstrap on >>>>>> it's own, and fixes one xfail. >>>>>> >>>>>> Is it OK for trunk? >>>>> >>>>> Yes, that's OK for trunk. >>>> Agreed. Seems like a nice independent bugfix and I don't think it >>>> adversely affects anything else under current discussion. In fact, not >>>> folding here makes it easier to warn about incorrect code elsewhere. >>>> >>>>> >>>>>>> Btw, I don't think we want sth like >>>>>>> flag_assume_zero_terminated_char_arrays or even make it default at >>>>>>> -Ofast. >>>>>>> >>>>>> >>>>>> Yes, I agree. Is there a consensus about this? >>>>> >>>>> Well, it's my own opinion of course. Show me a benchmark that >>>>> improves with -fassume-zero-terminated-char-arrays. Certainly >>>>> for security reasons it sounds a dangerous thing (and the documentation >>>>> needs a more thorough description of what it really means). >>>> I certainly don't want to see a flag. We've already got way too many; >>>> adding another for marginal behavior just seems wrong. >>>> >>>>> >>>>>> If yes, I go ahead and remove that option again. >>>>>> >>>>>> BTW: I needed this option in a few test cases, that insist in checking the >>>>>> optimizer to eliminate stuff, based on the VRP info. (6 +/-1 or so). 
>>>>> >>>>> Any example? >>>>> >>>>>> But we can as well remove those test cases. >>>> Bernd, if there are specific tests that you want to see removed, we >>>> should discuss them. >>>> >>> >>> The test cases are: >>> gcc.dg/strlenopt-36.c >> >> There are plenty of valid test cases in this test. For example: >> >>  extern char a7[7]; >>  if (strlen (a7) >= 7)  // fold to false >>    abort (); >> >> Even if we wanted to accommodate common definitions the array >> declarations could be changed to static and the tests would >> be useful: >> >>  static char a7[7]; >> >> There is no valid program where the if condition could be true. >> >>> gcc.dg/strlenopt-40.c >> >> There are even more completely uncontroversial test cases here, >> such as: >> >>  if (strlen (i < 0 ? "123" : "4321") > 4)  // fold to false >>    abort (); >> > > I see, the trouble is that the test case mixes valid cases with > cases that depend on type info in GIMPLE. I believe Martin's point is that there are tests within those files that are still valid. We don't want to zap the entire test unless all the subtests are invalid. We need to look at each sub-test and determine if it's valid or not. Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-22 22:34 ` Jeff Law @ 2018-08-22 22:57 ` Bernd Edlinger 2018-08-22 22:57 ` Martin Sebor 1 sibling, 0 replies; 121+ messages in thread From: Bernd Edlinger @ 2018-08-22 22:57 UTC (permalink / raw) To: Jeff Law, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek On 08/23/18 00:34, Jeff Law wrote: > On 08/22/2018 11:22 AM, Bernd Edlinger wrote: >> On 08/22/18 18:05, Martin Sebor wrote: >>> On 08/21/2018 10:05 PM, Bernd Edlinger wrote: >>>> On 08/22/18 00:25, Jeff Law wrote: >>>>> On 08/21/2018 02:43 AM, Richard Biener wrote: >>>>>> On Mon, 20 Aug 2018, Bernd Edlinger wrote: >>>>> [ snip. ] >>>>> >>>>>>> Yes, I found some peanuts on my way. >>>>>>> >>>>>>> For instance this fix for PR middle-end/86121 survives bootstrap on >>>>>>> it's own, and fixes one xfail. >>>>>>> >>>>>>> Is it OK for trunk? >>>>>> >>>>>> Yes, that's OK for trunk. >>>>> Agreed. Seems like a nice independent bugfix and I don't think it >>>>> adversely affects anything else under current discussion. In fact, not >>>>> folding here makes it easier to warn about incorrect code elsewhere. >>>>> >>>>>> >>>>>>>> Btw, I don't think we want sth like >>>>>>>> flag_assume_zero_terminated_char_arrays or even make it default at >>>>>>>> -Ofast. >>>>>>>> >>>>>>> >>>>>>> Yes, I agree. Is there a consensus about this? >>>>>> >>>>>> Well, it's my own opinion of course. Show me a benchmark that >>>>>> improves with -fassume-zero-terminated-char-arrays. Certainly >>>>>> for security reasons it sounds a dangerous thing (and the documentation >>>>>> needs a more thorough description of what it really means). >>>>> I certainly don't want to see a flag. We've already got way too many; >>>>> adding another for marginal behavior just seems wrong. >>>>> >>>>>> >>>>>>> If yes, I go ahead and remove that option again. 
>>>>>>> >>>>>>> BTW: I needed this option in a few test cases, that insist in checking the >>>>>>> optimizer to eliminate stuff, based on the VRP info. (6 +/-1 or so). >>>>>> >>>>>> Any example? >>>>>> >>>>>>> But we can as well remove those test cases. >>>>> Bernd, if there are specific tests that you want to see removed, we >>>>> should discuss them. >>>>> >>>> >>>> The test cases are: >>>> gcc.dg/strlenopt-36.c >>> >>> There are plenty of valid test cases in this test. For example: >>> >>> extern char a7[7]; >>> if (strlen (a7) >= 7) // fold to false >>> abort (); >>> >>> Even if we wanted to accommodate common definitions the array >>> declarations could be changed to static and the tests would >>> be useful: >>> >>> static char a7[7]; >>> >>> There is no valid program where the if condition could be true. >>> >>>> gcc.dg/strlenopt-40.c >>> >>> There are even more completely uncontroversial test cases here, >>> such as: >>> >>> if (strlen (i < 0 ? "123" : "4321") > 4) // fold to false >>> abort (); >>> >> >> I see, the trouble is that the test case mixes valid cases with >> cases that depend on type info in GIMPLE. > I believe Martin's point is that there are tests within those files that > are still valid. We don't want to zap the entire test unless all the > subtests are invalid. We need to look at each sub-test and determine if > it's valid or not. > I can try to just keep test cases like the above, and delete what doesn't work, especially in this test case that seems doable. Bernd. ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-22 22:34 ` Jeff Law 2018-08-22 22:57 ` Bernd Edlinger @ 2018-08-22 22:57 ` Martin Sebor 2018-08-22 23:08 ` Bernd Edlinger 1 sibling, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-08-22 22:57 UTC (permalink / raw) To: Jeff Law, Bernd Edlinger, Richard Biener; +Cc: GCC Patches, Jakub Jelinek On 08/22/2018 04:34 PM, Jeff Law wrote: > On 08/22/2018 11:22 AM, Bernd Edlinger wrote: >> On 08/22/18 18:05, Martin Sebor wrote: >>> On 08/21/2018 10:05 PM, Bernd Edlinger wrote: >>>> On 08/22/18 00:25, Jeff Law wrote: >>>>> On 08/21/2018 02:43 AM, Richard Biener wrote: >>>>>> On Mon, 20 Aug 2018, Bernd Edlinger wrote: >>>>> [ snip. ] >>>>> >>>>>>> Yes, I found some peanuts on my way. >>>>>>> >>>>>>> For instance this fix for PR middle-end/86121 survives bootstrap on >>>>>>> it's own, and fixes one xfail. >>>>>>> >>>>>>> Is it OK for trunk? >>>>>> >>>>>> Yes, that's OK for trunk. >>>>> Agreed. Seems like a nice independent bugfix and I don't think it >>>>> adversely affects anything else under current discussion. In fact, not >>>>> folding here makes it easier to warn about incorrect code elsewhere. >>>>> >>>>>> >>>>>>>> Btw, I don't think we want sth like >>>>>>>> flag_assume_zero_terminated_char_arrays or even make it default at >>>>>>>> -Ofast. >>>>>>>> >>>>>>> >>>>>>> Yes, I agree. Is there a consensus about this? >>>>>> >>>>>> Well, it's my own opinion of course. Show me a benchmark that >>>>>> improves with -fassume-zero-terminated-char-arrays. Certainly >>>>>> for security reasons it sounds a dangerous thing (and the documentation >>>>>> needs a more thorough description of what it really means). >>>>> I certainly don't want to see a flag. We've already got way too many; >>>>> adding another for marginal behavior just seems wrong. >>>>> >>>>>> >>>>>>> If yes, I go ahead and remove that option again. 
>>>>>>> >>>>>>> BTW: I needed this option in a few test cases, that insist in checking the >>>>>>> optimizer to eliminate stuff, based on the VRP info. (6 +/-1 or so). >>>>>> >>>>>> Any example? >>>>>> >>>>>>> But we can as well remove those test cases. >>>>> Bernd, if there are specific tests that you want to see removed, we >>>>> should discuss them. >>>>> >>>> >>>> The test cases are: >>>> gcc.dg/strlenopt-36.c >>> >>> There are plenty of valid test cases in this test. For example: >>> >>> extern char a7[7]; >>> if (strlen (a7) >= 7) // fold to false >>> abort (); >>> >>> Even if we wanted to accommodate common definitions the array >>> declarations could be changed to static and the tests would >>> be useful: >>> >>> static char a7[7]; >>> >>> There is no valid program where the if condition could be true. >>> >>>> gcc.dg/strlenopt-40.c >>> >>> There are even more completely uncontroversial test cases here, >>> such as: >>> >>> if (strlen (i < 0 ? "123" : "4321") > 4) // fold to false >>> abort (); >>> >> >> I see, the trouble is that the test case mixes valid cases with >> cases that depend on type info in GIMPLE. > I believe Martin's point is that there are tests within those files that > are still valid. We don't want to zap the entire test unless all the > subtests are invalid. We need to look at each sub-test and determine if > it's valid or not. Right. And if these changes extend to sprintf as I expect will be the case there will be many more adjustments to make to those tests. Those tests are quite delicate. As I think Jeff already implied, I would really prefer to tackle this work myself, both to make sure that it's done without compromising existing warnings, and that the future enhancements we have planned in these areas are made possible without too much churn. I expect to be able to get to it after I get back from Cauldron. Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-22 22:57 ` Martin Sebor @ 2018-08-22 23:08 ` Bernd Edlinger 0 siblings, 0 replies; 121+ messages in thread From: Bernd Edlinger @ 2018-08-22 23:08 UTC (permalink / raw) To: Martin Sebor, Jeff Law, Richard Biener; +Cc: GCC Patches, Jakub Jelinek On 08/23/18 00:57, Martin Sebor wrote: > On 08/22/2018 04:34 PM, Jeff Law wrote: >> On 08/22/2018 11:22 AM, Bernd Edlinger wrote: >>> On 08/22/18 18:05, Martin Sebor wrote: >>>> On 08/21/2018 10:05 PM, Bernd Edlinger wrote: >>>>> On 08/22/18 00:25, Jeff Law wrote: >>>>>> On 08/21/2018 02:43 AM, Richard Biener wrote: >>>>>>> On Mon, 20 Aug 2018, Bernd Edlinger wrote: >>>>>> [ snip. ] >>>>>> >>>>>>>> Yes, I found some peanuts on my way. >>>>>>>> >>>>>>>> For instance this fix for PR middle-end/86121 survives bootstrap on >>>>>>>> it's own, and fixes one xfail. >>>>>>>> >>>>>>>> Is it OK for trunk? >>>>>>> >>>>>>> Yes, that's OK for trunk. >>>>>> Agreed. Seems like a nice independent bugfix and I don't think it >>>>>> adversely affects anything else under current discussion. In fact, not >>>>>> folding here makes it easier to warn about incorrect code elsewhere. >>>>>> >>>>>>> >>>>>>>>> Btw, I don't think we want sth like >>>>>>>>> flag_assume_zero_terminated_char_arrays or even make it default at >>>>>>>>> -Ofast. >>>>>>>>> >>>>>>>> >>>>>>>> Yes, I agree. Is there a consensus about this? >>>>>>> >>>>>>> Well, it's my own opinion of course. Show me a benchmark that >>>>>>> improves with -fassume-zero-terminated-char-arrays. Certainly >>>>>>> for security reasons it sounds a dangerous thing (and the documentation >>>>>>> needs a more thorough description of what it really means). >>>>>> I certainly don't want to see a flag. We've already got way too many; >>>>>> adding another for marginal behavior just seems wrong. >>>>>> >>>>>>> >>>>>>>> If yes, I go ahead and remove that option again. 
>>>>>>>> >>>>>>>> BTW: I needed this option in a few test cases, that insist in checking the >>>>>>>> optimizer to eliminate stuff, based on the VRP info. (6 +/-1 or so). >>>>>>> >>>>>>> Any example? >>>>>>> >>>>>>>> But we can as well remove those test cases. >>>>>> Bernd, if there are specific tests that you want to see removed, we >>>>>> should discuss them. >>>>>> >>>>> >>>>> The test cases are: >>>>> gcc.dg/strlenopt-36.c >>>> >>>> There are plenty of valid test cases in this test. For example: >>>> >>>> extern char a7[7]; >>>> if (strlen (a7) >= 7) // fold to false >>>> abort (); >>>> >>>> Even if we wanted to accommodate common definitions the array >>>> declarations could be changed to static and the tests would >>>> be useful: >>>> >>>> static char a7[7]; >>>> >>>> There is no valid program where the if condition could be true. >>>> >>>>> gcc.dg/strlenopt-40.c >>>> >>>> There are even more completely uncontroversial test cases here, >>>> such as: >>>> >>>> if (strlen (i < 0 ? "123" : "4321") > 4) // fold to false >>>> abort (); >>>> >>> >>> I see, the trouble is that the test case mixes valid cases with >>> cases that depend on type info in GIMPLE. >> I believe Martin's point is that there are tests within those files that >> are still valid. We don't want to zap the entire test unless all the >> subtests are invalid. We need to look at each sub-test and determine if >> it's valid or not. > > Right. And if these changes extend to sprintf as I expect will > be the case there will be many more adjustments to make to those > tests. Those tests are quite delicate. > None of these test cases are affected. > As I think Jeff already implied, I would really prefer to tackle > this work myself, both to make sure that it's done without > compromising existing warnings, and that the future enhancements > we have planned in these areas are made possible without too much > churn. 
> Sure, but it is funny, how this patch does not change a single sprintf warning test case. (a previous version did, but that was fixed). > I expect to be able to get to it after I get back from Cauldron. > > Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-19 15:55 ` Bernd Edlinger 2018-08-20 10:24 ` Richard Biener @ 2018-08-21 22:43 ` Jeff Law 2018-08-22 4:16 ` Bernd Edlinger ` (2 more replies) 1 sibling, 3 replies; 121+ messages in thread From: Jeff Law @ 2018-08-21 22:43 UTC (permalink / raw) To: Bernd Edlinger, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek [ I'm still digesting, but saw something in this that ought to be broken out... ] On 08/19/2018 09:55 AM, Bernd Edlinger wrote: > diff -Npur gcc/tree-ssa-dse.c gcc/tree-ssa-dse.c > --- gcc/tree-ssa-dse.c 2018-07-18 21:21:34.000000000 +0200 > +++ gcc/tree-ssa-dse.c 2018-08-19 14:29:32.344498771 +0200 > @@ -248,6 +248,12 @@ compute_trims (ao_ref *ref, sbitmap live > residual handling in mem* and str* functions is usually > reasonably efficient. */ > *trim_tail = last_orig - last_live; > + /* Don't fold away an out of bounds access, as this defeats proper > + warnings. */ > + if (*trim_tail > + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (ref->base)), > + last_orig) <= 0) > + *trim_tail = 0; > } > else > *trim_tail = 0; This seems like a good change in and of itself and should be able to go forward without further review work. Consider this hunk approved, along with any testsuite you have which tickles this code (I didn't immediately see one attached to this patch. But I could have missed it). Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-21 22:43 ` Jeff Law @ 2018-08-22 4:16 ` Bernd Edlinger 2018-08-22 23:41 ` Jeff Law 2018-08-22 13:10 ` Bernd Edlinger 2018-10-24 9:14 ` Maxim Kuvyrkov 2 siblings, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-08-22 4:16 UTC (permalink / raw) To: Jeff Law, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek On 08/22/18 00:43, Jeff Law wrote: > [ I'm still digesting, but saw something in this that ought to be broken > out... ] > > On 08/19/2018 09:55 AM, Bernd Edlinger wrote: >> diff -Npur gcc/tree-ssa-dse.c gcc/tree-ssa-dse.c >> --- gcc/tree-ssa-dse.c 2018-07-18 21:21:34.000000000 +0200 >> +++ gcc/tree-ssa-dse.c 2018-08-19 14:29:32.344498771 +0200 >> @@ -248,6 +248,12 @@ compute_trims (ao_ref *ref, sbitmap live >> residual handling in mem* and str* functions is usually >> reasonably efficient. */ >> *trim_tail = last_orig - last_live; >> + /* Don't fold away an out of bounds access, as this defeats proper >> + warnings. */ >> + if (*trim_tail >> + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (ref->base)), >> + last_orig) <= 0) >> + *trim_tail = 0; >> } >> else >> *trim_tail = 0; > This seems like a good change in and of itself and should be able to go > forward without further review work. Consider this hunk approved, > along with any testsuite you have which tickles this code (I didn't > immediately see one attached to this patch. But I could have missed it). > Sorry, for not being clear on this. I needed both hunks "Don't fold away an out of bounds access, as this defeats proper warnings" to prevent a regression on gcc.dg/Wstringop-overflow-5.c, and surprise surprise, the xfail in gcc.dg/Wstringop-overflow-6.c suddenly popped up. So without the unsafe range info, gcc.dg/Wstringop-overflow-5.c needs both hunks to not regress, but gcc.dg/Wstringop-overflow-6.c only needs the other one I committed yesterday. 
So unfortunately I have no test case except gcc.dg/Wstringop-overflow-5.c for that. Still OK? Bernd. ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-22 4:16 ` Bernd Edlinger @ 2018-08-22 23:41 ` Jeff Law 2018-08-26 9:58 ` Bernd Edlinger 0 siblings, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-08-22 23:41 UTC (permalink / raw) To: Bernd Edlinger, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek On 08/21/2018 10:15 PM, Bernd Edlinger wrote: > On 08/22/18 00:43, Jeff Law wrote: >> [ I'm still digesting, but saw something in this that ought to be broken >> out... ] >> >> On 08/19/2018 09:55 AM, Bernd Edlinger wrote: >>> diff -Npur gcc/tree-ssa-dse.c gcc/tree-ssa-dse.c >>> --- gcc/tree-ssa-dse.c 2018-07-18 21:21:34.000000000 +0200 >>> +++ gcc/tree-ssa-dse.c 2018-08-19 14:29:32.344498771 +0200 >>> @@ -248,6 +248,12 @@ compute_trims (ao_ref *ref, sbitmap live >>> residual handling in mem* and str* functions is usually >>> reasonably efficient. */ >>> *trim_tail = last_orig - last_live; >>> + /* Don't fold away an out of bounds access, as this defeats proper >>> + warnings. */ >>> + if (*trim_tail >>> + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (ref->base)), >>> + last_orig) <= 0) >>> + *trim_tail = 0; >>> } >>> else >>> *trim_tail = 0; >> This seems like a good change in and of itself and should be able to go >> forward without further review work. Consider this hunk approved, >> along with any testsuite you have which tickles this code (I didn't >> immediately see one attached to this patch. But I could have missed it). >> > > Sorry, for not being clear on this. > > I needed both hunks "Don't fold away an out of bounds access, as this defeats proper > warnings" to prevent a regression on gcc.dg/Wstringop-overflow-5.c, > and surprise surprise, the xfail in gcc.dg/Wstringop-overflow-6.c suddenly popped up. > > So without the unsafe range info, gcc.dg/Wstringop-overflow-5.c needs both hunks > to not regress, but gcc.dg/Wstringop-overflow-6.c only needs the other one I committed > yesterday. 
> > So unfortunately I have no test case except gcc.dg/Wstringop-overflow-5.c for that. > > > Still OK? I almost had a WTF moment mis-parsing Wstringop-overflow-5 thinking it had no dead stores. But it's full of dead stores :-) It sounds like the testsuite will tickle it when the right set of patches are applied. There is no distinct test right now. I went ahead and bootstrapped/regression tested the DSE change alone and it doesn't trigger any regressions. I'm going to install it -- I think consensus has formed around not folding an out of bounds access so that we can detect the problem and issue a suitable warning. One less thing to keep track of :-) jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
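The effect of the approved compute_trims hunk can be modelled with plain integers. This is a hedged sketch rather than the GCC code: `model_trim_tail` is a hypothetical name, `obj_size` stands in for TYPE_SIZE_UNIT (TREE_TYPE (ref->base)), and reading `last_orig` and `last_live` as byte offsets of the last stored and last live byte is an assumption based on how the hunk uses them.

```c
#include <assert.h>

/* Hypothetical stand-in for the tail-trimming decision.
   OBJ_SIZE models the byte size of the base object's type,
   LAST_ORIG the offset of the last byte of the original store,
   LAST_LIVE the offset of the last byte that is still live.
   Returns the number of trailing bytes that may be trimmed.  */
static int model_trim_tail (int obj_size, int last_orig, int last_live)
{
  int trim_tail = last_orig - last_live;

  /* Mirrors `compare_tree_int (size, last_orig) <= 0' in the hunk:
     the store reaches the end of the object or beyond, so it is
     left untrimmed and a later pass can still warn about it.  */
  if (trim_tail && obj_size <= last_orig)
    trim_tail = 0;

  return trim_tail;
}
```

Under this model a store ending inside a 16-byte object with bytes 4..7 dead is trimmed by four bytes, while the same store into a 4-byte object is kept whole.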
* Re: [PATCH] Make strlen range computations more conservative 2018-08-22 23:41 ` Jeff Law @ 2018-08-26 9:58 ` Bernd Edlinger 2018-09-15 9:22 ` Bernd Edlinger 0 siblings, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-08-26 9:58 UTC (permalink / raw) To: Jeff Law, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek [-- Attachment #1: Type: text/plain, Size: 2738 bytes --] Hi, this is an update on my strlen range patch (V6). Again re-based and retested to current trunk. It finally removes the -fassume-zero-terminated-char-arrays flag. And is more careful to preserve existing strlen optimization tests. I did not see the need to change the interface of get_range_string yet, as this would make the patch much larger, and might as well be done in a follow-up patch. I might suggest to rename one of the two get_range_strlen functions at the same time as it is rather confusing to have to count the parameters in order to tell which function is meant. This hunk in tree-ssa-strlen.c really needs some explanation: @@ -3192,7 +3156,13 @@ get_min_string_length (tree rhs, bool *f && TREE_READONLY (rhs)) rhs = DECL_INITIAL (rhs); - if (rhs && TREE_CODE (rhs) == STRING_CST) + if (rhs && TREE_CODE (rhs) == STRING_CST + && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (rhs))) + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)), + TREE_STRING_LENGTH (rhs)) >= 0 + && tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (rhs)))) == 1 + && TREE_STRING_LENGTH (rhs) > 0 + && TREE_STRING_POINTER (rhs) [TREE_STRING_LENGTH (rhs) - 1] == '\0') { *full_string_p = true; return strlen (TREE_STRING_POINTER (rhs)); This is responsible for the removed xfail on c-c++-common/attr-nonstring-3.c and is tested in gcc.dg/strlenopt-57.c where an invalid call to strlen is not folded, while in gcc.dg/strlenopt-58.c a valid call to strlen is folded and finally completely removed. This hunk depends a lot on the STRING_CST semantics.
In the V1 of the proposed STRING_CST semantics, this part is tautological, and could be removed, or changed to an assertion: + && TREE_STRING_LENGTH (rhs) > 0 + && TREE_STRING_POINTER (rhs) [TREE_STRING_LENGTH (rhs) - 1] == '\0') In the V2 of the proposed STRING_CST semantics, this part is tautological: + && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (rhs))) + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)), + TREE_STRING_LENGTH (rhs)) >= 0 Currently, due to the way the braced initializer to string folding works, there are now initializer strings that are not zero-terminated and some that are. What is interesting to note is that this is another path where the initializer STRING_CST and literal STRING_CST seem to mix with their different semantics. And the change in string_constant to return initializer elements seems not to be responsible for this confusion here. Bootstrapped and reg-tested on x86_64-pc-linux-gnu. Is it OK for trunk? Thanks Bernd. [-- Attachment #2: changelog-range-strlen-v6.txt --] [-- Type: text/plain, Size: 909 bytes --] gcc: 2018-08-26 Bernd Edlinger <bernd.edlinger@hotmail.de> * gimple-fold.c (looks_like_a_char_array_without_typecast_p): New helper function for strlen range estimations. (get_range_strlen): Use looks_like_a_char_array_without_typecast_p for warnings, but use GIMPLE semantics otherwise. * tree-ssa-strlen.c (maybe_set_strlen_range): Use GIMPLE semantics. (get_min_string_length): Avoid not NUL terminated string literals. testsuite: 2018-08-26 Bernd Edlinger <bernd.edlinger@hotmail.de> * c-c++-common/attr-nonstring-3.c: Remove xfail. * gcc.dg/pr83373.c: Add xfail. * gcc.dg/strlenopt-36.c: Add xfail. * gcc.dg/strlenopt-40.c: Adjust test expectations. * gcc.dg/strlenopt-45.c: Adjust test expectations. * gcc.dg/strlenopt-48.c: Add xfail. * gcc.dg/strlenopt-51.c: Adjust test expectations. * gcc.dg/strlenopt-57.c: New test. * gcc.dg/strlenopt-58.c: New test. 
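The conditions in the get_min_string_length hunk discussed above can be illustrated outside the compiler with a small plain-C model. Everything here is a hypothetical stand-in, not GCC code: `struct string_cst_model` and `model_min_string_length` are invented for the illustration, with `bytes`/`len` playing the role of TREE_STRING_POINTER/TREE_STRING_LENGTH, `type_size` the byte size of the array type, and `elt_size` the byte size of its element type. The tree_fits_uhwi_p check has no counterpart because a `size_t` always fits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plain-C model of the STRING_CST properties the hunk
   inspects.  */
struct string_cst_model
{
  const char *bytes;   /* models TREE_STRING_POINTER */
  size_t len;          /* models TREE_STRING_LENGTH */
  size_t type_size;    /* models TYPE_SIZE_UNIT of the array type */
  size_t elt_size;     /* models TYPE_SIZE_UNIT of the element type */
};

/* Returns 1 and stores the string length in *OUT only when the
   constant fits its type, has one-byte elements, is non-empty and
   ends in a NUL byte; returns 0 otherwise and leaves *OUT alone.  */
static int
model_min_string_length (const struct string_cst_model *s, size_t *out)
{
  if (s->type_size >= s->len
      && s->elt_size == 1
      && s->len > 0
      && s->bytes[s->len - 1] == '\0')
    {
      *out = strlen (s->bytes);
      return 1;
    }
  return 0;
}
```

In this model an initializer such as { 'a', 'b', 'c' } with no terminating NUL is rejected, which corresponds to the case where the patch declines to fold the strlen call.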
[-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: patch-range-strlen-v6.diff --] [-- Type: text/x-patch; name="patch-range-strlen-v6.diff", Size: 24764 bytes --] diff -Npur gcc/gimple-fold.c gcc/gimple-fold.c --- gcc/gimple-fold.c 2018-08-23 12:35:17.000000000 +0200 +++ gcc/gimple-fold.c 2018-08-25 21:50:18.354332301 +0200 @@ -1257,6 +1257,40 @@ gimple_fold_builtin_memset (gimple_stmt_ return true; } +/* Determine if a char array is suitable for strlen range estimations. + Return false if ARG is not a char array, or if the inner reference + chain appears to go through a type cast, otherwise return true. + Note that the gimple type informations are not 100% guaranteed + to be accurate, therefore this function shall only be used for + warnings. */ + +static bool +looks_like_a_char_array_without_typecast_p (tree arg) +{ + /* We handle arrays of integer types. */ + if (TREE_CODE (TREE_TYPE (arg)) != ARRAY_TYPE + || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != INTEGER_TYPE + || TYPE_MODE (TREE_TYPE (TREE_TYPE (arg))) != TYPE_MODE (char_type_node) + || TYPE_PRECISION (TREE_TYPE (TREE_TYPE (arg))) + != TYPE_PRECISION (char_type_node)) + return false; + + tree base = arg; + while (TREE_CODE (base) == ARRAY_REF + || TREE_CODE (base) == ARRAY_RANGE_REF + || TREE_CODE (base) == COMPONENT_REF) + base = TREE_OPERAND (base, 0); + + /* If this looks like a type cast don't assume anything. */ + if ((TREE_CODE (base) == MEM_REF + && (! integer_zerop (TREE_OPERAND (base, 1)) + || TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0)))) + != TYPE_MAIN_VARIANT (TREE_TYPE (base)))) + || handled_component_p (base)) + return false; + + return true; +} /* Obtain the minimum and maximum string length or minimum and maximum value of ARG in LENGTH[0] and LENGTH[1], respectively. 
@@ -1272,6 +1306,7 @@ gimple_fold_builtin_memset (gimple_stmt_ PHIs and COND_EXPRs optimistically, if we can determine string length minimum and maximum, it will use the minimum from the ones where it can be determined. + TYPE == 2 and FUZZY != 0 cannot be used together. Set *FLEXP to true if the range of the string lengths has been obtained from the upper bound of an array at the end of a struct. Such an array may hold a string that's longer than its upper bound @@ -1312,8 +1347,8 @@ get_range_strlen (tree arg, tree length[ member. */ tree idx = TREE_OPERAND (op, 1); - arg = TREE_OPERAND (op, 0); - tree optype = TREE_TYPE (arg); + op = TREE_OPERAND (op, 0); + tree optype = TREE_TYPE (op); if (tree dom = TYPE_DOMAIN (optype)) if (tree bound = TYPE_MAX_VALUE (dom)) if (TREE_CODE (bound) == INTEGER_CST @@ -1339,23 +1374,21 @@ get_range_strlen (tree arg, tree length[ return get_range_strlen (TREE_OPERAND (arg, 0), length, visited, type, fuzzy, flexp, eltsize); + if (eltsize != 1 || fuzzy != 2) + return false; + if (TREE_CODE (arg) == ARRAY_REF) { - tree type = TREE_TYPE (TREE_OPERAND (arg, 0)); - - /* Determine the "innermost" array type. */ - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); - - /* Avoid arrays of pointers. */ - tree eltype = TREE_TYPE (type); - if (TREE_CODE (type) != ARRAY_TYPE - || !INTEGRAL_TYPE_P (eltype)) + if (!looks_like_a_char_array_without_typecast_p (arg)) return false; + tree type = TREE_TYPE (arg); + + /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); - if (!val || integer_zerop (val)) + if (!val + || TREE_CODE (val) != INTEGER_CST + || integer_zerop (val)) return false; val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, @@ -1364,15 +1397,16 @@ get_range_strlen (tree arg, tree length[ the array could have zero length. 
*/ *minlen = ssize_int (0); - if (TREE_CODE (TREE_OPERAND (arg, 0)) == COMPONENT_REF - && type == TREE_TYPE (TREE_OPERAND (arg, 0)) - && array_at_struct_end_p (TREE_OPERAND (arg, 0))) + if (TREE_CODE (arg) == COMPONENT_REF + && type == TREE_TYPE (arg) + && array_at_struct_end_p (arg)) *flexp = true; } - else if (TREE_CODE (arg) == COMPONENT_REF - && (TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 1))) - == ARRAY_TYPE)) + else if (TREE_CODE (arg) == COMPONENT_REF) { + if (!looks_like_a_char_array_without_typecast_p (arg)) + return false; + /* Use the type of the member array to determine the upper bound on the length of the array. This may be overly optimistic if the array itself isn't NUL-terminated and @@ -1388,22 +1422,20 @@ get_range_strlen (tree arg, tree length[ tree type = TREE_TYPE (arg); - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); - /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); - if (!val || integer_zerop (val)) + if (!val + || TREE_CODE (val) != INTEGER_CST + || integer_zerop (val)) return false; + val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, integer_one_node); /* Set the minimum size to zero since the string in the array could have zero length. */ *minlen = ssize_int (0); } - - if (VAR_P (arg)) + else if (VAR_P (arg)) { tree type = TREE_TYPE (arg); if (POINTER_TYPE_P (type)) @@ -1411,13 +1443,23 @@ get_range_strlen (tree arg, tree length[ if (TREE_CODE (type) == ARRAY_TYPE) { + /* We handle arrays of integer types. */ + if (TREE_CODE (TREE_TYPE (type)) != INTEGER_TYPE + || TYPE_MODE (TREE_TYPE (type)) + != TYPE_MODE (char_type_node) + || TYPE_PRECISION (TREE_TYPE (type)) + != TYPE_PRECISION (char_type_node)) + return false; + + /* Fail when the array bound is unknown or zero. 
*/ val = TYPE_SIZE_UNIT (type); if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop (val)) return false; - val = wide_int_to_tree (TREE_TYPE (val), - wi::sub (wi::to_wide (val), 1)); + + val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, + integer_one_node); /* Set the minimum size to zero since the string in the array could have zero length. */ *minlen = ssize_int (0); @@ -1550,6 +1592,7 @@ get_range_strlen (tree arg, tree length[ if we can determine string length minimum and maximum; it will use the minimum from the ones where it can be determined. STRICT false should be only used for warning code. + STRICT is by default false. ELTSIZE is 1 for normal single byte character strings, and 2 or 4 for wide characer strings. ELTSIZE is by default 1. */ diff -Npur gcc/tree-ssa-strlen.c gcc/tree-ssa-strlen.c --- gcc/tree-ssa-strlen.c 2018-08-23 12:35:14.000000000 +0200 +++ gcc/tree-ssa-strlen.c 2018-08-25 21:00:29.659409820 +0200 @@ -1154,42 +1154,6 @@ maybe_set_strlen_range (tree lhs, tree s wide_int max = wi::to_wide (TYPE_MAX_VALUE (ptrdiff_type_node)); wide_int min = wi::zero (max.get_precision ()); - if (TREE_CODE (src) == ADDR_EXPR) - { - /* The last array member of a struct can be bigger than its size - suggests if it's treated as a poor-man's flexible array member. */ - src = TREE_OPERAND (src, 0); - bool src_is_array = TREE_CODE (TREE_TYPE (src)) == ARRAY_TYPE; - if (src_is_array && !array_at_struct_end_p (src)) - { - tree type = TREE_TYPE (src); - if (tree size = TYPE_SIZE_UNIT (type)) - if (size && TREE_CODE (size) == INTEGER_CST) - max = wi::to_wide (size); - - /* For strlen() the upper bound above is equal to - the longest string that can be stored in the array - (i.e., it accounts for the terminating nul. For - strnlen() bump up the maximum by one since the array - need not be nul-terminated. 
*/ - if (!bound && max != 0) - --max; - } - else - { - if (TREE_CODE (src) == COMPONENT_REF && !src_is_array) - src = TREE_OPERAND (src, 1); - if (DECL_P (src)) - { - /* Handle the unlikely case of strlen (&c) where c is some - variable. */ - if (tree size = DECL_SIZE_UNIT (src)) - if (TREE_CODE (size) == INTEGER_CST) - max = wi::to_wide (size); - } - } - } - if (bound) { /* For strnlen, adjust MIN and MAX as necessary. If the bound @@ -3192,7 +3156,13 @@ get_min_string_length (tree rhs, bool *f && TREE_READONLY (rhs)) rhs = DECL_INITIAL (rhs); - if (rhs && TREE_CODE (rhs) == STRING_CST) + if (rhs && TREE_CODE (rhs) == STRING_CST + && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (rhs))) + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)), + TREE_STRING_LENGTH (rhs)) >= 0 + && tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (rhs)))) == 1 + && TREE_STRING_LENGTH (rhs) > 0 + && TREE_STRING_POINTER (rhs) [TREE_STRING_LENGTH (rhs) - 1] == '\0') { *full_string_p = true; return strlen (TREE_STRING_POINTER (rhs)); diff -Npur gcc/testsuite/c-c++-common/attr-nonstring-3.c gcc/testsuite/c-c++-common/attr-nonstring-3.c --- gcc/testsuite/c-c++-common/attr-nonstring-3.c 2018-08-23 12:35:15.000000000 +0200 +++ gcc/testsuite/c-c++-common/attr-nonstring-3.c 2018-08-25 20:45:51.835677067 +0200 @@ -406,7 +406,7 @@ void test_strlen (struct MemArrays *p, c { char a[] __attribute__ ((nonstring)) = { 1, 2, 3 }; - T (strlen (a)); /* { dg-warning "argument 1 declared attribute .nonstring." "pr86688" { xfail *-*-* } } */ + T (strlen (a)); /* { dg-warning "argument 1 declared attribute .nonstring." 
} */ } { diff -Npur gcc/testsuite/gcc.dg/pr83373.c gcc/testsuite/gcc.dg/pr83373.c --- gcc/testsuite/gcc.dg/pr83373.c 2018-08-22 22:34:01.000000000 +0200 +++ gcc/testsuite/gcc.dg/pr83373.c 2018-08-25 14:41:59.649419917 +0200 @@ -16,7 +16,7 @@ inline char* my_strcpy (char* dst, const __builtin_memcpy (dst, src, len + 1); else { - __builtin_memcpy (dst, src, size - 1); /* { dg-bogus "\\\[-Wstringop-oveflow]" } */ + __builtin_memcpy (dst, src, size - 1); /* { dg-bogus "\\\[-Wstringop-overflow=]" "" { xfail *-*-* } } */ dst[size - 1] = '\0'; } diff -Npur gcc/testsuite/gcc.dg/strlenopt-36.c gcc/testsuite/gcc.dg/strlenopt-36.c --- gcc/testsuite/gcc.dg/strlenopt-36.c 2018-08-22 22:34:01.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-36.c 2018-08-25 18:10:54.475978390 +0200 @@ -83,4 +83,4 @@ void test_nested_memarray (struct Nested T (strlen (ma->ma1.a1) == 0); */ } -/* { dg-final { scan-tree-dump-not "failure_on_line" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "failure_on_line" "optimized" { xfail *-*-* } } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-40.c gcc/testsuite/gcc.dg/strlenopt-40.c --- gcc/testsuite/gcc.dg/strlenopt-40.c 2018-08-22 22:34:01.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-40.c 2018-08-25 18:27:38.573627934 +0200 @@ -105,135 +105,20 @@ void elim_global_arrays (int i) /* Verify that the expression involving the strlen call as well as whatever depends on it is eliminated from the test output. All these expressions must be trivially true. 
*/ - ELIM_TRUE (strlen (a7_3[0]) < sizeof a7_3[0]); - ELIM_TRUE (strlen (a7_3[1]) < sizeof a7_3[1]); - ELIM_TRUE (strlen (a7_3[6]) < sizeof a7_3[6]); - ELIM_TRUE (strlen (a7_3[i]) < sizeof a7_3[i]); - - ELIM_TRUE (strlen (a5_7[0]) < sizeof a5_7[0]); - ELIM_TRUE (strlen (a5_7[1]) < sizeof a5_7[1]); - ELIM_TRUE (strlen (a5_7[4]) < sizeof a5_7[4]); - ELIM_TRUE (strlen (a5_7[i]) < sizeof a5_7[0]); - - ELIM_TRUE (strlen (ax_3[0]) < sizeof ax_3[0]); - ELIM_TRUE (strlen (ax_3[1]) < sizeof ax_3[1]); - ELIM_TRUE (strlen (ax_3[9]) < sizeof ax_3[9]); - ELIM_TRUE (strlen (ax_3[i]) < sizeof ax_3[i]); - - ELIM_TRUE (strlen (a3) < sizeof a3); - ELIM_TRUE (strlen (a7) < sizeof a7); - ELIM_TRUE (strlen (ax) != DIFF_MAX); ELIM_TRUE (strlen (ax) != DIFF_MAX - 1); ELIM_TRUE (strlen (ax) < DIFF_MAX - 1); } -void elim_pointer_to_arrays (void) -{ - ELIM_TRUE (strlen (*pa7) < 7); - ELIM_TRUE (strlen (*pa5) < 5); - ELIM_TRUE (strlen (*pa3) < 3); - - ELIM_TRUE (strlen ((*pa7_3)[0]) < 3); - ELIM_TRUE (strlen ((*pa7_3)[1]) < 3); - ELIM_TRUE (strlen ((*pa7_3)[6]) < 3); - - ELIM_TRUE (strlen ((*pax_3)[0]) < 3); - ELIM_TRUE (strlen ((*pax_3)[1]) < 3); - ELIM_TRUE (strlen ((*pax_3)[9]) < 3); - - ELIM_TRUE (strlen ((*pa5_7)[0]) < 7); - ELIM_TRUE (strlen ((*pa5_7)[1]) < 7); - ELIM_TRUE (strlen ((*pa5_7)[4]) < 7); -} - -void elim_global_arrays_and_strings (int i) -{ - ELIM_TRUE (strlen (i < 0 ? a3 : "") < 3); - ELIM_TRUE (strlen (i < 0 ? a3 : "1") < 3); - ELIM_TRUE (strlen (i < 0 ? a3 : "12") < 3); - ELIM_TRUE (strlen (i < 0 ? a3 : "123") < 4); - - ELIM_FALSE (strlen (i < 0 ? a3 : "") > 3); - ELIM_FALSE (strlen (i < 0 ? a3 : "1") > 3); - ELIM_FALSE (strlen (i < 0 ? a3 : "12") > 3); - ELIM_FALSE (strlen (i < 0 ? a3 : "123") > 4); - - ELIM_TRUE (strlen (i < 0 ? a7 : "") < 7); - ELIM_TRUE (strlen (i < 0 ? a7 : "1") < 7); - ELIM_TRUE (strlen (i < 0 ? a7 : "12") < 7); - ELIM_TRUE (strlen (i < 0 ? a7 : "123") < 7); - ELIM_TRUE (strlen (i < 0 ? a7 : "123456") < 7); - ELIM_TRUE (strlen (i < 0 ? 
a7 : "1234567") < 8); - - ELIM_FALSE (strlen (i < 0 ? a7 : "") > 6); - ELIM_FALSE (strlen (i < 0 ? a7 : "1") > 6); - ELIM_FALSE (strlen (i < 0 ? a7 : "12") > 6); - ELIM_FALSE (strlen (i < 0 ? a7 : "123") > 6); - ELIM_FALSE (strlen (i < 0 ? a7 : "123456") > 7); - ELIM_FALSE (strlen (i < 0 ? a7 : "1234567") > 8); -} - -void elim_member_arrays_obj (int i) -{ - ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a3) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][1].a3) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][2].a3) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][6].a3) < 3); - - ELIM_TRUE (strlen (ma0_3_5_7[1][0][0].a3) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[2][0][1].a3) < 3); - - ELIM_TRUE (strlen (ma0_3_5_7[1][1][0].a3) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a3) < 3); - - ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a5) < 5); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][1].a5) < 5); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][2].a5) < 5); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][6].a5) < 5); - - ELIM_TRUE (strlen (ma0_3_5_7[1][0][0].a5) < 5); - ELIM_TRUE (strlen (ma0_3_5_7[2][0][1].a5) < 5); - - ELIM_TRUE (strlen (ma0_3_5_7[1][1][0].a5) < 5); - ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a5) < 5); - - ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a7_3[0]) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a7_3[2]) < 3); - - ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a5_7[4]) < 7); -} - void elim_member_arrays_ptr (struct MemArrays0 *ma0, struct MemArraysX *max, struct MemArrays7 *ma7, int i) { - ELIM_TRUE (strlen (ma0->a7_3[0]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[1]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[6]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[6]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[i]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[i]) < 3); - - ELIM_TRUE (strlen (ma0->a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[0].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[1].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[1].a5_7[4]) < 7); - ELIM_TRUE (strlen (ma0[9].a5_7[0]) < 7); - ELIM_TRUE (strlen 
(ma0[9].a5_7[4]) < 7); - - ELIM_TRUE (strlen (ma0->a3) < sizeof ma0->a3); - ELIM_TRUE (strlen (ma0->a5) < sizeof ma0->a5); ELIM_TRUE (strlen (ma0->a0) < DIFF_MAX - 1); - ELIM_TRUE (strlen (max->a3) < sizeof max->a3); - ELIM_TRUE (strlen (max->a5) < sizeof max->a5); ELIM_TRUE (strlen (max->ax) < DIFF_MAX - 1); - ELIM_TRUE (strlen (ma7->a3) < sizeof max->a3); - ELIM_TRUE (strlen (ma7->a5) < sizeof max->a5); ELIM_TRUE (strlen (ma7->a7) < DIFF_MAX - 1); } diff -Npur gcc/testsuite/gcc.dg/strlenopt-45.c gcc/testsuite/gcc.dg/strlenopt-45.c --- gcc/testsuite/gcc.dg/strlenopt-45.c 2018-08-22 22:34:01.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-45.c 2018-08-25 18:46:19.901932755 +0200 @@ -58,46 +58,24 @@ void elim_strnlen_arr_cst (void) be one). */ ELIM (strnlen (&c, 0) == 0); ELIM (strnlen (&c, 1) < 2); - ELIM (strnlen (&c, 2) == 0); - ELIM (strnlen (&c, 9) == 0); - ELIM (strnlen (&c, PTRDIFF_MAX) == 0); - ELIM (strnlen (&c, SIZE_MAX) == 0); - ELIM (strnlen (&c, -1) == 0); ELIM (strnlen (a1, 0) == 0); ELIM (strnlen (a1, 1) < 2); - ELIM (strnlen (a1, 2) == 0); - ELIM (strnlen (a1, 9) == 0); - ELIM (strnlen (a1, PTRDIFF_MAX) == 0); - ELIM (strnlen (a1, SIZE_MAX) == 0); - ELIM (strnlen (a1, -1) == 0); ELIM (strnlen (a3, 0) == 0); ELIM (strnlen (a3, 1) < 2); ELIM (strnlen (a3, 2) < 3); ELIM (strnlen (a3, 3) < 4); - ELIM (strnlen (a3, 9) < 4); - ELIM (strnlen (a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (a3, SIZE_MAX) < 4); - ELIM (strnlen (a3, -1) < 4); ELIM (strnlen (a3_7[0], 0) == 0); ELIM (strnlen (a3_7[0], 1) < 2); ELIM (strnlen (a3_7[0], 2) < 3); ELIM (strnlen (a3_7[0], 3) < 4); - ELIM (strnlen (a3_7[0], 9) < 8); - ELIM (strnlen (a3_7[0], PTRDIFF_MAX) < 8); - ELIM (strnlen (a3_7[0], SIZE_MAX) < 8); - ELIM (strnlen (a3_7[0], -1) < 8); ELIM (strnlen (a3_7[2], 0) == 0); ELIM (strnlen (a3_7[2], 1) < 2); ELIM (strnlen (a3_7[2], 2) < 3); ELIM (strnlen (a3_7[2], 3) < 4); - ELIM (strnlen (a3_7[2], 9) < 8); - ELIM (strnlen (a3_7[2], PTRDIFF_MAX) < 8); - ELIM (strnlen (a3_7[2], 
SIZE_MAX) < 8); - ELIM (strnlen (a3_7[2], -1) < 8); ELIM (strnlen ((char*)a3_7, 0) == 0); ELIM (strnlen ((char*)a3_7, 1) < 2); @@ -106,10 +84,6 @@ void elim_strnlen_arr_cst (void) ELIM (strnlen ((char*)a3_7, 9) < 10); ELIM (strnlen ((char*)a3_7, 19) < 20); ELIM (strnlen ((char*)a3_7, 21) < 22); - ELIM (strnlen ((char*)a3_7, 23) < 22); - ELIM (strnlen ((char*)a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)a3_7, -1) < 22); ELIM (strnlen (ax, 0) == 0); ELIM (strnlen (ax, 1) < 2); @@ -135,56 +109,32 @@ void elim_strnlen_memarr_cst (struct Mem { ELIM (strnlen (&p->c, 0) == 0); ELIM (strnlen (&p->c, 1) < 2); - ELIM (strnlen (&p->c, 9) == 0); - ELIM (strnlen (&p->c, PTRDIFF_MAX) == 0); - ELIM (strnlen (&p->c, SIZE_MAX) == 0); - ELIM (strnlen (&p->c, -1) == 0); /* Other accesses to internal zero-length arrays are undefined. */ ELIM (strnlen (p->a0, 0) == 0); ELIM (strnlen (p->a1, 0) == 0); ELIM (strnlen (p->a1, 1) < 2); - ELIM (strnlen (p->a1, 9) == 0); - ELIM (strnlen (p->a1, PTRDIFF_MAX) == 0); - ELIM (strnlen (p->a1, SIZE_MAX) == 0); - ELIM (strnlen (p->a1, -1) == 0); ELIM (strnlen (p->a3, 0) == 0); ELIM (strnlen (p->a3, 1) < 2); ELIM (strnlen (p->a3, 2) < 3); ELIM (strnlen (p->a3, 3) < 4); - ELIM (strnlen (p->a3, 9) < 4); - ELIM (strnlen (p->a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p->a3, SIZE_MAX) < 4); - ELIM (strnlen (p->a3, -1) < 4); ELIM (strnlen (p[i].a3, 0) == 0); ELIM (strnlen (p[i].a3, 1) < 2); ELIM (strnlen (p[i].a3, 2) < 3); ELIM (strnlen (p[i].a3, 3) < 4); - ELIM (strnlen (p[i].a3, 9) < 4); - ELIM (strnlen (p[i].a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p[i].a3, SIZE_MAX) < 4); - ELIM (strnlen (p[i].a3, -1) < 4); ELIM (strnlen (p->a3_7[0], 0) == 0); ELIM (strnlen (p->a3_7[0], 1) < 2); ELIM (strnlen (p->a3_7[0], 2) < 3); ELIM (strnlen (p->a3_7[0], 3) < 4); - ELIM (strnlen (p->a3_7[0], 9) < 8); - ELIM (strnlen (p->a3_7[0], PTRDIFF_MAX) < 8); - ELIM (strnlen (p->a3_7[0], SIZE_MAX) < 8); - ELIM (strnlen 
(p->a3_7[0], -1) < 8); ELIM (strnlen (p->a3_7[2], 0) == 0); ELIM (strnlen (p->a3_7[2], 1) < 2); ELIM (strnlen (p->a3_7[2], 2) < 3); ELIM (strnlen (p->a3_7[2], 3) < 4); - ELIM (strnlen (p->a3_7[2], 9) < 8); - ELIM (strnlen (p->a3_7[2], PTRDIFF_MAX) < 8); - ELIM (strnlen (p->a3_7[2], SIZE_MAX) < 8); - ELIM (strnlen (p->a3_7[2], -1) < 8); ELIM (strnlen (p->a3_7[i], 0) == 0); ELIM (strnlen (p->a3_7[i], 1) < 2); @@ -210,10 +160,6 @@ void elim_strnlen_memarr_cst (struct Mem ELIM (strnlen ((char*)p->a3_7, 9) < 10); ELIM (strnlen ((char*)p->a3_7, 19) < 20); ELIM (strnlen ((char*)p->a3_7, 21) < 22); - ELIM (strnlen ((char*)p->a3_7, 23) < 22); - ELIM (strnlen ((char*)p->a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)p->a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)p->a3_7, -1) < 22); ELIM (strnlen (p->ax, 0) == 0); ELIM (strnlen (p->ax, 1) < 2); diff -Npur gcc/testsuite/gcc.dg/strlenopt-48.c gcc/testsuite/gcc.dg/strlenopt-48.c --- gcc/testsuite/gcc.dg/strlenopt-48.c 2018-08-22 22:34:01.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-48.c 2018-08-25 18:50:25.483526060 +0200 @@ -31,5 +31,5 @@ void h (void) abort(); } -/* { dg-final { scan-tree-dump-times "strlen" 0 "optimized" } } - { dg-final { scan-tree-dump-times "abort" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "strlen" 0 "optimized" { xfail *-*-* } } } + { dg-final { scan-tree-dump-times "abort" 0 "optimized" { xfail *-*-* } } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-51.c gcc/testsuite/gcc.dg/strlenopt-51.c --- gcc/testsuite/gcc.dg/strlenopt-51.c 2018-08-22 22:34:01.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-51.c 2018-08-25 14:41:59.649419917 +0200 @@ -101,7 +101,7 @@ void test_keep_a9_9 (int i) { #undef T #define T(I) \ - KEEP (strlen (&a9_9[i][I][0]) > (1 + I) % 9); \ + KEEP (strlen (&a9_9[i][I][0]) > (0 + I) % 9); \ KEEP (strlen (&a9_9[i][I][1]) > (1 + I) % 9); \ KEEP (strlen (&a9_9[i][I][2]) > (2 + I) % 9); \ KEEP (strlen (&a9_9[i][I][3]) > (3 + I) % 9); \ @@ -115,7 +115,7 @@ void 
test_keep_a9_9 (int i) } /* { dg-final { scan-tree-dump-times "strlen" 72 "gimple" } } - { dg-final { scan-tree-dump-times "strlen" 63 "optimized" } } + { dg-final { scan-tree-dump-times "strlen" 72 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 72 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-57.c gcc/testsuite/gcc.dg/strlenopt-57.c --- gcc/testsuite/gcc.dg/strlenopt-57.c 1970-01-01 01:00:00.000000000 +0100 +++ gcc/testsuite/gcc.dg/strlenopt-57.c 2018-08-25 14:41:59.649419917 +0200 @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +#define assert(x) do { if (!(x)) __builtin_abort (); } while (0) +extern int system (const char *); +static int fun (char *p) +{ + char buf[16]; + + assert (__builtin_strlen (p) < 4); + + __builtin_sprintf (buf, "echo %s - %s", p, p); + return system (buf); +} + +void test (void) +{ + char b[2] = "ab"; + fun (b); +} + +/* { dg-final { scan-tree-dump-times "__builtin_strlen" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_abort" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_sprintf" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "system" 1 "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-58.c gcc/testsuite/gcc.dg/strlenopt-58.c --- gcc/testsuite/gcc.dg/strlenopt-58.c 1970-01-01 01:00:00.000000000 +0100 +++ gcc/testsuite/gcc.dg/strlenopt-58.c 2018-08-25 15:26:13.653767630 +0200 @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +#define assert(x) do { if (!(x)) __builtin_abort (); } while (0) +extern int system (const char *); +static int fun (char *p) +{ + char buf[16]; + + assert 
(__builtin_strlen (p) < 4); + + __builtin_sprintf (buf, "echo %s - %s", p, p); + return system (buf); +} + +void test (void) +{ + char b[3] = "ab"; + fun (b); +} + +/* { dg-final { scan-tree-dump-times "__builtin_strlen" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_abort" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_sprintf" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "system" 1 "optimized" } } */
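As an illustration of what the two new tests above exercise (a minimal sketch, not part of the patch): strlenopt-57.c initializes `char b[2] = "ab"`, which leaves no room for the terminating NUL, whereas strlenopt-58.c uses `char b[3] = "ab"`. Only in the second case does the array size bound the string length, which is presumably why the first test expects the strlen call and the abort to survive. The helper names `bounded_strlen` and `is_nul_terminated` below are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a GCC function: return the length of the
   string stored in BUF if a terminating NUL is found within the first
   SIZE bytes, or SIZE if the array is not NUL-terminated.  */
static size_t
bounded_strlen (const char *buf, size_t size)
{
  assert (buf != NULL);
  /* memchr never reads past SIZE bytes, unlike strlen.  */
  const char *nul = memchr (buf, '\0', size);
  return nul ? (size_t) (nul - buf) : size;
}

/* Hypothetical helper: return 1 if the SIZE-byte array at BUF holds a
   NUL-terminated string, 0 otherwise.  */
static int
is_nul_terminated (const char *buf, size_t size)
{
  return bounded_strlen (buf, size) < size;
}
```

For `char b[2] = "ab"`, `is_nul_terminated (b, sizeof b)` returns 0, so nothing can be concluded about `strlen (b)` from the array size alone; for `char b[3] = "ab"` it returns 1 and the length is 2.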
* Re: [PATCH] Make strlen range computations more conservative
  2018-08-26  9:58 ` Bernd Edlinger
@ 2018-09-15  9:22 ` Bernd Edlinger
  2018-10-10 23:12   ` Jeff Law
  2018-10-12 15:03   ` Jeff Law
  0 siblings, 2 replies; 121+ messages in thread
From: Bernd Edlinger @ 2018-09-15  9:22 UTC
To: Jeff Law, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 549 bytes --]

Hi,

this is an update of my strlen range patch (V7), again rebased onto current trunk and retested.

I am aware that Martin wants to refactor the interface of get_range_strlen, and I have no objection to that, but I'd suggest doing it in a follow-up patch. I would also suggest renaming one of the two get_range_strlen functions at the same time, because it is rather confusing to have to count the parameters in order to tell which function is meant.

Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?

Thanks
Bernd.

[-- Attachment #2: changelog-range-strlen-v7.txt --]
[-- Type: text/plain, Size: 909 bytes --]

gcc:
2018-08-26  Bernd Edlinger  <bernd.edlinger@hotmail.de>

    * gimple-fold.c (looks_like_a_char_array_without_typecast_p): New
    helper function for strlen range estimations.
    (get_range_strlen): Use looks_like_a_char_array_without_typecast_p
    for warnings, but use GIMPLE semantics otherwise.
    * tree-ssa-strlen.c (maybe_set_strlen_range): Use GIMPLE semantics.
    (get_min_string_length): Avoid string literals that are not
    NUL-terminated.

testsuite:
2018-08-26  Bernd Edlinger  <bernd.edlinger@hotmail.de>

    * c-c++-common/attr-nonstring-3.c: Remove xfail.
    * gcc.dg/pr83373.c: Add xfail.
    * gcc.dg/strlenopt-36.c: Add xfail.
    * gcc.dg/strlenopt-40.c: Adjust test expectations.
    * gcc.dg/strlenopt-45.c: Adjust test expectations.
    * gcc.dg/strlenopt-48.c: Add xfail.
    * gcc.dg/strlenopt-51.c: Adjust test expectations.
    * gcc.dg/strlenopt-59.c: New test.
    * gcc.dg/strlenopt-60.c: New test.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-range-strlen-v7.diff --]
[-- Type: text/x-patch; name="patch-range-strlen-v7.diff", Size: 24652 bytes --]

diff -Npur gcc/gimple-fold.c gcc/gimple-fold.c
--- gcc/gimple-fold.c 2018-09-14 21:52:32.000000000 +0200
+++ gcc/gimple-fold.c 2018-09-14 22:46:11.764019221 +0200
@@ -1261,6 +1261,40 @@ gimple_fold_builtin_memset (gimple_stmt_
   return true;
 }
 
+/* Determine if a char array is suitable for strlen range estimations.
+   Return false if ARG is not a char array, or if the inner reference
+   chain appears to go through a type cast, otherwise return true.
+   Note that the gimple type informations are not 100% guaranteed
+   to be accurate, therefore this function shall only be used for
+   warnings.  */
+
+static bool
+looks_like_a_char_array_without_typecast_p (tree arg)
+{
+  /* We handle arrays of integer types.  */
+  if (TREE_CODE (TREE_TYPE (arg)) != ARRAY_TYPE
+      || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != INTEGER_TYPE
+      || TYPE_MODE (TREE_TYPE (TREE_TYPE (arg))) != TYPE_MODE (char_type_node)
+      || TYPE_PRECISION (TREE_TYPE (TREE_TYPE (arg)))
+         != TYPE_PRECISION (char_type_node))
+    return false;
+
+  tree base = arg;
+  while (TREE_CODE (base) == ARRAY_REF
+         || TREE_CODE (base) == ARRAY_RANGE_REF
+         || TREE_CODE (base) == COMPONENT_REF)
+    base = TREE_OPERAND (base, 0);
+
+  /* If this looks like a type cast don't assume anything.  */
+  if ((TREE_CODE (base) == MEM_REF
+       && (! integer_zerop (TREE_OPERAND (base, 1))
+           || TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0))))
+              != TYPE_MAIN_VARIANT (TREE_TYPE (base))))
+      || handled_component_p (base))
+    return false;
+
+  return true;
+}
 
 /* Obtain the minimum and maximum string length or minimum and maximum
    value of ARG in LENGTH[0] and LENGTH[1], respectively.
@@ -1276,6 +1310,7 @@ gimple_fold_builtin_memset (gimple_stmt_ PHIs and COND_EXPRs optimistically, if we can determine string length minimum and maximum, it will use the minimum from the ones where it can be determined. + TYPE == 2 and FUZZY != 0 cannot be used together. Set *FLEXP to true if the range of the string lengths has been obtained from the upper bound of an array at the end of a struct. Such an array may hold a string that's longer than its upper bound @@ -1318,8 +1353,8 @@ get_range_strlen (tree arg, tree length[ member. */ tree idx = TREE_OPERAND (op, 1); - arg = TREE_OPERAND (op, 0); - tree optype = TREE_TYPE (arg); + op = TREE_OPERAND (op, 0); + tree optype = TREE_TYPE (op); if (tree dom = TYPE_DOMAIN (optype)) if (tree bound = TYPE_MAX_VALUE (dom)) if (TREE_CODE (bound) == INTEGER_CST @@ -1346,23 +1381,21 @@ get_range_strlen (tree arg, tree length[ visited, type, fuzzy, flexp, eltsize, nonstr); + if (eltsize != 1 || fuzzy != 2) + return false; + if (TREE_CODE (arg) == ARRAY_REF) { - tree type = TREE_TYPE (TREE_OPERAND (arg, 0)); - - /* Determine the "innermost" array type. */ - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); - - /* Avoid arrays of pointers. */ - tree eltype = TREE_TYPE (type); - if (TREE_CODE (type) != ARRAY_TYPE - || !INTEGRAL_TYPE_P (eltype)) + if (!looks_like_a_char_array_without_typecast_p (arg)) return false; + tree type = TREE_TYPE (arg); + + /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); - if (!val || integer_zerop (val)) + if (!val + || TREE_CODE (val) != INTEGER_CST + || integer_zerop (val)) return false; val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, @@ -1371,15 +1404,16 @@ get_range_strlen (tree arg, tree length[ the array could have zero length. 
*/ *minlen = ssize_int (0); - if (TREE_CODE (TREE_OPERAND (arg, 0)) == COMPONENT_REF - && type == TREE_TYPE (TREE_OPERAND (arg, 0)) - && array_at_struct_end_p (TREE_OPERAND (arg, 0))) + if (TREE_CODE (arg) == COMPONENT_REF + && type == TREE_TYPE (arg) + && array_at_struct_end_p (arg)) *flexp = true; } - else if (TREE_CODE (arg) == COMPONENT_REF - && (TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 1))) - == ARRAY_TYPE)) + else if (TREE_CODE (arg) == COMPONENT_REF) { + if (!looks_like_a_char_array_without_typecast_p (arg)) + return false; + /* Use the type of the member array to determine the upper bound on the length of the array. This may be overly optimistic if the array itself isn't NUL-terminated and @@ -1395,22 +1429,20 @@ get_range_strlen (tree arg, tree length[ tree type = TREE_TYPE (arg); - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); - /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); - if (!val || integer_zerop (val)) + if (!val + || TREE_CODE (val) != INTEGER_CST + || integer_zerop (val)) return false; + val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, integer_one_node); /* Set the minimum size to zero since the string in the array could have zero length. */ *minlen = ssize_int (0); } - - if (VAR_P (arg)) + else if (VAR_P (arg)) { tree type = TREE_TYPE (arg); if (POINTER_TYPE_P (type)) @@ -1418,13 +1450,23 @@ get_range_strlen (tree arg, tree length[ if (TREE_CODE (type) == ARRAY_TYPE) { + /* We handle arrays of integer types. */ + if (TREE_CODE (TREE_TYPE (type)) != INTEGER_TYPE + || TYPE_MODE (TREE_TYPE (type)) + != TYPE_MODE (char_type_node) + || TYPE_PRECISION (TREE_TYPE (type)) + != TYPE_PRECISION (char_type_node)) + return false; + + /* Fail when the array bound is unknown or zero. 
*/ val = TYPE_SIZE_UNIT (type); if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop (val)) return false; - val = wide_int_to_tree (TREE_TYPE (val), - wi::sub (wi::to_wide (val), 1)); + + val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, + integer_one_node); /* Set the minimum size to zero since the string in the array could have zero length. */ *minlen = ssize_int (0); @@ -1557,6 +1599,8 @@ get_range_strlen (tree arg, tree length[ if we can determine string length minimum and maximum; it will use the minimum from the ones where it can be determined. STRICT false should be only used for warning code. + STRICT is by default false. + When non-null, clear *NONSTR if ARG refers to a constant array that is known not be nul-terminated. Otherwise set it to the declaration of the constant non-terminated array. diff -Npur gcc/tree-ssa-strlen.c gcc/tree-ssa-strlen.c --- gcc/tree-ssa-strlen.c 2018-09-14 13:12:10.000000000 +0200 +++ gcc/tree-ssa-strlen.c 2018-09-14 22:43:59.077921412 +0200 @@ -1144,44 +1144,6 @@ maybe_set_strlen_range (tree lhs, tree s wide_int max = wi::to_wide (TYPE_MAX_VALUE (ptrdiff_type_node)); wide_int min = wi::zero (max.get_precision ()); - if (TREE_CODE (src) == ADDR_EXPR) - { - /* The last array member of a struct can be bigger than its size - suggests if it's treated as a poor-man's flexible array member. */ - src = TREE_OPERAND (src, 0); - bool src_is_array = TREE_CODE (TREE_TYPE (src)) == ARRAY_TYPE; - if (src_is_array - && TREE_CODE (src) != MEM_REF - && !array_at_struct_end_p (src)) - { - tree type = TREE_TYPE (src); - if (tree size = TYPE_SIZE_UNIT (type)) - if (size && TREE_CODE (size) == INTEGER_CST) - max = wi::to_wide (size); - - /* For strlen() the upper bound above is equal to - the longest string that can be stored in the array - (i.e., it accounts for the terminating nul. For - strnlen() bump up the maximum by one since the array - need not be nul-terminated. 
*/ - if (!bound && max != 0) - --max; - } - else - { - if (TREE_CODE (src) == COMPONENT_REF && !src_is_array) - src = TREE_OPERAND (src, 1); - if (DECL_P (src)) - { - /* Handle the unlikely case of strlen (&c) where c is some - variable. */ - if (tree size = DECL_SIZE_UNIT (src)) - if (TREE_CODE (size) == INTEGER_CST) - max = wi::to_wide (size); - } - } - } - if (bound) { /* For strnlen, adjust MIN and MAX as necessary. If the bound @@ -3184,7 +3146,10 @@ get_min_string_length (tree rhs, bool *f && TREE_READONLY (rhs)) rhs = DECL_INITIAL (rhs); - if (rhs && TREE_CODE (rhs) == STRING_CST) + if (rhs && TREE_CODE (rhs) == STRING_CST + && tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (rhs)))) == 1 + && TREE_STRING_LENGTH (rhs) > 0 + && TREE_STRING_POINTER (rhs) [TREE_STRING_LENGTH (rhs) - 1] == '\0') { *full_string_p = true; return strlen (TREE_STRING_POINTER (rhs)); diff -Npur gcc/testsuite/c-c++-common/attr-nonstring-3.c gcc/testsuite/c-c++-common/attr-nonstring-3.c --- gcc/testsuite/c-c++-common/attr-nonstring-3.c 2018-08-31 08:32:03.000000000 +0200 +++ gcc/testsuite/c-c++-common/attr-nonstring-3.c 2018-09-14 22:43:59.078921397 +0200 @@ -406,7 +406,7 @@ void test_strlen (struct MemArrays *p, c { char a[] __attribute__ ((nonstring)) = { 1, 2, 3 }; - T (strlen (a)); /* { dg-warning "argument 1 declared attribute .nonstring." "pr86688" { xfail *-*-* } } */ + T (strlen (a)); /* { dg-warning "argument 1 declared attribute .nonstring." 
} */ } { diff -Npur gcc/testsuite/gcc.dg/pr83373.c gcc/testsuite/gcc.dg/pr83373.c --- gcc/testsuite/gcc.dg/pr83373.c 2018-08-31 08:32:03.000000000 +0200 +++ gcc/testsuite/gcc.dg/pr83373.c 2018-09-14 22:43:59.078921397 +0200 @@ -16,7 +16,7 @@ inline char* my_strcpy (char* dst, const __builtin_memcpy (dst, src, len + 1); else { - __builtin_memcpy (dst, src, size - 1); /* { dg-bogus "\\\[-Wstringop-oveflow]" } */ + __builtin_memcpy (dst, src, size - 1); /* { dg-bogus "\\\[-Wstringop-overflow=]" "" { xfail *-*-* } } */ dst[size - 1] = '\0'; } diff -Npur gcc/testsuite/gcc.dg/strlenopt-36.c gcc/testsuite/gcc.dg/strlenopt-36.c --- gcc/testsuite/gcc.dg/strlenopt-36.c 2018-08-31 08:32:03.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-36.c 2018-09-14 22:43:59.078921397 +0200 @@ -83,4 +83,4 @@ void test_nested_memarray (struct Nested T (strlen (ma->ma1.a1) == 0); */ } -/* { dg-final { scan-tree-dump-not "failure_on_line" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "failure_on_line" "optimized" { xfail *-*-* } } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-40.c gcc/testsuite/gcc.dg/strlenopt-40.c --- gcc/testsuite/gcc.dg/strlenopt-40.c 2018-08-31 08:32:04.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-40.c 2018-09-14 22:43:59.079921383 +0200 @@ -105,135 +105,20 @@ void elim_global_arrays (int i) /* Verify that the expression involving the strlen call as well as whatever depends on it is eliminated from the test output. All these expressions must be trivially true. 
*/ - ELIM_TRUE (strlen (a7_3[0]) < sizeof a7_3[0]); - ELIM_TRUE (strlen (a7_3[1]) < sizeof a7_3[1]); - ELIM_TRUE (strlen (a7_3[6]) < sizeof a7_3[6]); - ELIM_TRUE (strlen (a7_3[i]) < sizeof a7_3[i]); - - ELIM_TRUE (strlen (a5_7[0]) < sizeof a5_7[0]); - ELIM_TRUE (strlen (a5_7[1]) < sizeof a5_7[1]); - ELIM_TRUE (strlen (a5_7[4]) < sizeof a5_7[4]); - ELIM_TRUE (strlen (a5_7[i]) < sizeof a5_7[0]); - - ELIM_TRUE (strlen (ax_3[0]) < sizeof ax_3[0]); - ELIM_TRUE (strlen (ax_3[1]) < sizeof ax_3[1]); - ELIM_TRUE (strlen (ax_3[9]) < sizeof ax_3[9]); - ELIM_TRUE (strlen (ax_3[i]) < sizeof ax_3[i]); - - ELIM_TRUE (strlen (a3) < sizeof a3); - ELIM_TRUE (strlen (a7) < sizeof a7); - ELIM_TRUE (strlen (ax) != DIFF_MAX); ELIM_TRUE (strlen (ax) != DIFF_MAX - 1); ELIM_TRUE (strlen (ax) < DIFF_MAX - 1); } -void elim_pointer_to_arrays (void) -{ - ELIM_TRUE (strlen (*pa7) < 7); - ELIM_TRUE (strlen (*pa5) < 5); - ELIM_TRUE (strlen (*pa3) < 3); - - ELIM_TRUE (strlen ((*pa7_3)[0]) < 3); - ELIM_TRUE (strlen ((*pa7_3)[1]) < 3); - ELIM_TRUE (strlen ((*pa7_3)[6]) < 3); - - ELIM_TRUE (strlen ((*pax_3)[0]) < 3); - ELIM_TRUE (strlen ((*pax_3)[1]) < 3); - ELIM_TRUE (strlen ((*pax_3)[9]) < 3); - - ELIM_TRUE (strlen ((*pa5_7)[0]) < 7); - ELIM_TRUE (strlen ((*pa5_7)[1]) < 7); - ELIM_TRUE (strlen ((*pa5_7)[4]) < 7); -} - -void elim_global_arrays_and_strings (int i) -{ - ELIM_TRUE (strlen (i < 0 ? a3 : "") < 3); - ELIM_TRUE (strlen (i < 0 ? a3 : "1") < 3); - ELIM_TRUE (strlen (i < 0 ? a3 : "12") < 3); - ELIM_TRUE (strlen (i < 0 ? a3 : "123") < 4); - - ELIM_FALSE (strlen (i < 0 ? a3 : "") > 3); - ELIM_FALSE (strlen (i < 0 ? a3 : "1") > 3); - ELIM_FALSE (strlen (i < 0 ? a3 : "12") > 3); - ELIM_FALSE (strlen (i < 0 ? a3 : "123") > 4); - - ELIM_TRUE (strlen (i < 0 ? a7 : "") < 7); - ELIM_TRUE (strlen (i < 0 ? a7 : "1") < 7); - ELIM_TRUE (strlen (i < 0 ? a7 : "12") < 7); - ELIM_TRUE (strlen (i < 0 ? a7 : "123") < 7); - ELIM_TRUE (strlen (i < 0 ? a7 : "123456") < 7); - ELIM_TRUE (strlen (i < 0 ? 
a7 : "1234567") < 8); - - ELIM_FALSE (strlen (i < 0 ? a7 : "") > 6); - ELIM_FALSE (strlen (i < 0 ? a7 : "1") > 6); - ELIM_FALSE (strlen (i < 0 ? a7 : "12") > 6); - ELIM_FALSE (strlen (i < 0 ? a7 : "123") > 6); - ELIM_FALSE (strlen (i < 0 ? a7 : "123456") > 7); - ELIM_FALSE (strlen (i < 0 ? a7 : "1234567") > 8); -} - -void elim_member_arrays_obj (int i) -{ - ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a3) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][1].a3) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][2].a3) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][6].a3) < 3); - - ELIM_TRUE (strlen (ma0_3_5_7[1][0][0].a3) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[2][0][1].a3) < 3); - - ELIM_TRUE (strlen (ma0_3_5_7[1][1][0].a3) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a3) < 3); - - ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a5) < 5); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][1].a5) < 5); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][2].a5) < 5); - ELIM_TRUE (strlen (ma0_3_5_7[0][0][6].a5) < 5); - - ELIM_TRUE (strlen (ma0_3_5_7[1][0][0].a5) < 5); - ELIM_TRUE (strlen (ma0_3_5_7[2][0][1].a5) < 5); - - ELIM_TRUE (strlen (ma0_3_5_7[1][1][0].a5) < 5); - ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a5) < 5); - - ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a7_3[0]) < 3); - ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a7_3[2]) < 3); - - ELIM_TRUE (strlen (ma0_3_5_7[0][0][0].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0_3_5_7[2][4][6].a5_7[4]) < 7); -} - void elim_member_arrays_ptr (struct MemArrays0 *ma0, struct MemArraysX *max, struct MemArrays7 *ma7, int i) { - ELIM_TRUE (strlen (ma0->a7_3[0]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[1]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[6]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[6]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[i]) < 3); - ELIM_TRUE (strlen (ma0->a7_3[i]) < 3); - - ELIM_TRUE (strlen (ma0->a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[0].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[1].a5_7[0]) < 7); - ELIM_TRUE (strlen (ma0[1].a5_7[4]) < 7); - ELIM_TRUE (strlen (ma0[9].a5_7[0]) < 7); - ELIM_TRUE (strlen 
(ma0[9].a5_7[4]) < 7); - - ELIM_TRUE (strlen (ma0->a3) < sizeof ma0->a3); - ELIM_TRUE (strlen (ma0->a5) < sizeof ma0->a5); ELIM_TRUE (strlen (ma0->a0) < DIFF_MAX - 1); - ELIM_TRUE (strlen (max->a3) < sizeof max->a3); - ELIM_TRUE (strlen (max->a5) < sizeof max->a5); ELIM_TRUE (strlen (max->ax) < DIFF_MAX - 1); - ELIM_TRUE (strlen (ma7->a3) < sizeof max->a3); - ELIM_TRUE (strlen (ma7->a5) < sizeof max->a5); ELIM_TRUE (strlen (ma7->a7) < DIFF_MAX - 1); } diff -Npur gcc/testsuite/gcc.dg/strlenopt-45.c gcc/testsuite/gcc.dg/strlenopt-45.c --- gcc/testsuite/gcc.dg/strlenopt-45.c 2018-08-31 08:32:04.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-45.c 2018-09-14 22:43:59.080921369 +0200 @@ -58,46 +58,24 @@ void elim_strnlen_arr_cst (void) be one). */ ELIM (strnlen (&c, 0) == 0); ELIM (strnlen (&c, 1) < 2); - ELIM (strnlen (&c, 2) == 0); - ELIM (strnlen (&c, 9) == 0); - ELIM (strnlen (&c, PTRDIFF_MAX) == 0); - ELIM (strnlen (&c, SIZE_MAX) == 0); - ELIM (strnlen (&c, -1) == 0); ELIM (strnlen (a1, 0) == 0); ELIM (strnlen (a1, 1) < 2); - ELIM (strnlen (a1, 2) == 0); - ELIM (strnlen (a1, 9) == 0); - ELIM (strnlen (a1, PTRDIFF_MAX) == 0); - ELIM (strnlen (a1, SIZE_MAX) == 0); - ELIM (strnlen (a1, -1) == 0); ELIM (strnlen (a3, 0) == 0); ELIM (strnlen (a3, 1) < 2); ELIM (strnlen (a3, 2) < 3); ELIM (strnlen (a3, 3) < 4); - ELIM (strnlen (a3, 9) < 4); - ELIM (strnlen (a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (a3, SIZE_MAX) < 4); - ELIM (strnlen (a3, -1) < 4); ELIM (strnlen (a3_7[0], 0) == 0); ELIM (strnlen (a3_7[0], 1) < 2); ELIM (strnlen (a3_7[0], 2) < 3); ELIM (strnlen (a3_7[0], 3) < 4); - ELIM (strnlen (a3_7[0], 9) < 8); - ELIM (strnlen (a3_7[0], PTRDIFF_MAX) < 8); - ELIM (strnlen (a3_7[0], SIZE_MAX) < 8); - ELIM (strnlen (a3_7[0], -1) < 8); ELIM (strnlen (a3_7[2], 0) == 0); ELIM (strnlen (a3_7[2], 1) < 2); ELIM (strnlen (a3_7[2], 2) < 3); ELIM (strnlen (a3_7[2], 3) < 4); - ELIM (strnlen (a3_7[2], 9) < 8); - ELIM (strnlen (a3_7[2], PTRDIFF_MAX) < 8); - ELIM (strnlen (a3_7[2], 
SIZE_MAX) < 8); - ELIM (strnlen (a3_7[2], -1) < 8); ELIM (strnlen ((char*)a3_7, 0) == 0); ELIM (strnlen ((char*)a3_7, 1) < 2); @@ -106,10 +84,6 @@ void elim_strnlen_arr_cst (void) ELIM (strnlen ((char*)a3_7, 9) < 10); ELIM (strnlen ((char*)a3_7, 19) < 20); ELIM (strnlen ((char*)a3_7, 21) < 22); - ELIM (strnlen ((char*)a3_7, 23) < 22); - ELIM (strnlen ((char*)a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)a3_7, -1) < 22); ELIM (strnlen (ax, 0) == 0); ELIM (strnlen (ax, 1) < 2); @@ -135,56 +109,32 @@ void elim_strnlen_memarr_cst (struct Mem { ELIM (strnlen (&p->c, 0) == 0); ELIM (strnlen (&p->c, 1) < 2); - ELIM (strnlen (&p->c, 9) == 0); - ELIM (strnlen (&p->c, PTRDIFF_MAX) == 0); - ELIM (strnlen (&p->c, SIZE_MAX) == 0); - ELIM (strnlen (&p->c, -1) == 0); /* Other accesses to internal zero-length arrays are undefined. */ ELIM (strnlen (p->a0, 0) == 0); ELIM (strnlen (p->a1, 0) == 0); ELIM (strnlen (p->a1, 1) < 2); - ELIM (strnlen (p->a1, 9) == 0); - ELIM (strnlen (p->a1, PTRDIFF_MAX) == 0); - ELIM (strnlen (p->a1, SIZE_MAX) == 0); - ELIM (strnlen (p->a1, -1) == 0); ELIM (strnlen (p->a3, 0) == 0); ELIM (strnlen (p->a3, 1) < 2); ELIM (strnlen (p->a3, 2) < 3); ELIM (strnlen (p->a3, 3) < 4); - ELIM (strnlen (p->a3, 9) < 4); - ELIM (strnlen (p->a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p->a3, SIZE_MAX) < 4); - ELIM (strnlen (p->a3, -1) < 4); ELIM (strnlen (p[i].a3, 0) == 0); ELIM (strnlen (p[i].a3, 1) < 2); ELIM (strnlen (p[i].a3, 2) < 3); ELIM (strnlen (p[i].a3, 3) < 4); - ELIM (strnlen (p[i].a3, 9) < 4); - ELIM (strnlen (p[i].a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p[i].a3, SIZE_MAX) < 4); - ELIM (strnlen (p[i].a3, -1) < 4); ELIM (strnlen (p->a3_7[0], 0) == 0); ELIM (strnlen (p->a3_7[0], 1) < 2); ELIM (strnlen (p->a3_7[0], 2) < 3); ELIM (strnlen (p->a3_7[0], 3) < 4); - ELIM (strnlen (p->a3_7[0], 9) < 8); - ELIM (strnlen (p->a3_7[0], PTRDIFF_MAX) < 8); - ELIM (strnlen (p->a3_7[0], SIZE_MAX) < 8); - ELIM (strnlen 
(p->a3_7[0], -1) < 8); ELIM (strnlen (p->a3_7[2], 0) == 0); ELIM (strnlen (p->a3_7[2], 1) < 2); ELIM (strnlen (p->a3_7[2], 2) < 3); ELIM (strnlen (p->a3_7[2], 3) < 4); - ELIM (strnlen (p->a3_7[2], 9) < 8); - ELIM (strnlen (p->a3_7[2], PTRDIFF_MAX) < 8); - ELIM (strnlen (p->a3_7[2], SIZE_MAX) < 8); - ELIM (strnlen (p->a3_7[2], -1) < 8); ELIM (strnlen (p->a3_7[i], 0) == 0); ELIM (strnlen (p->a3_7[i], 1) < 2); @@ -210,10 +160,6 @@ void elim_strnlen_memarr_cst (struct Mem ELIM (strnlen ((char*)p->a3_7, 9) < 10); ELIM (strnlen ((char*)p->a3_7, 19) < 20); ELIM (strnlen ((char*)p->a3_7, 21) < 22); - ELIM (strnlen ((char*)p->a3_7, 23) < 22); - ELIM (strnlen ((char*)p->a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)p->a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)p->a3_7, -1) < 22); ELIM (strnlen (p->ax, 0) == 0); ELIM (strnlen (p->ax, 1) < 2); diff -Npur gcc/testsuite/gcc.dg/strlenopt-48.c gcc/testsuite/gcc.dg/strlenopt-48.c --- gcc/testsuite/gcc.dg/strlenopt-48.c 2018-08-31 08:32:04.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-48.c 2018-09-14 22:43:59.080921369 +0200 @@ -31,5 +31,5 @@ void h (void) abort(); } -/* { dg-final { scan-tree-dump-times "strlen" 0 "optimized" } } - { dg-final { scan-tree-dump-times "abort" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "strlen" 0 "optimized" { xfail *-*-* } } } + { dg-final { scan-tree-dump-times "abort" 0 "optimized" { xfail *-*-* } } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-51.c gcc/testsuite/gcc.dg/strlenopt-51.c --- gcc/testsuite/gcc.dg/strlenopt-51.c 2018-08-31 08:32:04.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-51.c 2018-09-14 22:43:59.081921354 +0200 @@ -101,7 +101,7 @@ void test_keep_a9_9 (int i) { #undef T #define T(I) \ - KEEP (strlen (&a9_9[i][I][0]) > (1 + I) % 9); \ + KEEP (strlen (&a9_9[i][I][0]) > (0 + I) % 9); \ KEEP (strlen (&a9_9[i][I][1]) > (1 + I) % 9); \ KEEP (strlen (&a9_9[i][I][2]) > (2 + I) % 9); \ KEEP (strlen (&a9_9[i][I][3]) > (3 + I) % 9); \ @@ -115,7 +115,7 @@ void 
test_keep_a9_9 (int i) } /* { dg-final { scan-tree-dump-times "strlen" 72 "gimple" } } - { dg-final { scan-tree-dump-times "strlen" 63 "optimized" } } + { dg-final { scan-tree-dump-times "strlen" 72 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 72 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-59.c gcc/testsuite/gcc.dg/strlenopt-59.c --- gcc/testsuite/gcc.dg/strlenopt-59.c 1970-01-01 01:00:00.000000000 +0100 +++ gcc/testsuite/gcc.dg/strlenopt-59.c 2018-09-14 22:43:59.081921354 +0200 @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +#define assert(x) do { if (!(x)) __builtin_abort (); } while (0) +extern int system (const char *); +static int fun (char *p) +{ + char buf[16]; + + assert (__builtin_strlen (p) < 4); + + __builtin_sprintf (buf, "echo %s - %s", p, p); + return system (buf); +} + +void test (void) +{ + char b[2] = "ab"; + fun (b); +} + +/* { dg-final { scan-tree-dump-times "__builtin_strlen" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_abort" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_sprintf" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "system" 1 "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-60.c gcc/testsuite/gcc.dg/strlenopt-60.c --- gcc/testsuite/gcc.dg/strlenopt-60.c 1970-01-01 01:00:00.000000000 +0100 +++ gcc/testsuite/gcc.dg/strlenopt-60.c 2018-09-14 22:43:59.082921340 +0200 @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +#define assert(x) do { if (!(x)) __builtin_abort (); } while (0) +extern int system (const char *); +static int fun (char *p) +{ + char buf[16]; + + assert 
(__builtin_strlen (p) < 4); + + __builtin_sprintf (buf, "echo %s - %s", p, p); + return system (buf); +} + +void test (void) +{ + char b[3] = "ab"; + fun (b); +} + +/* { dg-final { scan-tree-dump-times "__builtin_strlen" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_abort" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_sprintf" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "system" 1 "optimized" } } */ ^ permalink raw reply [flat|nested] 121+ messages in thread
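The two new tests above differ only in whether the local array has room for the terminating NUL. A minimal C sketch of that difference follows; it is not part of the patch, and the helper name has_nul and the two wrapper functions are made up for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if the SIZE bytes at BUF contain a NUL.  */
static int
has_nul (const char *buf, size_t size)
{
  return memchr (buf, '\0', size) != NULL;
}

/* As in strlenopt-59.c: "ab" fills both bytes, so no NUL is stored
   (valid C, though not valid C++).  */
int
two_byte_array_is_terminated (void)
{
  char b[2] = "ab";
  return has_nul (b, sizeof b);
}

/* As in strlenopt-60.c: the third byte holds the terminating NUL.  */
int
three_byte_array_is_terminated (void)
{
  char b[3] = "ab";
  return has_nul (b, sizeof b);
}
```

In the first case strlen (b) would read past the array, which is why strlenopt-59.c expects the strlen call and the abort to remain in the optimized dump. In the second case the length is known, and strlenopt-60.c expects both to be folded away.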
* Re: [PATCH] Make strlen range computations more conservative 2018-09-15 9:22 ` Bernd Edlinger @ 2018-10-10 23:12 ` Jeff Law 2018-10-12 15:03 ` Jeff Law 1 sibling, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-10-10 23:12 UTC (permalink / raw) To: Bernd Edlinger, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek On 9/15/18 2:43 AM, Bernd Edlinger wrote: > Hi, > > this is an update on my strlen range patch (V7). Again re-based and > retested to current trunk. > > I am aware that Martin wants to re-factor the interface of get_range_strlen > and have no objections against, but I'd suggest that to be a follow-up patch. > > I might suggest to rename one of the two get_range_strlen functions at the > same time as it is rather confusing to have to count the parameters in order > to tell which function is meant. > > Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > Is it OK for trunk? > > > Thanks > Bernd. > > > changelog-range-strlen-v7.txt > > gcc: > 2018-08-26 Bernd Edlinger <bernd.edlinger@hotmail.de> > > * gimple-fold.c (looks_like_a_char_array_without_typecast_p): New > helper function for strlen range estimations. > (get_range_strlen): Use looks_like_a_char_array_without_typecast_p > for warnings, but use GIMPLE semantics otherwise. > * tree-ssa-strlen.c (maybe_set_strlen_range): Use GIMPLE semantics. > (get_min_string_length): Avoid not NUL terminated string literals. > > testsuite: > 2018-08-26 Bernd Edlinger <bernd.edlinger@hotmail.de> > > * c-c++-common/attr-nonstring-3.c: Remove xfail. > * gcc.dg/pr83373.c: Add xfail. > * gcc.dg/strlenopt-36.c: Add xfail. > * gcc.dg/strlenopt-40.c: Adjust test expectations. > * gcc.dg/strlenopt-45.c: Adjust test expectations. > * gcc.dg/strlenopt-48.c: Add xfail. > * gcc.dg/strlenopt-51.c: Adjust test expectations. > * gcc.dg/strlenopt-59.c: New test. > * gcc.dg/strlenopt-60.c: New test. Just an FYI -- this is not forgotten. I'll be poking at it tomorrow. 
Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-09-15 9:22 ` Bernd Edlinger 2018-10-10 23:12 ` Jeff Law @ 2018-10-12 15:03 ` Jeff Law 2018-10-13 9:07 ` Bernd Edlinger 1 sibling, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-10-12 15:03 UTC (permalink / raw) To: Bernd Edlinger, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek On 9/15/18 2:43 AM, Bernd Edlinger wrote: > Hi, > > this is an update on my strlen range patch (V7). Again re-based and > retested to current trunk. > > I am aware that Martin wants to re-factor the interface of get_range_strlen > and have no objections against, but I'd suggest that to be a follow-up patch. > > I might suggest to rename one of the two get_range_strlen functions at the > same time as it is rather confusing to have to count the parameters in order > to tell which function is meant. > > Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > Is it OK for trunk? > > > Thanks > Bernd. > > > changelog-range-strlen-v7.txt > > gcc: > 2018-08-26 Bernd Edlinger <bernd.edlinger@hotmail.de> > > * gimple-fold.c (looks_like_a_char_array_without_typecast_p): New > helper function for strlen range estimations. > (get_range_strlen): Use looks_like_a_char_array_without_typecast_p > for warnings, but use GIMPLE semantics otherwise. > * tree-ssa-strlen.c (maybe_set_strlen_range): Use GIMPLE semantics. > (get_min_string_length): Avoid not NUL terminated string literals. The introduction of looks_like_a_char_array_without_typecast_p is probably a good thing. Too much code is already implemented inline within get_range_strlen. It looks like you added handling of ARRAY_RANGE_REF. I don't know how often they come up in practice, but handling it seems like a reasonable extension to what we're doing. Bonus points if it's triggering with any kind of consistency. I actually prefer Martin's unification of type/fuzzy into a single enumeration to describe the desired behavior. 
Doing it with two args where some values are mutually exclusive is just asking for trouble. Though I like that you called out the values that are mutually exclusive. I definitely want to look at how your patch and Martin's differ on the handling of flexible array members -- clearly we must avoid setting a range in that case. I'm surprised this didn't trigger a failure in the testsuite though. Martin's work in this space did. The bugfix in get_min_string_length looks like it probably stands on its own. I'm still evaluating the two approaches... jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-10-12 15:03 ` Jeff Law @ 2018-10-13 9:07 ` Bernd Edlinger 2018-10-17 23:59 ` Jeff Law 0 siblings, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-10-13 9:07 UTC (permalink / raw) To: Jeff Law, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek On 10/12/18 16:55, Jeff Law wrote: > On 9/15/18 2:43 AM, Bernd Edlinger wrote: >> Hi, >> >> this is an update on my strlen range patch (V7). Again re-based and >> retested to current trunk. >> >> I am aware that Martin wants to re-factor the interface of get_range_strlen >> and have no objections against, but I'd suggest that to be a follow-up patch. >> >> I might suggest to rename one of the two get_range_strlen functions at the >> same time as it is rather confusing to have to count the parameters in order >> to tell which function is meant. >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >> Is it OK for trunk? >> >> >> Thanks >> Bernd. >> >> >> changelog-range-strlen-v7.txt >> >> gcc: >> 2018-08-26 Bernd Edlinger <bernd.edlinger@hotmail.de> >> >> * gimple-fold.c (looks_like_a_char_array_without_typecast_p): New >> helper function for strlen range estimations. >> (get_range_strlen): Use looks_like_a_char_array_without_typecast_p >> for warnings, but use GIMPLE semantics otherwise. >> * tree-ssa-strlen.c (maybe_set_strlen_range): Use GIMPLE semantics. >> (get_min_string_length): Avoid not NUL terminated string literals. > The introduction of looks_like_a_char_array_without_typecast_p is > probably a good thing. Too much code is already implemented inline > within get_range_strlen. > > It looks like you added handling of ARRAY_RANGE_REF. I don't know how > often they come up in practice, but handling it seems like a reasonable > extension to what we're doing. Bonus points if it's triggering with any > kind of consistency. 
> I only wanted to be consistent with get_inner_reference here, but have not encountered these myself; probably only an Ada thing? > I actually prefer Martin's unification of type/fuzzy into a single > enumeration to describe the desired behavior. Doing it with two args > where some values are mutually exclusive is just asking for trouble. > Though I like that you called out the values that are mutually exclusive. > > I definitely want to look at how your patch and Martin's differ on the > handling of flexible array members -- clearly we must avoid setting a > range in that case. I'm surprised this didn't trigger a failure in the > testsuite though. Martin's work in this space did. > > The bugfix in get_min_string_length looks like it probably stands on its > own. > > I'm still evaluating the two approaches... > One thing I should mention is that there is still one place where opportunistic range info influences code gen, at least with my patch. The return value from sprintf uses the range info from the warning to set the range info of the result. This happens in try_substitute_return_value, which takes the range info computed for the warnings and feeds it into set_range_info. Bernd. > jeff > ^ permalink raw reply [flat|nested] 121+ messages in thread
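To illustrate the concern about the sprintf return value, here is a hedged sketch of user code whose check depends on that value. The struct msg layout and the function length_in_bounds are invented for this example and do not come from the patch or the testsuite:

```c
#include <stdio.h>

/* Hypothetical layout: a small member array followed by more storage.  */
struct msg
{
  char a3[3];
  char tail[8];
};

/* Returns 1 when the formatted length fits what a3 can legitimately
   hold (at most 2 characters plus the NUL), 0 otherwise.  If the
   compiler assumed the result range from the declared size of a3
   alone, the comparison below could be folded to a constant and a
   missing terminator would go unnoticed.  */
int
length_in_bounds (const struct msg *p)
{
  char buf[32];
  int n = snprintf (buf, sizeof buf, "%s", p->a3);
  return n >= 0 && n < (int) sizeof p->a3;
}
```

With a properly terminated a3 the function returns 1 either way; the question in this thread is only whether the range used to optimize the check is the strict one or the opportunistic one meant for warnings.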
* Re: [PATCH] Make strlen range computations more conservative 2018-10-13 9:07 ` Bernd Edlinger @ 2018-10-17 23:59 ` Jeff Law 2018-10-20 11:16 ` Bernd Edlinger 0 siblings, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-10-17 23:59 UTC (permalink / raw) To: Bernd Edlinger, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek On 10/12/18 9:34 PM, Bernd Edlinger wrote: > On 10/12/18 16:55, Jeff Law wrote: >> On 9/15/18 2:43 AM, Bernd Edlinger wrote: >>> Hi, >>> >>> this is an update on my strlen range patch (V7). Again re-based and >>> retested to current trunk. >>> >>> I am aware that Martin wants to re-factor the interface of get_range_strlen >>> and have no objections against, but I'd suggest that to be a follow-up patch. >>> >>> I might suggest to rename one of the two get_range_strlen functions at the >>> same time as it is rather confusing to have to count the parameters in order >>> to tell which function is meant. >>> >>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >>> Is it OK for trunk? >>> >>> >>> Thanks >>> Bernd. >>> >>> >>> changelog-range-strlen-v7.txt >>> >>> gcc: >>> 2018-08-26 Bernd Edlinger <bernd.edlinger@hotmail.de> >>> >>> * gimple-fold.c (looks_like_a_char_array_without_typecast_p): New >>> helper function for strlen range estimations. >>> (get_range_strlen): Use looks_like_a_char_array_without_typecast_p >>> for warnings, but use GIMPLE semantics otherwise. >>> * tree-ssa-strlen.c (maybe_set_strlen_range): Use GIMPLE semantics. >>> (get_min_string_length): Avoid not NUL terminated string literals. >> The introduction of looks_like_a_char_array_without_typecast_p is >> probably a good thing. Too much code is already implemented inline >> within get_range_strlen. >> >> It looks like you added handling of ARRAY_RANGE_REF. I don't know how >> often they come up in practice, but handling it seems like a reasonable >> extension to what we're doing. Bonus points if it's triggering with any >> kind of consistency. 
>> > > I did only want to be consistent with get_inner_reference here, > but did not have encountered these, probably only an Ada thing? Trying to be consistent with get_inner_reference is fine :-) GCC supports case ranges as an extension for C/C++. No clue if they're natively supported by Ada or any other language. > >> I actually prefer Martin's unification of type/fuzzy into a single >> enumeration to describe the desired behavior. Doing it with two args >> where some values are mutually exclusive is just asking for trouble. >> Though I like that you called out the values that are mutually exclusive. >> >> I definitely want to look at how your patch and Martin's differ on the >> handling of flexible array members -- clearly we must avoid setting a >> range in that case. I'm surprised this didn't trigger a failure in the >> testsuite though. Martin's work in this space did. >> >> The bugfix in get_min_string_length looks like it probably stands on its >> own. >> >> I'm still evaluating the two approaches... >> > > One thing I should mention is, that there is still one place where opportunistic > range info influence code gen. I mean at least with my patch. ACK. That's something Martin's patch does address. At least it's supposed to. > > That is the return value from sprintf is using the range info from the > warning, and uses that to set the range info of the result. > In try_substitute_return_value, which uses the range info that was > from the warnings and feeds that into set_range_info. Right. In Martin's work we have enough range info to distinguish between the range info for warnings and the true range info and only use the latter in the call to set_range_info. jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-10-17 23:59 ` Jeff Law @ 2018-10-20 11:16 ` Bernd Edlinger 2018-11-16 17:26 ` Bernd Edlinger 0 siblings, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-10-20 11:16 UTC (permalink / raw) To: Jeff Law, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek [-- Attachment #1: Type: text/plain, Size: 5430 bytes --] On 10/17/18 11:56 PM, Jeff Law wrote: > On 10/12/18 9:34 PM, Bernd Edlinger wrote: >> On 10/12/18 16:55, Jeff Law wrote: >>> On 9/15/18 2:43 AM, Bernd Edlinger wrote: >>>> Hi, >>>> >>>> this is an update on my strlen range patch (V7). Again re-based and >>>> retested to current trunk. >>>> >>>> I am aware that Martin wants to re-factor the interface of get_range_strlen >>>> and have no objections against, but I'd suggest that to be a follow-up patch. >>>> >>>> I might suggest to rename one of the two get_range_strlen functions at the >>>> same time as it is rather confusing to have to count the parameters in order >>>> to tell which function is meant. >>>> >>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >>>> Is it OK for trunk? >>>> >>>> >>>> Thanks >>>> Bernd. >>>> >>>> >>>> changelog-range-strlen-v7.txt >>>> >>>> gcc: >>>> 2018-08-26 Bernd Edlinger <bernd.edlinger@hotmail.de> >>>> >>>> * gimple-fold.c (looks_like_a_char_array_without_typecast_p): New >>>> helper function for strlen range estimations. >>>> (get_range_strlen): Use looks_like_a_char_array_without_typecast_p >>>> for warnings, but use GIMPLE semantics otherwise. >>>> * tree-ssa-strlen.c (maybe_set_strlen_range): Use GIMPLE semantics. >>>> (get_min_string_length): Avoid not NUL terminated string literals. >>> The introduction of looks_like_a_char_array_without_typecast_p is >>> probably a good thing. Too much code is already implemented inline >>> within get_range_strlen. >>> >>> It looks like you added handling of ARRAY_RANGE_REF. 
I don't know how >>> often they come up in practice, but handling it seems like a reasonable >>> extension to what we're doing. Bonus points if it's triggering with any >>> kind of consistency. >>> >> >> I did only want to be consistent with get_inner_reference here, >> but did not have encountered these, probably only an Ada thing? > Trying to be consistent with get_inner_reference is fine :-) GCC > supports case ranges as an extension for C/C++. No clue if they're > natively supported by Ada or any other language. > > > >> >>> I actually prefer Martin's unification of type/fuzzy into a single >>> enumeration to describe the desired behavior. Doing it with two args >>> where some values are mutually exclusive is just asking for trouble. >>> Though I like that you called out the values that are mutually exclusive. >>> >>> I definitely want to look at how your patch and Martin's differ on the >>> handling of flexible array members -- clearly we must avoid setting a >>> range in that case. I'm surprised this didn't trigger a failure in the >>> testsuite though. Martin's work in this space did. >>> >>> The bugfix in get_min_string_length looks like it probably stands on its >>> own. >>> >>> I'm still evaluating the two approaches... >>> >> >> One thing I should mention is, that there is still one place where opportunistic >> range info influence code gen. I mean at least with my patch. > ACK. That's something Martin's patch does address. At least it's > supposed to. Okay, based on my previous patch I can of course do the same. See attached. This was bootstrapped and reg-tested together with my previous patch. The only "regression" was pr79376.c, which is xfailed because the test case expects the return value to be within the limits given by the opportunistic range info.
While I think the strlen return optimization will be safe with this patch, I still have a philosophical problem with it: s[n]printf is a highly complex piece of software, and we take away its right to return a failure code when it has to because of an implementation bug. >> >> That is the return value from sprintf is using the range info from the >> warning, and uses that to set the range info of the result. >> In try_substitute_return_value, which uses the range info that was >> from the warnings and feeds that into set_range_info. > Right. In Martin's work we have enough range info to distinguish > between the range info for warnings and the true range info and only use > the latter in the call to set_range_info. > > Well, I have tried the test cases from Martin's patch, and all except one work fine for me and pass with my patch set as well. The problematic one is strlenopt-59.c (in his patch; my patch has unfortunately picked the same name). The difference is how object declarations are handled. While my patch does not try to solve that problem at all, his patch probably looks at the declaration size to improve the strict limits. I am not totally against it, but do not feel any need to implement that feature in the same patch together with a function interface change and a code-correctness fix. From the test case it looks like the globals are comdat objects, because there is no initialization. You can declare "char a3[3];" and "char a3[100];" in different translation units and it will be a3[100] at run-time. For me the red line here is basically that the strlen optimization should _not_ be more aggressive than the loop-niter optimization. Thus the litmus test is: would the test case pass if strlen were implemented as: #define strlen(c) ({ __SIZE_TYPE__ _n; for(_n=0; (c)[_n]; _n++); _n; }) Well, it does not. But that should probably be considered a goal. Bernd.
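The litmus test described above can be tried with a plain function instead of the GNU statement-expression macro. This is only a sketch under stated assumptions: loop_strlen, the initialized global a3, and length_fits are names chosen here for illustration and are not taken from either patch:

```c
#include <stddef.h>

/* Reference strlen written as an ordinary loop, equivalent in intent
   to the macro in the message above.  */
static size_t
loop_strlen (const char *s)
{
  size_t n;
  for (n = 0; s[n]; n++)
    ;
  return n;
}

/* Hypothetical global; unlike the uninitialized globals discussed in
   the message, this one has an initializer.  */
char a3[3] = "ab";

/* The kind of user check under discussion: whether it may be folded to
   a constant is decided by what the optimizer assumes about the loop
   bound, not by the C library.  */
int
length_fits (void)
{
  return loop_strlen (a3) < sizeof a3;
}
```

Comparing the optimized dump of such a loop-based check with the dump for the builtin strlen gives a rough idea of whether the strlen pass is more aggressive than the loop-niter analysis for the same access.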
[-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: patch-sprintf-ranges.diff --] [-- Type: text/x-patch; name="patch-sprintf-ranges.diff", Size: 9882 bytes --] 2018-10-19 Bernd Edlinger <bernd.edlinger@hotmail.de> * gimple-ssa-sprintf.c (result_range::strict_max): New member. (format_result::operator++, format_result::operator+=): Remove. (fmtresult::adjust_for_width_or_precision, format_none, format_percent, format_integer, format_floating, format_character, format_string): Handle strict_max. (get_string_length): Compute strict_max. (format_directive): Accumulate strict_max. (is_call_safe): Use strict_max. testsuite: 2018-10-19 Bernd Edlinger <bernd.edlinger@hotmail.de> * gcc.dg/tree-ssa/pr79376.c: Add xfail. diff -Npur gcc/gimple-ssa-sprintf.c gcc/gimple-ssa-sprintf.c --- gcc/gimple-ssa-sprintf.c 2018-10-04 04:55:10.000000000 +0200 +++ gcc/gimple-ssa-sprintf.c 2018-10-19 14:55:44.971280820 +0200 @@ -191,6 +191,8 @@ struct result_range UNLIKELY == MAX. UNLIKELY is used to control the return value optimization but not in diagnostics. */ unsigned HOST_WIDE_INT unlikely; + /* Conservative maximum range value. */ + unsigned HOST_WIDE_INT strict_max; }; /* The result of a call to a formatted function. */ @@ -228,45 +230,8 @@ struct format_result avoid issuing duplicate warnings while finishing the processing of a call. WARNED also disables the return value optimization. */ bool warned; - - /* Preincrement the number of output characters by 1. */ - format_result& operator++ () - { - return *this += 1; - } - - /* Postincrement the number of output characters by 1. */ - format_result operator++ (int) - { - format_result prev (*this); - *this += 1; - return prev; - } - - /* Increment the number of output characters by N. 
*/ - format_result& operator+= (unsigned HOST_WIDE_INT); }; -format_result& -format_result::operator+= (unsigned HOST_WIDE_INT n) -{ - gcc_assert (n < HOST_WIDE_INT_MAX); - - if (range.min < HOST_WIDE_INT_MAX) - range.min += n; - - if (range.max < HOST_WIDE_INT_MAX) - range.max += n; - - if (range.likely < HOST_WIDE_INT_MAX) - range.likely += n; - - if (range.unlikely < HOST_WIDE_INT_MAX) - range.unlikely += n; - - return *this; -} - /* Return the value of INT_MIN for the target. */ static inline HOST_WIDE_INT @@ -524,6 +489,7 @@ struct fmtresult range.max = min; range.likely = min; range.unlikely = min; + range.strict_max = HOST_WIDE_INT_M1U; } /* Construct a FMTRESULT object with MIN, MAX, and LIKELY counters. @@ -538,6 +504,7 @@ struct fmtresult range.max = max; range.likely = max < likely ? min : likely; range.unlikely = max; + range.strict_max = HOST_WIDE_INT_M1U; } /* Adjust result upward to reflect the RANGE of values the specified @@ -617,6 +584,8 @@ fmtresult::adjust_for_width_or_precision adjusted. Otherwise leave it at what it was before. */ knownrange = minadjusted; } + if (range.strict_max < (unsigned HOST_WIDE_INT)adjust[1]) + range.strict_max = adjust[1]; } if (warn_level > 1 && type) @@ -948,6 +917,7 @@ static fmtresult format_none (const directive &, tree, vr_values *) { fmtresult res (0); + res.range.strict_max = res.range.max; return res; } @@ -957,6 +927,7 @@ static fmtresult format_percent (const directive &, tree, vr_values *) { fmtresult res (1); + res.range.strict_max = res.range.max; return res; } @@ -1319,6 +1290,7 @@ format_integer (const directive &dir, tr } res.range.unlikely = res.range.max; + res.range.strict_max = res.range.max; /* Bump up the counters if WIDTH is greater than LEN. 
*/ res.adjust_for_width_or_precision (dir.width, dirtype, base, @@ -1486,6 +1458,8 @@ format_integer (const directive &dir, tr } res.range.unlikely = res.range.max; + res.range.strict_max = res.range.max; + res.adjust_for_width_or_precision (dir.width, dirtype, base, (sign | maybebase) + (base == 16)); res.adjust_for_width_or_precision (dir.prec, dirtype, base, @@ -1796,6 +1770,8 @@ format_floating (const directive &dir, c return fmtresult (); } + res.range.strict_max = res.range.unlikely; + /* Bump up the byte counters if WIDTH is greater. */ res.adjust_for_width_or_precision (dir.width); return res; @@ -1907,6 +1883,8 @@ format_floating (const directive &dir, t such as inf/infinity (e.g., Solaris). */ res.knownrange = dir.known_width_and_precision (); + res.range.strict_max = res.range.unlikely; + /* Adjust the range for width but ignore precision. */ res.adjust_for_width_or_precision (dir.width); @@ -1991,6 +1969,8 @@ format_floating (const directive &dir, t res.range.unlikely += target_mb_len_max () - 1; } + res.range.strict_max = res.range.unlikely; + res.adjust_for_width_or_precision (dir.width); return res; } @@ -2014,6 +1994,7 @@ get_string_length (tree str, unsigned el we know its length. */ fmtresult res (tree_to_shwi (slen)); res.nonstr = NULL_TREE; + res.range.strict_max = res.range.max; return res; } else if (!slen @@ -2086,6 +2067,10 @@ get_string_length (tree str, unsigned el flexible array member, such as in struct S { char a[4]; }; */ res.range.unlikely = flexarray ? HOST_WIDE_INT_MAX : res.range.max; + if (!get_range_strlen (str, lenrange, eltsize, true) + && tree_fits_uhwi_p (lenrange[1])) + res.range.strict_max = tree_to_uhwi (lenrange[1]); + return res; } @@ -2130,7 +2115,7 @@ format_character (const directive &dir, /* A wide character in the ASCII range most likely results in a single byte, and only unlikely in up to MB_LEN_MAX. */ - res.range.max = one_2_one_ascii ? 1 : target_mb_len_max ();; + res.range.max = one_2_one_ascii ? 
1 : target_mb_len_max (); res.range.likely = 1; res.range.unlikely = target_mb_len_max (); res.mayfail = !one_2_one_ascii; @@ -2164,6 +2149,8 @@ format_character (const directive &dir, res.knownrange = true; } + res.range.strict_max = res.range.unlikely; + /* Bump up the byte counters if WIDTH is greater. */ return res.adjust_for_width_or_precision (dir.width); } @@ -2209,6 +2196,11 @@ format_string (const directive &dir, tre is bounded by MB_LEN_MAX * wcslen (S). */ res.range.max *= target_mb_len_max (); res.range.unlikely = res.range.max; + if (res.range.strict_max < target_int_max () / target_mb_len_max ()) + res.range.strict_max *= target_mb_len_max (); + else + res.range.strict_max = HOST_WIDE_INT_M1U; + /* It's likely that the the total length is not more that 2 * wcslen (S).*/ res.range.likely = res.range.min * 2; @@ -2282,9 +2274,14 @@ format_string (const directive &dir, tre if (slen.range.likely < target_int_max ()) slen.range.likely *= 2; - if (slen.range.likely < target_int_max ()) + if (slen.range.unlikely < target_int_max ()) slen.range.unlikely *= target_mb_len_max (); + if (slen.range.strict_max < target_int_max () / target_mb_len_max ()) + slen.range.strict_max *= target_mb_len_max (); + else + slen.range.strict_max = HOST_WIDE_INT_M1U; + /* A non-empty wide character conversion may fail. */ if (slen.range.max > 0) res.mayfail = true; @@ -2355,6 +2352,8 @@ format_string (const directive &dir, tre of bytes on output isn't bounded by precision, set NONSTR. */ if (slen.nonstr && slen.range.min < (unsigned HOST_WIDE_INT)dir.prec[0]) res.nonstr = slen.nonstr; + if ((unsigned HOST_WIDE_INT)dir.prec[1] < res.range.strict_max) + res.range.strict_max = dir.prec[1]; /* Bump up the byte counters if WIDTH is greater. 
*/ return res.adjust_for_width_or_precision (dir.width); @@ -2366,6 +2365,7 @@ static fmtresult format_plain (const directive &dir, tree, vr_values *) { fmtresult res (dir.len); + res.range.strict_max = res.range.max; return res; } @@ -2752,6 +2752,10 @@ format_directive (const sprintf_dom_walk /* Compute the range of lengths of the formatted output. */ fmtresult fmtres = dir.fmtfunc (dir, dir.arg, vr_values); + if(res->range.strict_max < HOST_WIDE_INT_M1U - fmtres.range.strict_max) + res->range.strict_max += fmtres.range.strict_max; + else + res->range.strict_max = HOST_WIDE_INT_M1U; /* Record whether the output of all directives is known to be bounded by some maximum, implying that their arguments are @@ -3554,11 +3558,8 @@ is_call_safe (const sprintf_dom_walker:: /* The minimum return value. */ retval[0] = res.range.min; - /* The maximum return value is in most cases bounded by RES.RANGE.MAX - but in cases involving multibyte characters could be as large as - RES.RANGE.UNLIKELY. */ - retval[1] - = res.range.unlikely < res.range.max ? res.range.max : res.range.unlikely; + /* The maximum return value. */ + retval[1] = res.range.strict_max; /* Adjust the number of bytes which includes the terminating nul to reflect the return value of the function which does not. diff -Npur gcc/testsuite/gcc.dg/tree-ssa/pr79376.c gcc/testsuite/gcc.dg/tree-ssa/pr79376.c --- gcc/testsuite/gcc.dg/tree-ssa/pr79376.c 2017-04-15 22:07:47.000000000 +0200 +++ gcc/testsuite/gcc.dg/tree-ssa/pr79376.c 2018-10-19 11:16:46.991725994 +0200 @@ -105,5 +105,5 @@ void test_string_and_array (int i, struc } } -/* { dg-final { scan-tree-dump-not "failure_on_line" "optimized"} } +/* { dg-final { scan-tree-dump-not "failure_on_line" "optimized" { xfail *-*-* } } } { dg-final { scan-tree-dump-times "keep_call_on_line" 21 "optimized"} } */ ^ permalink raw reply [flat|nested] 121+ messages in thread
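The strict_max updates in the hunks above never let the bound wrap around: once a sum or product would overflow, the value is pinned to HOST_WIDE_INT_M1U, meaning "no known bound". A minimal sketch of that saturating arithmetic follows. It assumes a 64-bit unsigned type as a stand-in for unsigned HOST_WIDE_INT; the names UNBOUNDED, sat_add and sat_mul are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for HOST_WIDE_INT_M1U: all bits set, read as
   "no known upper bound".  */
#define UNBOUNDED UINT64_MAX

/* Add the bound of one directive to the running total.  Mirrors the
   shape of the format_directive hunk: saturate instead of wrapping.  */
uint64_t
sat_add (uint64_t total, uint64_t part)
{
  if (total < UNBOUNDED - part)
    return total + part;
  return UNBOUNDED;
}

/* Scale a character count by the maximum number of bytes per
   character.  Mirrors the shape of the format_string hunks: the
   product is only formed when COUNT is below LIMIT / FACTOR.  */
uint64_t
sat_mul (uint64_t count, uint64_t factor, uint64_t limit)
{
  assert (factor != 0);
  if (count < limit / factor)
    return count * factor;
  return UNBOUNDED;
}
```

With these helpers an unbounded input stays unbounded through every later step, which is the property the return-value range needs.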
* Re: [PATCH] Make strlen range computations more conservative 2018-10-20 11:16 ` Bernd Edlinger @ 2018-11-16 17:26 ` Bernd Edlinger 0 siblings, 0 replies; 121+ messages in thread From: Bernd Edlinger @ 2018-11-16 17:26 UTC (permalink / raw) To: Jeff Law, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek Just a reminder: those are the two parts of this patch, which have been posted already a while ago when we were still in stage 1: https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00805.html https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01237.html Bernd. On 10/20/18 11:16 AM, Bernd Edlinger wrote: > On 10/17/18 11:56 PM, Jeff Law wrote: >> On 10/12/18 9:34 PM, Bernd Edlinger wrote: >>> On 10/12/18 16:55, Jeff Law wrote: >>>> On 9/15/18 2:43 AM, Bernd Edlinger wrote: >>>>> Hi, >>>>> >>>>> this is an update on my strlen range patch (V7). Again re-based and >>>>> retested to current trunk. >>>>> >>>>> I am aware that Martin wants to re-factor the interface of get_range_strlen >>>>> and have no objections against, but I'd suggest that to be a follow-up patch. >>>>> >>>>> I might suggest to rename one of the two get_range_strlen functions at the >>>>> same time as it is rather confusing to have to count the parameters in order >>>>> to tell which function is meant. >>>>> >>>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >>>>> Is it OK for trunk? >>>>> >>>>> >>>>> Thanks >>>>> Bernd. >>>>> >>>>> >>>>> changelog-range-strlen-v7.txt >>>>> >>>>> gcc: >>>>> 2018-08-26 Bernd Edlinger <bernd.edlinger@hotmail.de> >>>>> >>>>> * gimple-fold.c (looks_like_a_char_array_without_typecast_p): New >>>>> helper function for strlen range estimations. >>>>> (get_range_strlen): Use looks_like_a_char_array_without_typecast_p >>>>> for warnings, but use GIMPLE semantics otherwise. >>>>> * tree-ssa-strlen.c (maybe_set_strlen_range): Use GIMPLE semantics. >>>>> (get_min_string_length): Avoid not NUL terminated string literals. 
>>>> The introduction of looks_like_a_char_array_without_typecast_p is >>>> probably a good thing. Too much code is already implemented inline >>>> within get_range_strlen. >>>> >>>> It looks like you added handling of ARRAY_RANGE_REF. I don't know how >>>> often they come up in practice, but handling it seems like a reasonable >>>> extension to what we're doing. Bonus points if it's triggering with any >>>> kind of consistency. >>>> >>> >>> I only wanted to be consistent with get_inner_reference here, >>> but have not encountered these, probably only an Ada thing? >> Trying to be consistent with get_inner_reference is fine :-) GCC >> supports case ranges as an extension for C/C++. No clue if they're >> natively supported by Ada or any other language. >> >> >> >>> >>>> I actually prefer Martin's unification of type/fuzzy into a single >>>> enumeration to describe the desired behavior. Doing it with two args >>>> where some values are mutually exclusive is just asking for trouble. >>>> Though I like that you called out the values that are mutually exclusive. >>>> >>>> I definitely want to look at how your patch and Martin's differ on the >>>> handling of flexible array members -- clearly we must avoid setting a >>>> range in that case. I'm surprised this didn't trigger a failure in the >>>> testsuite though. Martin's work in this space did. >>>> >>>> The bugfix in get_min_string_length looks like it probably stands on its >>>> own. >>>> >>>> I'm still evaluating the two approaches... >>>> >>> >>> One thing I should mention is that there is still one place where opportunistic >>> range info influences code gen. I mean at least with my patch. >> ACK. That's something Martin's patch does address. At least it's >> supposed to. > > Okay, based on my previous patch I can of course do the same. > > See attached. This was bootstrapped and reg-tested together with my > previous patch.
The only "regression" was pr79376.c, which is xfailed > because the test case is expecting the return value to be in the limits given > by the opportunistic range info. > > While I think the strlen return optimization will be safe with this patch, > I still have a philosophical problem with it, because s[n]printf > is a highly complex piece of software, and we take away its right > to return a failure code when it has to because of an implementation bug. > >>> >>> That is the return value from sprintf is using the range info from the >>> warning, and uses that to set the range info of the result. >>> In try_substitute_return_value, which uses the range info that was >>> from the warnings and feeds that into set_range_info. >> Right. In Martin's work we have enough range info to distinguish >> between the range info for warnings and the true range info and only use >> the latter in the call to set_range_info. >> >> > > Well I have tried the test cases from Martin's patch, and all except one > work fine for me, and pass with my patch-set as well. > > The problematic one is strlenopt-59.c (in his patch, my patch has picked > the same name, unfortunately). > > The difference is how object declarations are handled. While my patch > does not try to solve that problem at all, his patch does probably look > at the declaration size to improve the strict limits. > > I am not totally against it, but do not feel any need to implement that > feature in the same patch together with a function interface change, and > a code-correctness fix. > > From the test case it looks like the globals are comdat objects, because there > is no initialization. You can declare "char a3[3];" and "char a3[100];" in > different translation units and it will be a3[100] at run-time.
> > For me the red line here is basically, that the strlen optimization should > _not_ be more aggressive than the loop-niter optimization, thus the litmus > test is, would the test case pass if strlen is implemented as: > > #define strlen(c) ({ __SIZE_TYPE__ _n; for(_n=0; (c)[_n]; _n++); _n; }) > > Well, it does not. But that should probably be considered as a goal. > > > > Bernd. ^ permalink raw reply [flat|nested] 121+ messages in thread
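The macro quoted above relies on the GNU statement-expression extension. A minimal sketch of the same open-coded loop as a plain function is given here for readers who want to try the litmus test outside of GCC-specific syntax; the name loop_strlen is hypothetical and not part of any patch in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Open-coded string length: walk the bytes until the first NUL.
   The loop carries no knowledge of the size of the array that S
   points into, so any range the optimizer derives for the result
   has to come from loop analysis alone.  */
size_t
loop_strlen (const char *s)
{
  assert (s != NULL);
  size_t n;
  for (n = 0; s[n]; n++)
    ;
  return n;
}
```

Substituting such a function for strlen in a strlenopt test case is one way to compare what the strlen pass assumes against what the loop optimizers assume for the same access.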
* Re: [PATCH] Make strlen range computations more conservative 2018-08-21 22:43 ` Jeff Law 2018-08-22 4:16 ` Bernd Edlinger @ 2018-08-22 13:10 ` Bernd Edlinger 2018-10-24 9:14 ` Maxim Kuvyrkov 2 siblings, 0 replies; 121+ messages in thread From: Bernd Edlinger @ 2018-08-22 13:10 UTC (permalink / raw) To: Jeff Law, Martin Sebor, Richard Biener; +Cc: GCC Patches, Jakub Jelinek [-- Attachment #1: Type: text/plain, Size: 420 bytes --] Hi, this is an update (v5) of my patch: As discussed earlier, this version no longer enables -fassume-zero-terminated-char-arrays with -Ofast. I am ready to remove -fassume-zero-terminated-char-arrays altogether once we decide what to do with the code-gen test cases that still use it (xfail or remove). Bootstrapped and reg-tested on x86_64-pc-linux-gnu. Is it OK for trunk? Thanks Bernd. [-- Attachment #2: changelog-range-strlen-v5.txt --] [-- Type: text/plain, Size: 1097 bytes --] gcc: 2018-08-21 Bernd Edlinger <bernd.edlinger@hotmail.de> * common.opt: Add new optimization option -fassume-zero-terminated-char-arrays. * gimple-fold.h (looks_like_a_char_array_without_typecast_p): Declare. * gimple-fold.c (looks_like_a_char_array_without_typecast_p): Helper function for strlen range estimations. (get_range_strlen): Use looks_like_a_char_array_without_typecast_p. * tree-ssa-strlen.c (maybe_set_strlen_range): Likewise. (get_min_string_length): Avoid not NUL terminated string literals. * doc/invoke.texi: Document -fassume-zero-terminated-char-arrays. * tree-ssa-dse.c (compute_trims): Avoid folding away undefined behaviour. testsuite: 2018-08-32 Bernd Edlinger <bernd.edlinger@hotmail.de> * gcc.dg/pr83373.c: Add xfail. * gcc.dg/strlenopt-36.c: Adjust test expectations. * gcc.dg/strlenopt-40.c: Likewise. * gcc.dg/strlenopt-45.c: Likewise. * gcc.dg/strlenopt-48.c: Likewise. * gcc.dg/strlenopt-51.c: Likewise. * gcc.dg/strlenopt-57.c: New test. * gcc.dg/strlenopt-58.c: New test. * gcc.dg/strlenopt-59.c: New test.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: patch-range-strlen-v5.diff --] [-- Type: text/x-patch; name="patch-range-strlen-v5.diff", Size: 25524 bytes --] diff -Npur gcc/common.opt gcc/common.opt --- gcc/common.opt 2018-08-19 17:11:34.000000000 +0200 +++ gcc/common.opt 2018-08-22 09:04:53.520305828 +0200 @@ -1025,6 +1025,10 @@ fsanitize-undefined-trap-on-error Common Driver Report Var(flag_sanitize_undefined_trap_on_error) Init(0) Use trap instead of a library function for undefined behavior sanitization. +fassume-zero-terminated-char-arrays +Common Var(flag_assume_zero_terminated_char_arrays) Optimization Init(0) +Optimize under the assumption that char arrays must always be zero terminated. + fasynchronous-unwind-tables Common Report Var(flag_asynchronous_unwind_tables) Optimization Generate unwind tables that are exact at each instruction boundary. diff -Npur gcc/doc/invoke.texi gcc/doc/invoke.texi --- gcc/doc/invoke.texi 2018-08-21 10:13:34.000000000 +0200 +++ gcc/doc/invoke.texi 2018-08-22 09:06:18.645102845 +0200 @@ -388,7 +388,8 @@ Objective-C and Objective-C++ Dialects}. -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol --fassociative-math -fauto-profile -fauto-profile[=@var{path}] @gol +-fassociative-math -fassume-zero-terminated-char-arrays @gol +-fauto-profile -fauto-profile[=@var{path}] @gol -fauto-inc-dec -fbranch-probabilities @gol -fbranch-target-load-optimize -fbranch-target-load-optimize2 @gol -fbtr-bb-exclusive -fcaller-saves @gol @@ -9978,6 +9979,16 @@ is automatically enabled when both @opti The default is @option{-fno-associative-math}. +@item -fassume-zero-terminated-char-arrays +@opindex fassume-zero-terminated-char-arrays + +Optimize under the assumption that char arrays must always be zero +terminated. 
This may have an effect on code that uses strlen to +check the string length, for instance in assertions. Under certain +conditions such checks can be optimized away. + +The default is @option{-fno-assume-zero-terminated-char-arrays}. + @item -freciprocal-math @opindex freciprocal-math diff -Npur gcc/gimple-fold.c gcc/gimple-fold.c --- gcc/gimple-fold.c 2018-08-19 17:11:34.000000000 +0200 +++ gcc/gimple-fold.c 2018-08-22 09:04:53.741302702 +0200 @@ -1257,6 +1257,45 @@ gimple_fold_builtin_memset (gimple_stmt_ return true; } +/* Determine if a char array is suitable for strlen range estimations. + Return false if ARG is not a char array, or if the inner reference + chain appears to go through a type cast, or if !optimistic, + or if !flag_assume_zero_terminated_char_arrays. + Otherwise return true. + Note that type gimple type informations are not 100% guaranteed + to be accurate. + OPTIMISTIC is true when the result is used for warnings only. */ + +bool +looks_like_a_char_array_without_typecast_p (tree arg, bool optimistic) +{ + if (!flag_assume_zero_terminated_char_arrays && !optimistic) + return false; + + /* We handle arrays of integer types. */ + if (TREE_CODE (TREE_TYPE (arg)) != ARRAY_TYPE + || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != INTEGER_TYPE + || TYPE_MODE (TREE_TYPE (TREE_TYPE (arg))) != TYPE_MODE (char_type_node) + || TYPE_PRECISION (TREE_TYPE (TREE_TYPE (arg))) + != TYPE_PRECISION (char_type_node)) + return false; + + tree base = arg; + while (TREE_CODE (base) == ARRAY_REF + || TREE_CODE (base) == ARRAY_RANGE_REF + || TREE_CODE (base) == COMPONENT_REF) + base = TREE_OPERAND (base, 0); + + /* If this looks like a type cast don't assume anything. */ + if ((TREE_CODE (base) == MEM_REF + && (! 
integer_zerop (TREE_OPERAND (base, 1)) + || TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0)))) + != TYPE_MAIN_VARIANT (TREE_TYPE (base)))) + || handled_component_p (base)) + return false; + + return true; +} /* Obtain the minimum and maximum string length or minimum and maximum value of ARG in LENGTH[0] and LENGTH[1], respectively. @@ -1272,6 +1311,7 @@ gimple_fold_builtin_memset (gimple_stmt_ PHIs and COND_EXPRs optimistically, if we can determine string length minimum and maximum, it will use the minimum from the ones where it can be determined. + TYPE == 2 and FUZZY != 0 cannot be used together. Set *FLEXP to true if the range of the string lengths has been obtained from the upper bound of an array at the end of a struct. Such an array may hold a string that's longer than its upper bound @@ -1312,8 +1352,8 @@ get_range_strlen (tree arg, tree length[ member. */ tree idx = TREE_OPERAND (op, 1); - arg = TREE_OPERAND (op, 0); - tree optype = TREE_TYPE (arg); + op = TREE_OPERAND (op, 0); + tree optype = TREE_TYPE (op); if (tree dom = TYPE_DOMAIN (optype)) if (tree bound = TYPE_MAX_VALUE (dom)) if (TREE_CODE (bound) == INTEGER_CST @@ -1339,23 +1379,21 @@ get_range_strlen (tree arg, tree length[ return get_range_strlen (TREE_OPERAND (arg, 0), length, visited, type, fuzzy, flexp, eltsize); + if (eltsize != 1) + return false; + if (TREE_CODE (arg) == ARRAY_REF) { - tree type = TREE_TYPE (TREE_OPERAND (arg, 0)); - - /* Determine the "innermost" array type. */ - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); - - /* Avoid arrays of pointers. */ - tree eltype = TREE_TYPE (type); - if (TREE_CODE (type) != ARRAY_TYPE - || !INTEGRAL_TYPE_P (eltype)) + if (!looks_like_a_char_array_without_typecast_p (arg, fuzzy == 2)) return false; + tree type = TREE_TYPE (arg); + + /* Fail when the array bound is unknown or zero. 
*/ val = TYPE_SIZE_UNIT (type); - if (!val || integer_zerop (val)) + if (!val + || TREE_CODE (val) != INTEGER_CST + || integer_zerop (val)) return false; val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, @@ -1364,15 +1402,16 @@ get_range_strlen (tree arg, tree length[ the array could have zero length. */ *minlen = ssize_int (0); - if (TREE_CODE (TREE_OPERAND (arg, 0)) == COMPONENT_REF - && type == TREE_TYPE (TREE_OPERAND (arg, 0)) - && array_at_struct_end_p (TREE_OPERAND (arg, 0))) + if (TREE_CODE (arg) == COMPONENT_REF + && type == TREE_TYPE (arg) + && array_at_struct_end_p (arg)) *flexp = true; } - else if (TREE_CODE (arg) == COMPONENT_REF - && (TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 1))) - == ARRAY_TYPE)) + else if (TREE_CODE (arg) == COMPONENT_REF) { + if (!looks_like_a_char_array_without_typecast_p (arg, fuzzy == 2)) + return false; + /* Use the type of the member array to determine the upper bound on the length of the array. This may be overly optimistic if the array itself isn't NUL-terminated and @@ -1388,22 +1427,21 @@ get_range_strlen (tree arg, tree length[ tree type = TREE_TYPE (arg); - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); - /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); - if (!val || integer_zerop (val)) + if (!val + || TREE_CODE (val) != INTEGER_CST + || integer_zerop (val)) return false; + val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, integer_one_node); /* Set the minimum size to zero since the string in the array could have zero length. */ *minlen = ssize_int (0); } - - if (VAR_P (arg)) + else if (VAR_P (arg) + && (flag_assume_zero_terminated_char_arrays || fuzzy == 2)) { tree type = TREE_TYPE (arg); if (POINTER_TYPE_P (type)) @@ -1411,13 +1449,23 @@ get_range_strlen (tree arg, tree length[ if (TREE_CODE (type) == ARRAY_TYPE) { + /* We handle arrays of integer types. 
*/ + if (TREE_CODE (TREE_TYPE (type)) != INTEGER_TYPE + || TYPE_MODE (TREE_TYPE (type)) + != TYPE_MODE (char_type_node) + || TYPE_PRECISION (TREE_TYPE (type)) + != TYPE_PRECISION (char_type_node)) + return false; + + /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop (val)) return false; - val = wide_int_to_tree (TREE_TYPE (val), - wi::sub (wi::to_wide (val), 1)); + + val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, + integer_one_node); /* Set the minimum size to zero since the string in the array could have zero length. */ *minlen = ssize_int (0); @@ -1550,6 +1598,7 @@ get_range_strlen (tree arg, tree length[ if we can determine string length minimum and maximum; it will use the minimum from the ones where it can be determined. STRICT false should be only used for warning code. + STRICT is by default false. ELTSIZE is 1 for normal single byte character strings, and 2 or 4 for wide characer strings. ELTSIZE is by default 1. */ diff -Npur gcc/gimple-fold.h gcc/gimple-fold.h --- gcc/gimple-fold.h 2018-08-19 17:11:34.000000000 +0200 +++ gcc/gimple-fold.h 2018-08-22 09:04:53.741302702 +0200 @@ -61,6 +61,7 @@ extern bool gimple_fold_builtin_snprintf extern bool arith_code_with_undefined_signed_overflow (tree_code); extern gimple_seq rewrite_to_defined_overflow (gimple *); extern void replace_call_with_value (gimple_stmt_iterator *, tree); +extern bool looks_like_a_char_array_without_typecast_p (tree, bool); /* gimple_build, functionally matching fold_buildN, outputs stmts int the provided sequence, matching and simplifying them on-the-fly. 
diff -Npur gcc/testsuite/gcc.dg/pr83373.c gcc/testsuite/gcc.dg/pr83373.c --- gcc/testsuite/gcc.dg/pr83373.c 2018-08-19 17:11:34.000000000 +0200 +++ gcc/testsuite/gcc.dg/pr83373.c 2018-08-22 11:48:17.312080785 +0200 @@ -16,7 +16,7 @@ inline char* my_strcpy (char* dst, const __builtin_memcpy (dst, src, len + 1); else { - __builtin_memcpy (dst, src, size - 1); /* { dg-bogus "\\\[-Wstringop-oveflow]" } */ + __builtin_memcpy (dst, src, size - 1); /* { dg-bogus "\\\[-Wstringop-overflow=]" "" { xfail *-*-* } } */ dst[size - 1] = '\0'; } diff -Npur gcc/testsuite/gcc.dg/strlenopt-36.c gcc/testsuite/gcc.dg/strlenopt-36.c --- gcc/testsuite/gcc.dg/strlenopt-36.c 2018-08-19 17:11:34.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-36.c 2018-08-22 09:04:53.742302688 +0200 @@ -1,7 +1,7 @@ /* PR tree-optimization/78450 - strlen(s) return value can be assumed to be less than the size of s { dg-do compile } - { dg-options "-O2 -fdump-tree-optimized" } */ + { dg-options "-O2 -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #include "strlenopt.h" diff -Npur gcc/testsuite/gcc.dg/strlenopt-40.c gcc/testsuite/gcc.dg/strlenopt-40.c --- gcc/testsuite/gcc.dg/strlenopt-40.c 2018-08-19 17:11:34.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-40.c 2018-08-22 09:04:53.742302688 +0200 @@ -1,7 +1,7 @@ /* PR tree-optimization/83671 - fix for false positive reported by -Wstringop-overflow does not work with inlining { dg-do compile } - { dg-options "-O1 -fdump-tree-optimized" } */ + { dg-options "-O1 -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #include "strlenopt.h" @@ -219,10 +219,15 @@ void elim_member_arrays_ptr (struct MemA ELIM_TRUE (strlen (ma0->a5_7[0]) < 7); ELIM_TRUE (strlen (ma0[0].a5_7[0]) < 7); +#if 0 + /* This is transformed into strlen ((const char *) &(ma0 + 64)->a5_7[0]) + which looks like a type cast and fails the check in + looks_like_a_char_array_without_typecast_p. 
*/ ELIM_TRUE (strlen (ma0[1].a5_7[0]) < 7); ELIM_TRUE (strlen (ma0[1].a5_7[4]) < 7); ELIM_TRUE (strlen (ma0[9].a5_7[0]) < 7); ELIM_TRUE (strlen (ma0[9].a5_7[4]) < 7); +#endif ELIM_TRUE (strlen (ma0->a3) < sizeof ma0->a3); ELIM_TRUE (strlen (ma0->a5) < sizeof ma0->a5); diff -Npur gcc/testsuite/gcc.dg/strlenopt-45.c gcc/testsuite/gcc.dg/strlenopt-45.c --- gcc/testsuite/gcc.dg/strlenopt-45.c 2018-08-19 17:11:34.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-45.c 2018-08-22 09:04:53.767302335 +0200 @@ -2,7 +2,7 @@ Test to verify that strnlen built-in expansion works correctly in the absence of tree strlen optimization. { dg-do compile } - { dg-options "-O2 -Wall -fdump-tree-optimized" } */ + { dg-options "-O2 -Wall -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #include "strlenopt.h" @@ -43,7 +43,6 @@ extern size_t strnlen (const char *, siz else \ FAIL (made_in_false_branch) -extern char c; extern char a1[1]; extern char a3[3]; extern char a5[5]; @@ -52,18 +51,6 @@ extern char ax[]; void elim_strnlen_arr_cst (void) { - /* The length of a string stored in a one-element array must be zero. - The result reported by strnlen() for such an array can be non-zero - only when the bound is equal to 1 (in which case the result must - be one). 
*/ - ELIM (strnlen (&c, 0) == 0); - ELIM (strnlen (&c, 1) < 2); - ELIM (strnlen (&c, 2) == 0); - ELIM (strnlen (&c, 9) == 0); - ELIM (strnlen (&c, PTRDIFF_MAX) == 0); - ELIM (strnlen (&c, SIZE_MAX) == 0); - ELIM (strnlen (&c, -1) == 0); - ELIM (strnlen (a1, 0) == 0); ELIM (strnlen (a1, 1) < 2); ELIM (strnlen (a1, 2) == 0); @@ -99,17 +86,18 @@ void elim_strnlen_arr_cst (void) ELIM (strnlen (a3_7[2], SIZE_MAX) < 8); ELIM (strnlen (a3_7[2], -1) < 8); - ELIM (strnlen ((char*)a3_7, 0) == 0); - ELIM (strnlen ((char*)a3_7, 1) < 2); - ELIM (strnlen ((char*)a3_7, 2) < 3); - ELIM (strnlen ((char*)a3_7, 3) < 4); - ELIM (strnlen ((char*)a3_7, 9) < 10); - ELIM (strnlen ((char*)a3_7, 19) < 20); - ELIM (strnlen ((char*)a3_7, 21) < 22); - ELIM (strnlen ((char*)a3_7, 23) < 22); - ELIM (strnlen ((char*)a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)a3_7, -1) < 22); + ELIM (strnlen ((char*)a3_7[0], 0) == 0); + ELIM (strnlen ((char*)a3_7[0], 1) < 2); + ELIM (strnlen ((char*)a3_7[0], 2) < 3); + ELIM (strnlen ((char*)a3_7[0], 3) < 4); + ELIM (strnlen ((char*)a3_7[0], 7) < 8); + ELIM (strnlen ((char*)a3_7[0], 9) < 7); + ELIM (strnlen ((char*)a3_7[0], 19) < 7); + ELIM (strnlen ((char*)a3_7[0], 21) < 7); + ELIM (strnlen ((char*)a3_7[0], 23) < 7); + ELIM (strnlen ((char*)a3_7[0], PTRDIFF_MAX) < 7); + ELIM (strnlen ((char*)a3_7[0], SIZE_MAX) < 7); + ELIM (strnlen ((char*)a3_7[0], -1) < 7); ELIM (strnlen (ax, 0) == 0); ELIM (strnlen (ax, 1) < 2); @@ -122,7 +110,6 @@ void elim_strnlen_arr_cst (void) struct MemArrays { - char c; char a0[0]; char a1[1]; char a3[3]; @@ -133,13 +120,6 @@ struct MemArrays void elim_strnlen_memarr_cst (struct MemArrays *p, int i) { - ELIM (strnlen (&p->c, 0) == 0); - ELIM (strnlen (&p->c, 1) < 2); - ELIM (strnlen (&p->c, 9) == 0); - ELIM (strnlen (&p->c, PTRDIFF_MAX) == 0); - ELIM (strnlen (&p->c, SIZE_MAX) == 0); - ELIM (strnlen (&p->c, -1) == 0); - /* Other accesses to internal zero-length arrays are undefined. 
*/ ELIM (strnlen (p->a0, 0) == 0); @@ -154,19 +134,19 @@ void elim_strnlen_memarr_cst (struct Mem ELIM (strnlen (p->a3, 1) < 2); ELIM (strnlen (p->a3, 2) < 3); ELIM (strnlen (p->a3, 3) < 4); - ELIM (strnlen (p->a3, 9) < 4); - ELIM (strnlen (p->a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p->a3, SIZE_MAX) < 4); - ELIM (strnlen (p->a3, -1) < 4); + ELIM (strnlen (p->a3, 9) < 3); + ELIM (strnlen (p->a3, PTRDIFF_MAX) < 3); + ELIM (strnlen (p->a3, SIZE_MAX) < 3); + ELIM (strnlen (p->a3, -1) < 3); ELIM (strnlen (p[i].a3, 0) == 0); ELIM (strnlen (p[i].a3, 1) < 2); ELIM (strnlen (p[i].a3, 2) < 3); ELIM (strnlen (p[i].a3, 3) < 4); - ELIM (strnlen (p[i].a3, 9) < 4); - ELIM (strnlen (p[i].a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p[i].a3, SIZE_MAX) < 4); - ELIM (strnlen (p[i].a3, -1) < 4); + ELIM (strnlen (p[i].a3, 9) < 3); + ELIM (strnlen (p[i].a3, PTRDIFF_MAX) < 3); + ELIM (strnlen (p[i].a3, SIZE_MAX) < 3); + ELIM (strnlen (p[i].a3, -1) < 3); ELIM (strnlen (p->a3_7[0], 0) == 0); ELIM (strnlen (p->a3_7[0], 1) < 2); @@ -203,17 +183,18 @@ void elim_strnlen_memarr_cst (struct Mem ELIM (strnlen (p->a3_7[i], 19) < 20); #endif - ELIM (strnlen ((char*)p->a3_7, 0) == 0); - ELIM (strnlen ((char*)p->a3_7, 1) < 2); - ELIM (strnlen ((char*)p->a3_7, 2) < 3); - ELIM (strnlen ((char*)p->a3_7, 3) < 4); - ELIM (strnlen ((char*)p->a3_7, 9) < 10); - ELIM (strnlen ((char*)p->a3_7, 19) < 20); - ELIM (strnlen ((char*)p->a3_7, 21) < 22); - ELIM (strnlen ((char*)p->a3_7, 23) < 22); - ELIM (strnlen ((char*)p->a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)p->a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)p->a3_7, -1) < 22); + ELIM (strnlen ((char*)p->a3_7[0], 0) == 0); + ELIM (strnlen ((char*)p->a3_7[0], 1) < 2); + ELIM (strnlen ((char*)p->a3_7[0], 2) < 3); + ELIM (strnlen ((char*)p->a3_7[0], 3) < 4); + ELIM (strnlen ((char*)p->a3_7[0], 7) < 8); + ELIM (strnlen ((char*)p->a3_7[0], 9) < 7); + ELIM (strnlen ((char*)p->a3_7[0], 19) < 7); + ELIM (strnlen ((char*)p->a3_7[0], 21) < 7); + ELIM (strnlen 
((char*)p->a3_7[0], 23) < 7); + ELIM (strnlen ((char*)p->a3_7[0], PTRDIFF_MAX) < 7); + ELIM (strnlen ((char*)p->a3_7[0], SIZE_MAX) < 7); + ELIM (strnlen ((char*)p->a3_7[0], -1) < 7); ELIM (strnlen (p->ax, 0) == 0); ELIM (strnlen (p->ax, 1) < 2); @@ -290,9 +271,6 @@ void elim_strnlen_range (char *s) void keep_strnlen_arr_cst (void) { - KEEP (strnlen (&c, 1) == 0); - KEEP (strnlen (&c, 1) == 1); - KEEP (strnlen (a1, 1) == 0); KEEP (strnlen (a1, 1) == 1); @@ -301,16 +279,12 @@ void keep_strnlen_arr_cst (void) struct FlexArrays { - char c; char a0[0]; /* Access to internal zero-length arrays are undefined. */ char a1[1]; }; void keep_strnlen_memarr_cst (struct FlexArrays *p) { - KEEP (strnlen (&p->c, 1) == 0); - KEEP (strnlen (&p->c, 1) == 1); - #if 0 /* Accesses to internal zero-length arrays are undefined so avoid exercising them. */ @@ -331,5 +305,5 @@ void keep_strnlen_memarr_cst (struct Fle /* { dg-final { scan-tree-dump-times "call_in_true_branch_not_eliminated_" 0 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 13 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 13 "optimized" } } */ + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 9 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 9 "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-48.c gcc/testsuite/gcc.dg/strlenopt-48.c --- gcc/testsuite/gcc.dg/strlenopt-48.c 2018-08-19 17:11:34.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-48.c 2018-08-22 09:04:53.767302335 +0200 @@ -3,7 +3,7 @@ Verify that strlen() calls with one-character array elements of multidimensional arrays are still folded. 
{ dg-do compile } - { dg-options "-O2 -Wall -fdump-tree-optimized" } */ + { dg-options "-O2 -Wall -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #include "strlenopt.h" diff -Npur gcc/testsuite/gcc.dg/strlenopt-51.c gcc/testsuite/gcc.dg/strlenopt-51.c --- gcc/testsuite/gcc.dg/strlenopt-51.c 2018-08-19 17:11:34.000000000 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-51.c 2018-08-22 09:04:53.768302320 +0200 @@ -101,7 +101,7 @@ void test_keep_a9_9 (int i) { #undef T #define T(I) \ - KEEP (strlen (&a9_9[i][I][0]) > (1 + I) % 9); \ + KEEP (strlen (&a9_9[i][I][0]) > (0 + I) % 9); \ KEEP (strlen (&a9_9[i][I][1]) > (1 + I) % 9); \ KEEP (strlen (&a9_9[i][I][2]) > (2 + I) % 9); \ KEEP (strlen (&a9_9[i][I][3]) > (3 + I) % 9); \ @@ -115,7 +115,7 @@ void test_keep_a9_9 (int i) } /* { dg-final { scan-tree-dump-times "strlen" 72 "gimple" } } - { dg-final { scan-tree-dump-times "strlen" 63 "optimized" } } + { dg-final { scan-tree-dump-times "strlen" 72 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 72 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-57.c gcc/testsuite/gcc.dg/strlenopt-57.c --- gcc/testsuite/gcc.dg/strlenopt-57.c 1970-01-01 01:00:00.000000000 +0100 +++ gcc/testsuite/gcc.dg/strlenopt-57.c 2018-08-22 09:04:53.768302320 +0200 @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +#define assert(x) do { if (!(x)) __builtin_abort (); } while (0) +extern int system (const char *); +static int fun (char *p) +{ + char buf[16]; + + assert (__builtin_strlen (p) < 4); + + __builtin_sprintf (buf, "echo %s - %s", p, p); + return system (buf); +} + +void test (void) +{ + char b[2] = "ab"; + fun (b); +} + +/* { dg-final { 
scan-tree-dump-times "__builtin_strlen" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_abort" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_sprintf" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "system" 1 "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-58.c gcc/testsuite/gcc.dg/strlenopt-58.c --- gcc/testsuite/gcc.dg/strlenopt-58.c 1970-01-01 01:00:00.000000000 +0100 +++ gcc/testsuite/gcc.dg/strlenopt-58.c 2018-08-22 09:10:24.485637281 +0200 @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ + +typedef char A[6]; +typedef char B[2][3]; + +A a; + +void test (void) +{ + B* b = (B*) a; + if (__builtin_strlen ((*b)[0]) > 2) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump-times "__builtin_strlen" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_abort" 1 "optimized" } } */ diff -Npur gcc/testsuite/gcc.dg/strlenopt-59.c gcc/testsuite/gcc.dg/strlenopt-59.c --- gcc/testsuite/gcc.dg/strlenopt-59.c 1970-01-01 01:00:00.000000000 +0100 +++ gcc/testsuite/gcc.dg/strlenopt-59.c 2018-08-22 09:11:03.197092493 +0200 @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ + +typedef char B[2][3]; + +B b; + +void test (void) +{ + if (__builtin_strlen (b[0]) > 2) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump-not "__builtin_strlen" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "__builtin_abort" "optimized" } } */ diff -Npur gcc/tree-ssa-dse.c gcc/tree-ssa-dse.c --- gcc/tree-ssa-dse.c 2018-08-19 17:11:34.000000000 +0200 +++ gcc/tree-ssa-dse.c 2018-08-22 09:04:53.768302320 +0200 @@ -248,6 +248,12 @@ compute_trims (ao_ref *ref, sbitmap live residual handling in mem* and str* functions is usually reasonably efficient. 
*/ *trim_tail = last_orig - last_live; + /* Don't fold away an out of bounds access, as this defeats proper + warnings. */ + if (*trim_tail + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (ref->base)), + last_orig) <= 0) + *trim_tail = 0; } else *trim_tail = 0; diff -Npur gcc/tree-ssa-strlen.c gcc/tree-ssa-strlen.c --- gcc/tree-ssa-strlen.c 2018-08-21 10:51:08.000000000 +0200 +++ gcc/tree-ssa-strlen.c 2018-08-22 09:04:53.786302066 +0200 @@ -1156,11 +1156,13 @@ maybe_set_strlen_range (tree lhs, tree s if (TREE_CODE (src) == ADDR_EXPR) { + src = TREE_OPERAND (src, 0); + + if (!looks_like_a_char_array_without_typecast_p (src, false)) + ; /* The last array member of a struct can be bigger than its size suggests if it's treated as a poor-man's flexible array member. */ - src = TREE_OPERAND (src, 0); - bool src_is_array = TREE_CODE (TREE_TYPE (src)) == ARRAY_TYPE; - if (src_is_array && !array_at_struct_end_p (src)) + else if (!array_at_struct_end_p (src)) { tree type = TREE_TYPE (src); if (tree size = TYPE_SIZE_UNIT (type)) @@ -1177,8 +1179,6 @@ maybe_set_strlen_range (tree lhs, tree s } else { - if (TREE_CODE (src) == COMPONENT_REF && !src_is_array) - src = TREE_OPERAND (src, 1); if (DECL_P (src)) { /* Handle the unlikely case of strlen (&c) where c is some @@ -3192,7 +3192,9 @@ get_min_string_length (tree rhs, bool *f && TREE_READONLY (rhs)) rhs = DECL_INITIAL (rhs); - if (rhs && TREE_CODE (rhs) == STRING_CST) + if (rhs && TREE_CODE (rhs) == STRING_CST + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)), + TREE_STRING_LENGTH (rhs)) >= 0) { *full_string_p = true; return strlen (TREE_STRING_POINTER (rhs)); ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-21 22:43 ` Jeff Law 2018-08-22 4:16 ` Bernd Edlinger 2018-08-22 13:10 ` Bernd Edlinger @ 2018-10-24 9:14 ` Maxim Kuvyrkov 2018-10-24 13:38 ` Bernd Edlinger 2 siblings, 1 reply; 121+ messages in thread From: Maxim Kuvyrkov @ 2018-10-24 9:14 UTC (permalink / raw) To: Jeff Law, Bernd Edlinger Cc: Martin Sebor, Richard Guenther, GCC Patches, Jakub Jelinek, Ramana Radhakrishnan, Kugan Vivekanandarajah

Hi Jeff,
Hi Bernd,

This change (git commit d0eb64b248a9e40dfa633c4e4baebc3b238fd6eb / svn rev. 263793) causes a segfault when building the Linux kernel for AArch64. The exact configuration is
===
git_repo[linux]=https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
git_branch[linux]=linux-4.14.y
git_repo[gcc]=git://gcc.gnu.org/git/gcc.git
git_branch[gcc]=master
linux_config=allmodconfig
===

The bisection artifacts point at this exact commit:
Parent commit 0584c3707994997f5dc9fa79732d01a53c25db6a can build 18083 objects.
This commit d0eb64b248a9e40dfa633c4e4baebc3b238fd6eb can build 18076 objects from the same Linux tree.

The relevant error (see [1]):
==
during GIMPLE pass: dse
drivers/md/dm-mpath.c: In function 'multipath_init_per_bio_data':
drivers/md/dm-mpath.c:2032:1: internal compiler error: Segmentation fault
==

Full bisection artifacts are at [2].

Bernd, would you please investigate?

IMO, this should be easy to reproduce from the bisection logs, but let me know if it's not straightforward. Best ping on IRC (I'm maximk) or follow up here.

FYI, there is another regression (either caused or unmasked by Kugan's gcc commit b88c25691cf8b153db44108935db871e1d40db89), but it appears orthogonal to this one.
[1] https://ci.linaro.org/view/tcwg_kernel-gnu/job/tcwg_kernel-bisect-gnu-master-aarch64-lts-allmodconfig/8/artifact/artifacts/build-d0eb64b248a9e40dfa633c4e4baebc3b238fd6eb/5-count_linux_objs/console.log/*view*/ [2] https://ci.linaro.org/view/tcwg_kernel-gnu/job/tcwg_kernel-bisect-gnu-master-aarch64-lts-allmodconfig/8/artifact/artifacts/ Regards, -- Maxim Kuvyrkov www.linaro.org > On Aug 22, 2018, at 2:43 AM, Jeff Law <law@redhat.com> wrote: > > [ I'm still digesting, but saw something in this that ought to be broken > out... ] > > On 08/19/2018 09:55 AM, Bernd Edlinger wrote: >> diff -Npur gcc/tree-ssa-dse.c gcc/tree-ssa-dse.c >> --- gcc/tree-ssa-dse.c 2018-07-18 21:21:34.000000000 +0200 >> +++ gcc/tree-ssa-dse.c 2018-08-19 14:29:32.344498771 +0200 >> @@ -248,6 +248,12 @@ compute_trims (ao_ref *ref, sbitmap live >> residual handling in mem* and str* functions is usually >> reasonably efficient. */ >> *trim_tail = last_orig - last_live; >> + /* Don't fold away an out of bounds access, as this defeats proper >> + warnings. */ >> + if (*trim_tail >> + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (ref->base)), >> + last_orig) <= 0) >> + *trim_tail = 0; >> } >> else >> *trim_tail = 0; > This seems like a good change in and of itself and should be able to go > forward without further review work. Consider this hunk approved, > along with any testsuite you have which tickles this code (I didn't > immediately see one attached to this patch. But I could have missed it). > > Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-10-24 9:14 ` Maxim Kuvyrkov @ 2018-10-24 13:38 ` Bernd Edlinger 2018-10-24 14:26 ` Maxim Kuvyrkov 0 siblings, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-10-24 13:38 UTC (permalink / raw) To: Maxim Kuvyrkov, Jeff Law Cc: Martin Sebor, Richard Guenther, GCC Patches, Jakub Jelinek, Ramana Radhakrishnan, Kugan Vivekanandarajah

Hi Maxim,

shortly after the initial commit, two more fix-ups went into the same function:

$ svn log -r263896
------------------------------------------------------------------------
r263896 | law | 2018-08-27 22:31:14 +0200 (Mon, 27 Aug 2018) | 4 lines

        * tree-ssa-dse.c (compute_trims): Handle case where the reference's
        type does not have a TYPE_SIZE_UNIT.

        * gcc.c-torture/compile/dse.c: New test.
------------------------------------------------------------------------
$ svn log -r263906
------------------------------------------------------------------------
r263906 | law | 2018-08-28 06:02:11 +0200 (Tue, 28 Aug 2018) | 6 lines

        PR tree-optimization/87110
        * tree-ssa-dse.c (compute_trims): Handle non-constant
        TYPE_SIZE_UNIT.

        PR tree-optimization/87110
        * gcc.c-torture/compile/pr87110.c: New test.
------------------------------------------------------------------------

I believe not having those applied is causing the segfault you are seeing.


Bernd.


On 10/24/18 7:46 AM, Maxim Kuvyrkov wrote:
> Hi Jeff,
> Hi Bernd,
>
> This change (git commit d0eb64b248a9e40dfa633c4e4baebc3b238fd6eb / svn rev. 263793) causes a segfault when build Linux kernel for AArch64. The exact configuration is
> ===
> git_repo[linux]=https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> git_branch[linux]=linux-4.14.y
> git_repo[gcc]=git://gcc.gnu.org/git/gcc.git
> git_branch[gcc]=master
> linux_config=allmodconfig
> ===
>
> The bisection artifacts point at this exact commit:
> Parent commit 0584c3707994997f5dc9fa79732d01a53c25db6a can build 18083 objects.
> This commit d0eb64b248a9e40dfa633c4e4baebc3b238fd6eb can build 18076 objects from the same linux tree. > > The relevant error (see [1]): > == > during GIMPLE pass: dse > drivers/md/dm-mpath.c: In function 'multipath_init_per_bio_data': > drivers/md/dm-mpath.c:2032:1: internal compiler error: Segmentation fault > == > > Full bisection artifacts are at [2]. > > Bernd, would you please investigate? > > IMO, this should be easy to reproduce from the bisection logs, but let me know if it's not straightforward. Best ping on IRC (I'm maximk) or follow up here. > > FYI, there is another regression (either caused or unmasked by Kugan's gcc commit b88c25691cf8b153db44108935db871e1d40db89), but it appears orthogonal to this one. > > [1] https://ci.linaro.org/view/tcwg_kernel-gnu/job/tcwg_kernel-bisect-gnu-master-aarch64-lts-allmodconfig/8/artifact/artifacts/build-d0eb64b248a9e40dfa633c4e4baebc3b238fd6eb/5-count_linux_objs/console.log/*view*/ > > [2] https://ci.linaro.org/view/tcwg_kernel-gnu/job/tcwg_kernel-bisect-gnu-master-aarch64-lts-allmodconfig/8/artifact/artifacts/ > > Regards, > > -- > Maxim Kuvyrkov > www.linaro.org > > > >> On Aug 22, 2018, at 2:43 AM, Jeff Law <law@redhat.com> wrote: >> >> [ I'm still digesting, but saw something in this that ought to be broken >> out... ] >> >> On 08/19/2018 09:55 AM, Bernd Edlinger wrote: >>> diff -Npur gcc/tree-ssa-dse.c gcc/tree-ssa-dse.c >>> --- gcc/tree-ssa-dse.c 2018-07-18 21:21:34.000000000 +0200 >>> +++ gcc/tree-ssa-dse.c 2018-08-19 14:29:32.344498771 +0200 >>> @@ -248,6 +248,12 @@ compute_trims (ao_ref *ref, sbitmap live >>> residual handling in mem* and str* functions is usually >>> reasonably efficient. */ >>> *trim_tail = last_orig - last_live; >>> + /* Don't fold away an out of bounds access, as this defeats proper >>> + warnings. 
*/ >>> + if (*trim_tail >>> + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (ref->base)), >>> + last_orig) <= 0) >>> + *trim_tail = 0; >>> } >>> else >>> *trim_tail = 0; >> This seems like a good change in and of itself and should be able to go >> forward without further review work. Consider this hunk approved, >> along with any testsuite you have which tickles this code (I didn't >> immediately see one attached to this patch. But I could have missed it). >> >> Jeff > ^ permalink raw reply [flat|nested] 121+ messages in thread
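The interaction between the approved compute_trims hunk and the two later fix-ups can be illustrated with a toy model in plain C. This is only a sketch and not GCC code: `struct type_size`, `trim_tail` and their fields are hypothetical stand-ins for `TYPE_SIZE_UNIT`, and the assumption is that the fix-ups simply skip the out-of-bounds guard when the size is missing or not a compile-time constant.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the size of the referenced object's type.
   A NULL pointer models a type without a size; is_constant == false
   models a size that is not a compile-time constant (e.g. a VLA).  */
struct type_size
{
  bool is_constant;
  long bytes;
};

/* Toy model of the tail-trim computation.  last_orig and last_live are
   the indices of the last byte stored and the last byte still live.  */
int
trim_tail (const struct type_size *size, int last_orig, int last_live)
{
  int trim = last_orig - last_live;

  /* Keep an out-of-bounds store intact so it can still be diagnosed,
     but only when the object size is known and constant.  Without
     these two checks the comparison would dereference a missing or
     meaningless size, which is the kind of failure the fix-ups
     describe.  */
  if (trim != 0
      && size != NULL
      && size->is_constant
      && size->bytes <= last_orig)
    trim = 0;

  return trim;
}
```

In this model a store whose last byte index is 7 into a 4-byte object keeps its full length, while the same store into an 8-byte object, or into an object of unknown size, may be trimmed.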
* Re: [PATCH] Make strlen range computations more conservative 2018-10-24 13:38 ` Bernd Edlinger @ 2018-10-24 14:26 ` Maxim Kuvyrkov 0 siblings, 0 replies; 121+ messages in thread From: Maxim Kuvyrkov @ 2018-10-24 14:26 UTC (permalink / raw) To: Bernd Edlinger Cc: Jeff Law, Martin Sebor, Richard Guenther, GCC Patches, Jakub Jelinek, Ramana Radhakrishnan, Kugan Vivekanandarajah Hi Bernd, Ack, thanks! I've just enabled this CI loop, and it is churning through commits since May. Regards, -- Maxim Kuvyrkov www.linaro.org > On Oct 24, 2018, at 5:04 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: > > Hi Maxim, > > short after the initial commit there came two more fix-ups in the same function: > > $ svn log -r263896 > ------------------------------------------------------------------------ > r263896 | law | 2018-08-27 22:31:14 +0200 (Mon, 27 Aug 2018) | 4 lines > > * tree-ssa-dse.c (compute_trims): Handle case where the reference's > type does not have a TYPE_SIZE_UNIT. > > * gcc.c-torture/compile/dse.c: New test. > ------------------------------------------------------------------------ > $ svn log -r263906 > ------------------------------------------------------------------------ > r263906 | law | 2018-08-28 06:02:11 +0200 (Tue, 28 Aug 2018) | 6 lines > > PR tree-optimization/87110 > * tree-ssa-dse.c (compute_trims): Handle non-constant > TYPE_SIZE_UNIT. > > PR tree-optimization/87110 > * gcc.c-torture/compile/pr87110.c: New test. > ------------------------------------------------------------------------ > > I believe not having those applied is causing the segfault you are seeing. > > > Bernd. > > > On 10/24/18 7:46 AM, Maxim Kuvyrkov wrote: >> Hi Jeff, >> Hi Bernd, >> >> This change (git commit d0eb64b248a9e40dfa633c4e4baebc3b238fd6eb / svn rev. 263793) causes a segfault when build Linux kernel for AArch64. 
The exact configuration is >> === >> git_repo[linux]=https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git >> git_branch[linux]=linux-4.14.y >> git_repo[gcc]=git://gcc.gnu.org/git/gcc.git >> git_branch[gcc]=master >> linux_config=allmodconfig >> === >> >> The bisection artifacts point at this exact commit: >> Parent commit 0584c3707994997f5dc9fa79732d01a53c25db6a can build 18083 objects. >> This commit d0eb64b248a9e40dfa633c4e4baebc3b238fd6eb can build 18076 objects from the same linux tree. >> >> The relevant error (see [1]): >> == >> during GIMPLE pass: dse >> drivers/md/dm-mpath.c: In function 'multipath_init_per_bio_data': >> drivers/md/dm-mpath.c:2032:1: internal compiler error: Segmentation fault >> == >> >> Full bisection artifacts are at [2]. >> >> Bernd, would you please investigate? >> >> IMO, this should be easy to reproduce from the bisection logs, but let me know if it's not straightforward. Best ping on IRC (I'm maximk) or follow up here. >> >> FYI, there is another regression (either caused or unmasked by Kugan's gcc commit b88c25691cf8b153db44108935db871e1d40db89), but it appears orthogonal to this one. >> >> [1] https://ci.linaro.org/view/tcwg_kernel-gnu/job/tcwg_kernel-bisect-gnu-master-aarch64-lts-allmodconfig/8/artifact/artifacts/build-d0eb64b248a9e40dfa633c4e4baebc3b238fd6eb/5-count_linux_objs/console.log/*view*/ >> >> [2] https://ci.linaro.org/view/tcwg_kernel-gnu/job/tcwg_kernel-bisect-gnu-master-aarch64-lts-allmodconfig/8/artifact/artifacts/ >> >> Regards, >> >> -- >> Maxim Kuvyrkov >> www.linaro.org >> >> >> >>> On Aug 22, 2018, at 2:43 AM, Jeff Law <law@redhat.com> wrote: >>> >>> [ I'm still digesting, but saw something in this that ought to be broken >>> out... 
] >>> >>> On 08/19/2018 09:55 AM, Bernd Edlinger wrote: >>>> diff -Npur gcc/tree-ssa-dse.c gcc/tree-ssa-dse.c >>>> --- gcc/tree-ssa-dse.c 2018-07-18 21:21:34.000000000 +0200 >>>> +++ gcc/tree-ssa-dse.c 2018-08-19 14:29:32.344498771 +0200 >>>> @@ -248,6 +248,12 @@ compute_trims (ao_ref *ref, sbitmap live >>>> residual handling in mem* and str* functions is usually >>>> reasonably efficient. */ >>>> *trim_tail = last_orig - last_live; >>>> + /* Don't fold away an out of bounds access, as this defeats proper >>>> + warnings. */ >>>> + if (*trim_tail >>>> + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (ref->base)), >>>> + last_orig) <= 0) >>>> + *trim_tail = 0; >>>> } >>>> else >>>> *trim_tail = 0; >>> This seems like a good change in and of itself and should be able to go >>> forward without further review work. Consider this hunk approved, >>> along with any testsuite you have which tickles this code (I didn't >>> immediately see one attached to this patch. But I could have missed it). >>> >>> Jeff >> ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-07-25 19:37 ` Martin Sebor 2018-07-26 8:55 ` Richard Biener @ 2018-08-03 7:29 ` Jeff Law 1 sibling, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-08-03 7:29 UTC (permalink / raw) To: Martin Sebor, Richard Biener, Bernd Edlinger; +Cc: GCC Patches, Jakub Jelinek On 07/25/2018 01:36 PM, Martin Sebor wrote: >> BUT - for the string_constant and c_strlen functions we are, >> in all cases we return something interesting, able to look >> at an initializer which then determines that type. Hopefully. >> I think the strlen() folding code when it sets SSA ranges >> now looks at types ...? >> >> Consider >> >> struct X { int i; char c[4]; int j;}; >> struct Y { char c[16]; }; >> >> void foo (struct X *p, struct Y *q) >> { >>  memcpy (p, q, sizeof (struct Y)); >>  if (strlen ((char *)(struct Y *)p + 4) < 7) >>    abort (); >> } >> >> here the GIMPLE IL looks like >> >>  const char * _1; >> >>  <bb 2> [local count: 1073741825]: >>  _5 = MEM[(char * {ref-all})q_4(D)]; >>  MEM[(char * {ref-all})p_6(D)] = _5; >>  _1 = p_6(D) + 4; >>  _2 = __builtin_strlen (_1); >> >> and I guess Martin would argue that since p is of type struct X >> + 4 gets you to c[4] and thus strlen of that cannot be larger >> than 3. But of course the middle-end doesn't work like that >> and luckily we do not try to draw such conclusions or we >> are somehow lucky that for the testcase as written above we do not >> (I'm not sure whether Martins changes in this area would derive >> such conclusions in principle). > > Only if the strlen argument were p->c. Right. In that case the argument passed to strlen is of type char[4] and we can derive range info from that. In the testcase as posted, the type is char * and we can't derive anything from that. > >> NOTE - we do not know the dynamic type here since we do not know >> the dynamic type of the memory pointed-to by q! 
We can only >> derive that at q+4 there must be some object that we can >> validly call strlen on (where Martin again thinks strlen >> imposes constrains that memchr does not - sth I do not agree >> with from a QOI perspective) > > The dynamic type is a murky area. As you said, above we don't > know whether *p is an allocated object or not. Strictly speaking, > we would need to treat it as such. It would basically mean > throwing out all type information and treating objects simply > as blobs of bytes. But that's not what GCC or other compilers do > either. For instance, in the modified foo below, GCC eliminates > the test because it assumes that *p and *q don't overlap. It > does that because they are members of structs of unrelated types > access to which cannot alias. I.e., not just the type of > the access matters (here int and char) but so does the type of > the enclosing object. If it were otherwise and only the type > of the access mattered then eliminating the test below wouldn't > be valid (objects can have their stored value accessed by either > an lvalue of a compatible type or char). I think we're getting off base here. The more I think about this problem the more it seems to me like the real issue is we can't look through casts unless doing so allows us to get to an initializer (which in turn allows us to compute the length as a compile-time constant). jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-07-25 7:23 ` Richard Biener 2018-07-25 19:37 ` Martin Sebor @ 2018-08-03 7:19 ` Jeff Law 2018-08-03 7:48 ` Jakub Jelinek 2018-08-20 10:06 ` Richard Biener 1 sibling, 2 replies; 121+ messages in thread From: Jeff Law @ 2018-08-03 7:19 UTC (permalink / raw) To: Richard Biener, Bernd Edlinger; +Cc: GCC Patches, Jakub Jelinek, Martin Sebor On 07/25/2018 01:23 AM, Richard Biener wrote: > On Tue, 24 Jul 2018, Bernd Edlinger wrote: > >> On 07/24/18 23:46, Jeff Law wrote: >>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: >>>> Hi! >>>> >>>> This patch makes strlen range computations more conservative. >>>> >>>> Firstly if there is a visible type cast from type A to B before passing >>>> then value to strlen, don't expect the type layout of B to restrict the >>>> possible return value range of strlen. >>> Why do you think this is the right thing to do? ie, is there language >>> in the standards that makes you think the code as it stands today is >>> incorrect from a conformance standpoint? Is there a significant body of >>> code that is affected in an adverse way by the current code? If so, >>> what code? >>> >>> >> >> I think if you have an object, of an effective type A say char[100], then >> you can cast the address of A to B, say typedef char (*B)[2] for instance >> and then to const char *, say for use in strlen. I may be wrong, but I think >> that we should at least try not to pick up char[2] from B, but instead >> use A for strlen ranges, or leave this range open. Currently the range >> info for strlen is [0..1] in this case, even if we see the type cast >> in the generic tree. > > You raise a valid point - namely that the middle-end allows > any object (including storage with a declared type) to change > its dynamic type (even of a piece of it). 
So unless you can > prove that the dynamic type of the thing you are looking at > matches your idea of that type you may not derive any string > lengths (or ranges) from it. > > BUT - for the string_constant and c_strlen functions we are, > in all cases we return something interesting, able to look > at an initializer which then determines that type. Hopefully. > I think the strlen() folding code when it sets SSA ranges > now looks at types ...? I'm leaning towards a similar conclusion, namely that we can only rely on type information for the pointer that actually gets passed to strlen, which 99.9% of the time is (char *), potentially with const qualifiers. It's tempting to look back through the cast to find a cast from a char array but I'm more and more concerned that it's not safe unless we can walk back to an initializer. What this might argue is that we need to distinguish between a known range and a likely range. I really dislike doing that again. We may have to see more real world cases where the likely range allows us to improve the precision of the sprintf warnings (since that's really the goal of improved string length ranges). > > Consider > > struct X { int i; char c[4]; int j;}; > struct Y { char c[16]; }; > > void foo (struct X *p, struct Y *q) > { > memcpy (p, q, sizeof (struct Y)); > if (strlen ((char *)(struct Y *)p + 4) < 7) > abort (); > } > > here the GIMPLE IL looks like > > const char * _1; > > <bb 2> [local count: 1073741825]: > _5 = MEM[(char * {ref-all})q_4(D)]; > MEM[(char * {ref-all})p_6(D)] = _5; > _1 = p_6(D) + 4; > _2 = __builtin_strlen (_1); > > and I guess Martin would argue that since p is of type struct X > + 4 gets you to c[4] and thus strlen of that cannot be larger > than 3. But _1 is of type const char * and that's what's passed to strlen. The type of P and Q are irrelevant ISTM. Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
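Richard's example can be turned into a small program that shows why the pointer actually passed to strlen carries no usable bound. The sketch below is illustrative only: `tail_length` and `demo` are hypothetical names, and the storage is obtained from malloc so that it has no declared type and is large enough for a `struct Y`.

```c
#include <stdlib.h>
#include <string.h>

struct X { int i; char c[4]; int j; };
struct Y { char c[16]; };

/* Copy *q over the storage at p, then measure the string that starts
   four bytes into that storage.  The argument of strlen is a plain
   char pointer, so nothing here limits the result to 3.  */
size_t
tail_length (struct X *p, const struct Y *q)
{
  memcpy (p, q, sizeof (struct Y));
  return strlen ((char *) p + 4);
}

/* Hypothetical driver: the storage is big enough for a struct Y and is
   never accessed through an lvalue of type struct X.  */
size_t
demo (void)
{
  struct Y y;
  strcpy (y.c, "0123456789");

  struct X *p = malloc (sizeof (struct Y));
  if (p == NULL)
    return 0;

  size_t n = tail_length (p, &y);   /* length of "456789" */
  free (p);
  return n;
}
```

Here `demo` returns 6, which would be impossible if the range had been derived from the `char[4]` member of `struct X`.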
* Re: [PATCH] Make strlen range computations more conservative 2018-08-03 7:19 ` Jeff Law @ 2018-08-03 7:48 ` Jakub Jelinek 2018-08-06 14:58 ` Jeff Law 2018-08-20 10:06 ` Richard Biener 1 sibling, 1 reply; 121+ messages in thread From: Jakub Jelinek @ 2018-08-03 7:48 UTC (permalink / raw) To: Jeff Law; +Cc: Richard Biener, Bernd Edlinger, GCC Patches, Martin Sebor On Fri, Aug 03, 2018 at 01:19:14AM -0600, Jeff Law wrote: > I'm leaning towards a similar conclusion, namely that we can only rely > on type information for the pointer that actually gets passed to strlen, > which 99.9% of the time is (char *), potentially with const qualifiers. You can't derive anything from the pointer type of the strlen argument, because pointer conversions are useless in the middle-end, so if there was some conversion at some point, it might be gone, or you might get there a completely different pointer type of something that happened to have the same value. That is why the information, if it matters, needs to be stored elsewhere, on the memory access (MEM_REF has such info, TARGET_MEM_REF too, handled_component_p do too). Jakub ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-03 7:48 ` Jakub Jelinek @ 2018-08-06 14:58 ` Jeff Law 0 siblings, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-08-06 14:58 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Richard Biener, Bernd Edlinger, GCC Patches, Martin Sebor

On 08/03/2018 01:48 AM, Jakub Jelinek wrote:
> On Fri, Aug 03, 2018 at 01:19:14AM -0600, Jeff Law wrote:
>> I'm leaning towards a similar conclusion, namely that we can only rely
>> on type information for the pointer that actually gets passed to strlen,
>> which 99.9% of the time is (char *), potentially with const qualifiers.
>
> You can't derive anything from the pointer type of the strlen argument,
> because pointer conversions are useless in the middle-end, so if there was
> some conversion at some point, it might be gone, or you might get there a
> completely different pointer type of something that happened to have the
> same value. That is why the information, if it matters, needs to be stored
> elsewhere, on the memory access (MEM_REF has such info, TARGET_MEM_REF too,
> handled_component_p do too).

So ISTM that you, Richi and I are in broad agreement that we can't walk backwards through casts to refine the potential range for string lengths because of GIMPLE semantics. That also matches what Bernd wants to see happen.

With that in mind I propose we either revert that aspect of Martin's patch or move forward with Bernd's patch for that issue, whichever makes the most sense.

However, I want to give Martin a chance to chime in explicitly on the subject of looking through casts before making that change.

Jeff

^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-03 7:19 ` Jeff Law 2018-08-03 7:48 ` Jakub Jelinek @ 2018-08-20 10:06 ` Richard Biener 1 sibling, 0 replies; 121+ messages in thread From: Richard Biener @ 2018-08-20 10:06 UTC (permalink / raw) To: Jeff Law Cc: Richard Guenther, Bernd Edlinger, GCC Patches, Jakub Jelinek, Martin Sebor On Fri, Aug 3, 2018 at 9:19 AM Jeff Law <law@redhat.com> wrote: > > On 07/25/2018 01:23 AM, Richard Biener wrote: > > On Tue, 24 Jul 2018, Bernd Edlinger wrote: > > > >> On 07/24/18 23:46, Jeff Law wrote: > >>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: > >>>> Hi! > >>>> > >>>> This patch makes strlen range computations more conservative. > >>>> > >>>> Firstly if there is a visible type cast from type A to B before passing > >>>> then value to strlen, don't expect the type layout of B to restrict the > >>>> possible return value range of strlen. > >>> Why do you think this is the right thing to do? ie, is there language > >>> in the standards that makes you think the code as it stands today is > >>> incorrect from a conformance standpoint? Is there a significant body of > >>> code that is affected in an adverse way by the current code? If so, > >>> what code? > >>> > >>> > >> > >> I think if you have an object, of an effective type A say char[100], then > >> you can cast the address of A to B, say typedef char (*B)[2] for instance > >> and then to const char *, say for use in strlen. I may be wrong, but I think > >> that we should at least try not to pick up char[2] from B, but instead > >> use A for strlen ranges, or leave this range open. Currently the range > >> info for strlen is [0..1] in this case, even if we see the type cast > >> in the generic tree. > > > > You raise a valid point - namely that the middle-end allows > > any object (including storage with a declared type) to change > > its dynamic type (even of a piece of it). 
So unless you can > > prove that the dynamic type of the thing you are looking at > > matches your idea of that type you may not derive any string > > lengths (or ranges) from it. > > > > BUT - for the string_constant and c_strlen functions we are, > > in all cases we return something interesting, able to look > > at an initializer which then determines that type. Hopefully. > > I think the strlen() folding code when it sets SSA ranges > > now looks at types ...? > I'm leaning towards a similar conclusion, namely that we can only rely > on type information for the pointer that actually gets passed to strlen, > which 99.9% of the time is (char *), potentially with const qualifiers. It's 100% (char *) because the C standard says arguments are converted to the argument type. > It's tempting to look back through the cast to find a cast from a char > array but I'm more and more concerned that it's not safe unless we can > walk back to an initializer. > > What this might argue is that we need to distinguish between a known > range and a likely range. I really dislike doing that again. We may > have to see more real world cases where the likely range allows us to > improve the precision of the sprintf warnings (since that's really the > goal of improved string length ranges). > > > > > > > Consider > > > > struct X { int i; char c[4]; int j;}; > > struct Y { char c[16]; }; > > > > void foo (struct X *p, struct Y *q) > > { > > memcpy (p, q, sizeof (struct Y)); > > if (strlen ((char *)(struct Y *)p + 4) < 7) > > abort (); > > } > > > > here the GIMPLE IL looks like > > > > const char * _1; > > > > <bb 2> [local count: 1073741825]: > > _5 = MEM[(char * {ref-all})q_4(D)]; > > MEM[(char * {ref-all})p_6(D)] = _5; > > _1 = p_6(D) + 4; > > _2 = __builtin_strlen (_1); > > > > and I guess Martin would argue that since p is of type struct X > > + 4 gets you to c[4] and thus strlen of that cannot be larger > > than 3. 
> But _1 is of type const char * and that's what's passed to strlen. The > type of P and Q are irrelevant ISTM. > > Jeff > > ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-07-24 23:18 ` Bernd Edlinger 2018-07-25 4:52 ` Jeff Law 2018-07-25 7:23 ` Richard Biener @ 2018-07-25 17:31 ` Martin Sebor 2018-07-27 6:49 ` Bernd Edlinger 2018-08-03 7:00 ` Jeff Law 2018-08-09 5:26 ` Jeff Law 4 siblings, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-07-25 17:31 UTC (permalink / raw) To: Bernd Edlinger, Jeff Law, GCC Patches; +Cc: Richard Biener, Jakub Jelinek > One other example I have found in one of the test cases: > > char c; > > if (strlen(&c) != 0) abort(); > > this is now completely elided, but why? Because the only string that can be stored in an array of one element is the empty string. Expanding that call to strlen() is in all likelihood going to result in zero. The only two cases when it doesn't are invalid: either the character is uninitialized (GCC may not see it so it may not warn about it), or it is initialized to a non-zero value (which makes it not a string -- I have submitted an enhancement to detect a subset of these cases). The cases where the user expects to be able to read past the end of the character and what follows are both exceedingly unlikely and also undefined. So in my view, it is safer to fold the call into zero than not. Is there a code base where > that is used? I doubt, but why do we care to eliminate something > stupid like that? If we would emit a warning for that I'm fine with it, > But if we silently remove code like that I don't think that it > will improve anything. So I ask, where is the code base which > gets an improvement from that optimization? Jonathan suggested issuing a warning in this case. That sounds reasonable to me, but not everyone is in favor of issuing warnings out of the folder. (I'm guilty of having done that in a few important cases despite it.) 

I am fully supportive of enhancing warnings to detect more problems, but I am opposed to gratuitously removing solutions that have been put in after a great deal of thought, without so much as bringing them up for discussion.

> This work concentrates mostly on avoiding to interfere with code that
> actually deserves warnings, but which is not being warned about.

Then help by adding the missing warnings. It will help drive improvements to user code and will ultimately lead to greater efficiency. Dumbing down the analyses and accommodating undefined code is not a good way forward. It will only lead to a kludgy compiler with hacks for this or that bad practice and compromise our ability to implement new optimizations (and detect more bugs).

Martin

^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-07-25 17:31 ` Martin Sebor @ 2018-07-27 6:49 ` Bernd Edlinger 2018-07-31 3:45 ` Martin Sebor 2018-08-06 15:34 ` Jeff Law 0 siblings, 2 replies; 121+ messages in thread From: Bernd Edlinger @ 2018-07-27 6:49 UTC (permalink / raw) To: Martin Sebor, Jeff Law, GCC Patches; +Cc: Richard Biener, Jakub Jelinek

I have one more example similar to PR86259, that resembles IMHO real world code:

Consider the following:


int fun (char *p)
{
  char buf[16];

  assert(strlen(p) < 4); //here: security relevant check

  sprintf(buf, "echo %s - %s", p, p); //here: security relevant code
  return system(buf);
}


What is wrong with the assertion?

Nothing, except that it is removed when this function is called from untrusted code:

untrused_fun ()
{
  char b[2] = "ab";
  fun(b);
}

!!!! don't try to execute that: after "ab" there can be "; rm -rF / ;" on your stack!!!!

Even the slightly safer check "assert(strnlen(p, 4) < 4);" would have been removed.

Now that is a simple error and it would be easy to fix -- normally. But when the assertion is removed, the security relevant code is allowed to continue where it creates more damage and is suddenly much harder to debug.

So, I am starting to believe that strlen range assumptions are unsafe, unless we can prove that the string is in fact zero terminated.

I would like to guard the strlen range checks with a new option, maybe -fassume-zero-terminated-char-arrays, and enable that under -Ofast only.

What do you think?


Thanks
Bernd.

^ permalink raw reply [flat|nested] 121+ messages in thread
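The crux of the example above is that in C (though not in C++) `char b[2] = "ab";` is accepted and silently drops the terminating NUL. A caller that knows the real size of the buffer can check for a terminator without reading past the end. The following is a minimal sketch of such a check; `has_terminator` and `fits_and_terminated` are hypothetical helper names, not part of the patch or of any library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if a NUL byte occurs within the first size bytes of buf.
   memchr never looks beyond size bytes, so this is well defined even
   for an unterminated array such as char b[2] = "ab".  */
bool
has_terminator (const char *buf, size_t size)
{
  return memchr (buf, '\0', size) != NULL;
}

/* True if buf holds a terminated string of at most max_len characters
   within its size bytes.  */
bool
fits_and_terminated (const char *buf, size_t size, size_t max_len)
{
  const char *nul = memchr (buf, '\0', size);
  return nul != NULL && (size_t) (nul - buf) <= max_len;
}
```

Unlike the `strlen (p) < 4` assertion inside `fun`, this check has to be made where the array size is still known, for instance as `fits_and_terminated (b, sizeof b, 3)` in the caller.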
* Re: [PATCH] Make strlen range computations more conservative 2018-07-27 6:49 ` Bernd Edlinger @ 2018-07-31 3:45 ` Martin Sebor 2018-07-31 6:38 ` Jakub Jelinek 2018-08-06 15:34 ` Jeff Law 1 sibling, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-07-31 3:45 UTC (permalink / raw) To: Bernd Edlinger, Jeff Law, GCC Patches; +Cc: Richard Biener, Jakub Jelinek On 07/27/2018 12:48 AM, Bernd Edlinger wrote: > I have one more example similar to PR86259, that resembles IMHO real world code: > > Consider the following: > > > int fun (char *p) > { > char buf[16]; > > assert(strlen(p) < 4); //here: security relevant check > > sprintf(buf, "echo %s - %s", p, p); //here: security relevant code > return system(buf); > } > > > What is wrong with the assertion? > > Nothing, except it is removed, when this function is called from untrusted code: > > untrused_fun () > { > char b[2] = "ab"; > fun(b); > } > > !!!! don't try to execute that: after "ab" there can be "; rm -rF / ;" on your stack!!!! > > Even the slightly more safe check "assert(strnlen(p, 4) < 4);" would have > been removed. > > Now that is a simple error and it would be easy to fix -- normally. > But when the assertion is removed, the security relevant code > is allowed to continue where it creates more damage and is > suddenly much harder to debug. sprintf() is a known source of buffer overflows. The recommended practice is to use snprintf. An alternate mechanism to constrain the number of bytes formatted by an individual %s directive is to use the precision, such as %.4s. > So, I start to believe that strlen range assumptions are unsafe, unless > we can prove that the string is in fact zero terminated. > > I would like to guard the strlen range checks with a new option, maybe > -fassume-zero-terminated-char-arrays, and enable that under -Ofast only. > > What do you think? 
I'm not opposed to providing options to control various features but I'm not in favor of disabling them by default as a solution to accommodate buggy code. For every instance of a bug in a program with undefined behavior, whether it's reading or writing past the end of an object or subobject, or integer overflow, it's possible to show security-related consequences. One could just as easily create a test case where allowing strlen to read past the end of a member array could be exploited to cause a subsequent buffer overflow. Some of these consequences might be in some cases mitigated by one strategy and others in other cases by another. There's no silver bullet -- the best approach is to drive improvements to code to help weed out these bugs. Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past the end of subobjects by string functions. With _FORTIFY_SOURCE=2 it calls abort. This is the default on popular distributions, including Fedora, RHEL, and Ubuntu. -Wstringop-truncation tries to help detect the creation of unterminated strings by strncpy and strncat. There is little reason in my mind to treat strlen or any other function as special, except perhaps for the few existing exceptions of the raw memory functions (memcpy, et al.) As you know, I have already posted a patch to detect a subset of the problem of calling strlen on non-terminated arrays. More such issues, including uses of dynamically created and uninitialized arrays, can be detected by relatively modest enhancements to the tree-ssa-strlen pass (also on my list of things to do). It may also be worth considering moving the "initializer-string for array chars is too long" warning from -Wc++-compat to -Wall or -Wextra. But I would much rather focus on these solutions and work toward overall improvements than on weakening optimization to accommodate undefined code. With sufficient awareness as a result of warnings such code should all but disappear. 
Following stricter rules opens up opportunities for deeper analyses to enable more optimization and detect even more bugs. Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
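The snprintf and %.4s suggestion in the reply above can be sketched as follows. This is only an illustration, not code from the thread: the helper name format_echo is hypothetical, and the format string is borrowed from the quoted example. Note that a precision bounds how many bytes a %s directive reads only when the source array is at least that large; a shorter array must still be nul-terminated, so this does not make the char b[2] = "ab" caller valid.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the thread): formats "echo <p> - <p>"
   into buf, using at most four bytes of p for each %s directive.
   Returns 0 on success and -1 if the output was truncated or an
   encoding error occurred.  snprintf always nul-terminates buf when
   size is nonzero.  */
static int format_echo(char *buf, size_t size, const char *p)
{
    int n = snprintf(buf, size, "echo %.4s - %.4s", p, p);

    /* n is the length the full output would have had; n >= size
       means it did not fit.  */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With a 16-byte buffer, a three-character argument fits, while a longer argument is cut to four bytes per directive and the truncation is reported instead of overflowing the buffer.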
* Re: [PATCH] Make strlen range computations more conservative 2018-07-31 3:45 ` Martin Sebor @ 2018-07-31 6:38 ` Jakub Jelinek 2018-07-31 15:17 ` Martin Sebor 0 siblings, 1 reply; 121+ messages in thread From: Jakub Jelinek @ 2018-07-31 6:38 UTC (permalink / raw) To: Martin Sebor; +Cc: Bernd Edlinger, Jeff Law, GCC Patches, Richard Biener On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote: > Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past > the end of subobjects by string functions. With _FORTIFY_SOURCE=2 > it calls abort. This is the default on popular distributions, Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the standard requires, imposes extra requirements. So from what this mode accepts or rejects we shouldn't determine what is or isn't considered valid. Jakub ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-07-31 6:38 ` Jakub Jelinek @ 2018-07-31 15:17 ` Martin Sebor 2018-07-31 15:48 ` Jakub Jelinek 0 siblings, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-07-31 15:17 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Bernd Edlinger, Jeff Law, GCC Patches, Richard Biener On 07/31/2018 12:38 AM, Jakub Jelinek wrote: > On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote: >> Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past >> the end of subobjects by string functions. With _FORTIFY_SOURCE=2 >> it calls abort. This is the default on popular distributions, > > Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the standard > requires, imposes extra requirements. So from what this mode accepts or > rejects we shouldn't determine what is or isn't considered valid. I'm not sure what the additional requirements are but the ones I am referring to are the enforcing of struct member boundaries. This is in line with the standard requirements of not accessing [sub]objects via pointers derived from other [sub]objects. The one area where Builtin Object Size doesn't faithfully reflect subobject boundaries is arrays of arrays. This was a serious concern for the security group at my last company (see bug 44384). We developed (proprietary) patches to mitigate the shortcoming. Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-07-31 15:17 ` Martin Sebor @ 2018-07-31 15:48 ` Jakub Jelinek 2018-07-31 23:20 ` Martin Sebor 0 siblings, 1 reply; 121+ messages in thread From: Jakub Jelinek @ 2018-07-31 15:48 UTC (permalink / raw) To: Martin Sebor; +Cc: Bernd Edlinger, Jeff Law, GCC Patches, Richard Biener On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote: > On 07/31/2018 12:38 AM, Jakub Jelinek wrote: > > On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote: > > > Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past > > > the end of subobjects by string functions. With _FORTIFY_SOURCE=2 > > > it calls abort. This is the default on popular distributions, > > > > Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the standard > > requires, imposes extra requirements. So from what this mode accepts or > > rejects we shouldn't determine what is or isn't considered valid. > > I'm not sure what the additional requirements are but the ones > I am referring to are the enforcing of struct member boundaries. > This is in line with the standard requirements of not accessing > [sub]objects via pointers derived from other [sub]objects. In the middle-end the distinction between what was originally a reference to subobjects and what was a reference to objects is quickly lost (whether through SCCVN or other optimizations). We've run into this many times with the __builtin_object_size already. So, if e.g. struct S { char a[3]; char b[5]; } s = { "abc", "defg" }; ... strlen ((char *) &s) is well defined but strlen (s.a) is not in C, for the middle-end you might not figure out which one is which. Jakub ^ permalink raw reply [flat|nested] 121+ messages in thread
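Whether strlen may legitimately run from s.a into s.b is exactly what is disputed here. As an illustration only (not something proposed in the thread), user code can sidestep the question by scanning the object representation through unsigned char with an explicit bound. The helper name repr_length is hypothetical, and the expected results assume the compiler inserts no padding between the two char arrays of struct S.

```c
#include <stddef.h>

/* Same layout as the struct in the message above.  */
struct S { char a[3]; char b[5]; };

/* Hypothetical helper (not from the thread): counts the bytes before
   the first zero byte in the object representation at obj, examining
   at most size bytes.  Returns size if no zero byte is found, so it
   never reads past the bound supplied by the caller.  */
static size_t repr_length(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    size_t n = 0;

    while (n < size && bytes[n] != 0)
        ++n;
    return n;
}
```

Called as repr_length(&s, sizeof s) the scan covers both members; called as repr_length(s.a, sizeof s.a) it stops at the end of the first member, so the caller states explicitly which of the two readings it wants.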
* Re: [PATCH] Make strlen range computations more conservative 2018-07-31 15:48 ` Jakub Jelinek @ 2018-07-31 23:20 ` Martin Sebor 2018-08-01 6:55 ` Bernd Edlinger 2018-08-01 7:19 ` Richard Biener 0 siblings, 2 replies; 121+ messages in thread From: Martin Sebor @ 2018-07-31 23:20 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Bernd Edlinger, Jeff Law, GCC Patches, Richard Biener On 07/31/2018 09:48 AM, Jakub Jelinek wrote: > On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote: >> On 07/31/2018 12:38 AM, Jakub Jelinek wrote: >>> On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote: >>>> Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past >>>> the end of subobjects by string functions. With _FORTIFY_SOURCE=2 >>>> it calls abort. This is the default on popular distributions, >>> >>> Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the standard >>> requires, imposes extra requirements. So from what this mode accepts or >>> rejects we shouldn't determine what is or isn't considered valid. >> >> I'm not sure what the additional requirements are but the ones >> I am referring to are the enforcing of struct member boundaries. >> This is in line with the standard requirements of not accessing >> [sub]objects via pointers derived from other [sub]objects. > > In the middle-end the distinction between what was originally a reference > to subobjects and what was a reference to objects is quickly lost > (whether through SCCVN or other optimizations). > We've run into this many times with the __builtin_object_size already. > So, if e.g. > struct S { char a[3]; char b[5]; } s = { "abc", "defg" }; > ... > strlen ((char *) &s) is well defined but > strlen (s.a) is not in C, for the middle-end you might not figure out which > one is which. 
Yes, I'm aware of the middle-end transformation to MEM_REF -- it's one of the reasons why detecting invalid accesses by the middle end warnings, including -Warray-bounds, -Wformat-overflow, -Wsprintf-overflow, and even -Wrestrict, is less than perfect. But is strlen(s.a) also meant to be well-defined in the middle end (with the semantics of computing the length of "abcdefg"?) And if so, what makes it well defined? Certainly not every "strlen" has these semantics. For example, this open-coded one doesn't: int len = 0; for (int i = 0; s.a[i]; ++i) ++len; It computes 2 (with no warning for the out-of-bounds access). So if the standard doesn't guarantee it and different kinds of accesses behave differently, how do we explain what "works" and what doesn't without relying on GCC implementation details? If we can't then the only language we have in common with users is the standard. (This, by the way, is what the C memory model group is trying to address -- the language or feature that's missing from the standard that says when, if ever, these things might be valid.) Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-07-31 23:20 ` Martin Sebor @ 2018-08-01 6:55 ` Bernd Edlinger 2018-08-03 4:19 ` Martin Sebor 2018-08-01 7:19 ` Richard Biener 1 sibling, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-08-01 6:55 UTC (permalink / raw) To: Martin Sebor, Jakub Jelinek; +Cc: Jeff Law, GCC Patches, Richard Biener > Certainly not every "strlen" has these semantics. For example, > this open-coded one doesn't: > > int len = 0; > for (int i = 0; s.a[i]; ++i) > ++len; > > It computes 2 (with no warning for the out-of-bounds access). > yes, which is questionable as well, but that happens only if the source code accesses the array via s.a[i] not if it happens to use char *, as this experiment shows: $ cat y1.c int len (const char *x) { int len = 0; for (int i = 0; x[i]; ++i) ++len; return len; } const char a[3] = "123"; int main () { return len(a); } $ gcc -O3 y1.c $ ./a.out ; echo $? 3 The loop is not optimized away. $ cat y2.c const char a[3] = "123"; int main () { int len = 0; for (int i = 0; a[i]; ++i) ++len; return len; } $ gcc -O3 y2.c $ ./a.out ; echo $? 2 The point I make is that it is impossible to know where the function is inlined, and if the original code can be broken in surprising ways. And most importantly strlen is often used in security relevant ways. > So if the standard doesn't guarantee it and different kinds > of accesses behave differently, how do we explain what "works" > and what doesn't without relying on GCC implementation details? > > If we can't then the only language we have in common with users > is the standard. (This, by the way, is what the C memory model > group is trying to address -- the language or feature that's > missing from the standard that says when, if ever, these things > might be valid.) Sorry, but there are examples of undefined behaviour that GCC does deliberately not use for code optimizations, but only for warnings. 
I mean undefinedness of signed shift left overflow for instance. I think the possible return value of strlen should be also not used for code optimizations. Because your optimization assumes the return value of strlen is always in the range 0..size-1 even if the string is not nul terminated. But that is the only value that can _never_ be returned if the string is not nul terminated. Therefore this is often used as check for zero-termination. (*) But in reality the return value is always in range size..infinity or the function aborts, code like assert(strlen(x) < sizeof(x)) uses this basic knowledge. The standard should mention these magic powers of strlen, and state that it will either abort or return >= sizeof(x). It does not help anybody to be unclear. (*): This is even done here: __strcpy_chk (char *__restrict__ dest, const char *__restrict__ src, size_t slen) { size_t len = strlen (src); if (len >= slen) __chk_fail (); return memcpy (dest, src, len + 1); } If you are right __chk_fail will never be called. So why not optimize it away? Bernd. ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-01 6:55 ` Bernd Edlinger @ 2018-08-03 4:19 ` Martin Sebor 2018-08-06 15:39 ` Jeff Law 0 siblings, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-08-03 4:19 UTC (permalink / raw) To: Bernd Edlinger, Jakub Jelinek; +Cc: Jeff Law, GCC Patches, Richard Biener On 08/01/2018 12:55 AM, Bernd Edlinger wrote: >> Certainly not every "strlen" has these semantics. For example, >> this open-coded one doesn't: >> >> int len = 0; >> for (int i = 0; s.a[i]; ++i) >> ++len; >> >> It computes 2 (with no warning for the out-of-bounds access). >> > > yes, which is questionable as well, but that happens only > if the source code accesses the array via s.a[i] > not if it happens to use char *, as this experiment shows: Yes, that just happens to be the case with GCC in some situations, and not in others. That's why it shouldn't be relied on. > The point I make is that it is impossible to know where the function > is inlined, and if the original code can be broken in surprising ways. > And most importantly strlen is often used in security relevant ways. Code that's concerned with security or safety (which should be all of it) needs to follow the basic rules of the language. Calling strlen() on a char[4] argument expecting it to return a value larger than 3 as an indication that the array isn't nul-terminated is not a secure coding practice -- it's a plain old bug. Don't take my word for it -- read any of the secure coding standards: CERT STR32-C. Do not pass a non-null-terminated character sequence to a library function that expects a string, CWE-170: Improper Null Termination, OWASP String Termination Error. This is elementary material that shouldn't need explaining. Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-03 4:19 ` Martin Sebor @ 2018-08-06 15:39 ` Jeff Law 0 siblings, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-08-06 15:39 UTC (permalink / raw) To: Martin Sebor, Bernd Edlinger, Jakub Jelinek; +Cc: GCC Patches, Richard Biener On 08/02/2018 10:19 PM, Martin Sebor wrote: > On 08/01/2018 12:55 AM, Bernd Edlinger wrote: >>> Certainly not every "strlen" has these semantics. For example, >>> this open-coded one doesn't: >>> >>> int len = 0; >>> for (int i = 0; s.a[i]; ++i) >>> ++len; >>> >>> It computes 2 (with no warning for the out-of-bounds access). >>> >> >> yes, which is questionable as well, but that happens only >> if the source code accesses the array via s.a[i] >> not if it happens to use char *, as this experiment shows: > > Yes, that just happens to be the case with GCC in some > situations, and not in others. That's why it shouldn't > be relied on. Right. An access via an ARRAY_REF carries some semantic information that is not carried by an INDIRECT_REF. However, we may at times lower an ARRAY_REF to an INDIRECT_REF -- and I believe we've had code in the front-ends to go the other way. > >> The point I make is that it is impossible to know where the function >> is inlined, and if the original code can be broken in surprising ways. >> And most importantly strlen is often used in security relevant ways. > > Code that's concerned with security or safety (which should > be all of it) needs to follow the basic rules of the language. > Calling strlen() on a char[4] argument expecting it to return > a value larger than 3 as an indication that the array isn't > nul-terminated is not a secure coding practice -- it's a plain > old bug. Don't take my word for it -- read any of the secure > coding standards: CERT STR32-C.
Do not pass a non-null-terminated > character sequence to a library function that expects a string, > CWE-170: Improper Null Termination, OWASP String Termination > Error. This is elementary material that shouldn't need > explaining. I couldn't agree more with this. jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
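A check for termination that stays within the bounds of the array, in the spirit of the secure coding rules cited above, can be sketched with memchr. This is a minimal illustration under stated assumptions, not code from the thread: the helper names is_terminated and copy_string are hypothetical, and the caller is assumed to know the true size of both arrays.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not from the thread): true if the size-byte
   array at buf contains a nul, i.e. holds a terminated string.
   Unlike strlen, memchr never reads beyond the given bound.  */
static bool is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Hypothetical helper: copies the string in src to dest, including
   the terminating nul.  Fails without writing anything if src is not
   terminated within src_size bytes or the string does not fit in
   dest_size bytes.  */
static bool copy_string(char *dest, size_t dest_size,
                        const char *src, size_t src_size)
{
    const char *end = memchr(src, '\0', src_size);

    if (end == NULL)
        return false;           /* src is not a string */

    size_t len = (size_t)(end - src);
    if (len >= dest_size)
        return false;           /* no room for len bytes plus the nul */

    memcpy(dest, src, len + 1);
    return true;
}
```

The difference from assert(strlen(x) < sizeof(x)) is that the bounded scan has defined behavior for an unterminated array, so the check cannot be folded away on the grounds that the failing case is undefined.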
* Re: [PATCH] Make strlen range computations more conservative 2018-07-31 23:20 ` Martin Sebor 2018-08-01 6:55 ` Bernd Edlinger @ 2018-08-01 7:19 ` Richard Biener 2018-08-01 8:40 ` Jakub Jelinek ` (2 more replies) 1 sibling, 3 replies; 121+ messages in thread From: Richard Biener @ 2018-08-01 7:19 UTC (permalink / raw) To: Martin Sebor; +Cc: Jakub Jelinek, Bernd Edlinger, Jeff Law, GCC Patches On Tue, 31 Jul 2018, Martin Sebor wrote: > On 07/31/2018 09:48 AM, Jakub Jelinek wrote: > > On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote: > > > On 07/31/2018 12:38 AM, Jakub Jelinek wrote: > > > > On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote: > > > > > Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past > > > > > the end of subobjects by string functions. With _FORTIFY_SOURCE=2 > > > > > it calls abort. This is the default on popular distributions, > > > > > > > > Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the > > > > standard > > > > requires, imposes extra requirements. So from what this mode accepts or > > > > rejects we shouldn't determine what is or isn't considered valid. > > > > > > I'm not sure what the additional requirements are but the ones > > > I am referring to are the enforcing of struct member boundaries. > > > This is in line with the standard requirements of not accessing > > > [sub]objects via pointers derived from other [sub]objects. > > > > In the middle-end the distinction between what was originally a reference > > to subobjects and what was a reference to objects is quickly lost > > (whether through SCCVN or other optimizations). > > We've run into this many times with the __builtin_object_size already. > > So, if e.g. > > struct S { char a[3]; char b[5]; } s = { "abc", "defg" }; > > ... > > strlen ((char *) &s) is well defined but > > strlen (s.a) is not in C, for the middle-end you might not figure out which > > one is which. 
> > Yes, I'm aware of the middle-end transformation to MEM_REF > -- it's one of the reasons why detecting invalid accesses > by the middle end warnings, including -Warray-bounds, > -Wformat-overflow, -Wsprintf-overflow, and even -Wrestrict, > is less than perfect. > > But is strlen(s.a) also meant to be well-defined in the middle > end (with the semantics of computing the length or "abcdefg"?) Yes. > And if so, what makes it well defined? The fact that strlen takes a char * argument and thus inline-expansion of a trivial implementation like int len = 0; for (; *p; ++p) ++len; will have p = &s.a; and the middle-end doesn't reconstruct s.a[..] from the pointer access. > > Certainly not every "strlen" has these semantics. For example, > this open-coded one doesn't: > > int len = 0; > for (int i = 0; s.a[i]; ++i) > ++len; > > It computes 2 (with no warning for the out-of-bounds access). Yes. > So if the standard doesn't guarantee it and different kinds > of accesses behave differently, how do we explain what "works" > and what doesn't without relying on GCC implementation details? In the middle-end accesses via pointers - accesses where the access path is not visible in the access itself - are not constrained by the "access" path of how the pointer was built. > If we can't then the only language we have in common with users > is the standard. (This, by the way, is what the C memory model > group is trying to address -- the language or feature that's > missing from the standard that says when, if ever, these things > might be valid.) Well, you simply have to not compare apples and oranges, a strlen implementation that isn't a strlen implementation and strlen. Richard. ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-01 7:19 ` Richard Biener @ 2018-08-01 8:40 ` Jakub Jelinek 2018-08-03 3:59 ` Martin Sebor 2018-08-02 3:13 ` Martin Sebor 2018-08-03 7:38 ` Jeff Law 2 siblings, 1 reply; 121+ messages in thread From: Jakub Jelinek @ 2018-08-01 8:40 UTC (permalink / raw) To: Richard Biener; +Cc: Martin Sebor, Bernd Edlinger, Jeff Law, GCC Patches On Wed, Aug 01, 2018 at 09:19:43AM +0200, Richard Biener wrote: > > And if so, what makes it well defined? > > The fact that strlen takes a char * argument and thus inline-expansion > of a trivial implementation like > > int len = 0; > for (; *p; ++p) > ++len; > > will have > > p = &s.a; > > and the middle-end doesn't reconstruct s.a[..] from the pointer > access. Some testcases: struct S { char a[3]; char b[5]; } s[3] = { { "ab", "defg" }, { "h", "klmn" }, { "opq", "rstu" } }; __SIZE_TYPE__ foo (int i, int a) { volatile char *p = (volatile char *) &s[i].a; if (a) p = (volatile char *) &s[i]; return __builtin_strlen ((char *) p); } If I call this with foo (2, 1), do you still claim it is not valid C? I don't see in C anything that would say that this is not valid for strlen, but valid for memcpy, and if you say it is not valid even for memcpy, then pretty much nothing will work, we need memcpy to be able to copy whole objects containing subobjects. The middle-end optimizes the above into p_3 = &s[i_2(D)].a; _7 = __builtin_strlen (p_3); (in fre3 in particular). Or: int bar (char *) __attribute__((pure)); int v; __SIZE_TYPE__ baz (int i, int a) { int b = bar (&s[i].a[0]); __SIZE_TYPE__ t = __builtin_strlen ((char *) &s[i]); v = b; return t; } where bar may say just int bar (char *p) { return *p; } The middle-end optimizes this into: _1 = &s[i_3(D)].a[0]; b_5 = bar (_1); t_6 = __builtin_strlen (_1); That is why for __builtin_object_size (..., [13]) where we want to differentiate between that we need to handle it very early, because later on the distinction is gone. 
Jakub ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-01 8:40 ` Jakub Jelinek @ 2018-08-03 3:59 ` Martin Sebor 2018-08-03 7:43 ` Jakub Jelinek 0 siblings, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-08-03 3:59 UTC (permalink / raw) To: Jakub Jelinek, Richard Biener; +Cc: Bernd Edlinger, Jeff Law, GCC Patches On 08/01/2018 02:40 AM, Jakub Jelinek wrote: > On Wed, Aug 01, 2018 at 09:19:43AM +0200, Richard Biener wrote: >>> And if so, what makes it well defined? >> >> The fact that strlen takes a char * argument and thus inline-expansion >> of a trivial implementation like >> >> int len = 0; >> for (; *p; ++p) >> ++len; >> >> will have >> >> p = &s.a; >> >> and the middle-end doesn't reconstruct s.a[..] from the pointer >> access. > > Some testcases: > struct S { char a[3]; char b[5]; } s[3] = { { "ab", "defg" }, { "h", "klmn" }, { "opq", "rstu" } }; > > __SIZE_TYPE__ > foo (int i, int a) > { > volatile char *p = (volatile char *) &s[i].a; > if (a) > p = (volatile char *) &s[i]; > return __builtin_strlen ((char *) p); > } > > If I call this with foo (2, 1), do you still claim it is not valid C? String functions like strlen operate on character strings stored in character arrays. Calling strlen (&s[1]) is invalid because &s[1] is not the address of a character array. The fact that objects can be represented as arrays of bytes doesn't change that. The standard may be somewhat loose with words on this distinction but the intent certainly isn't for strlen to traverse arbitrary sequences of bytes that cross subobject boundaries. (That is the intent behind the raw memory functions, but the current text doesn't make the distinction clear.) > I don't see in C anything that would say that this is not valid for strlen, > but valid for memcpy, and if you say it is not valid even for memcpy, then > pretty much nothing will work, we need memcpy to be able to copy whole objects > containing subobjects. 
Yes, I understand the concern, and it is something for the standard to reconcile. As I said, the object model study group is considering how to reflect this in the model. One suggestion was to treat unsigned char (and only it) special. Another was to introduce a new type, byte, like C++ has, and treat it special. Yet another is to come up with a special cast to change the provenance of a pointer. There probably will be others. > The middle-end optimizes the above into > p_3 = &s[i_2(D)].a; > _7 = __builtin_strlen (p_3); > (in fre3 in particular). > > Or: > int bar (char *) __attribute__((pure)); > int v; > > __SIZE_TYPE__ > baz (int i, int a) > { > int b = bar (&s[i].a[0]); > __SIZE_TYPE__ t = __builtin_strlen ((char *) &s[i]); > v = b; > return t; > } > > where bar may say just int bar (char *p) { return *p; } > The middle-end optimizes this into: > _1 = &s[i_3(D)].a[0]; > b_5 = bar (_1); > t_6 = __builtin_strlen (_1); > > That is why for __builtin_object_size (..., [13]) where we want to > differentiate between that we need to handle it very early, because later on > the distinction is gone. Yep. It would be nice to be able to hold on to the type. In this case it doesn't look like it's a problem since &s[i] is also the address of the first member of struct S and those two need to be interchangeable (this also might need tweaking in the standard). Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-03 3:59 ` Martin Sebor @ 2018-08-03 7:43 ` Jakub Jelinek 2018-08-04 20:52 ` Martin Sebor 0 siblings, 1 reply; 121+ messages in thread From: Jakub Jelinek @ 2018-08-03 7:43 UTC (permalink / raw) To: Martin Sebor; +Cc: Richard Biener, Bernd Edlinger, Jeff Law, GCC Patches On Thu, Aug 02, 2018 at 09:59:13PM -0600, Martin Sebor wrote: > > If I call this with foo (2, 1), do you still claim it is not valid C? > > String functions like strlen operate on character strings stored > in character arrays. Calling strlen (&s[1]) is invalid because > &s[1] is not the address of a character array. The fact that > objects can be represented as arrays of bytes doesn't change > that. The standard may be somewhat loose with words on this > distinction but the intent certainly isn't for strlen to traverse > arbitrary sequences of bytes that cross subobject boundaries. > (That is the intent behind the raw memory functions, but > the current text doesn't make the distinction clear.) But the standard doesn't say that right now. Plus, at least from the middle-end POV, there is also the case of placement new and stores changing the dynamic type of the object, previously say a struct with two fields, then a placement new with a single char array over it (the placement new will not survive in the middle-end, so it will be just a memcpy or strcpy or some other byte copy over the original object, and due to the CSE/SCCVN etc. of pointer to pointer conversions being in the middle-end useless means you can see a pointer to the struct with two fields rather than pointer to char array. Consider e.g. 
typedef __typeof__ (sizeof 0) size_t; void *operator new (size_t, void *p) { return p; } void *operator new[] (size_t, void *p) { return p; } struct S { char a; char b[64]; }; void baz (char *); size_t foo (S *p) { baz (&p->a); char *q = new (p) char [16]; baz (q); return __builtin_strlen (q); } I don't think it is correct to say that strlen must be 0. In this testcase the pointer passed to strlen is still S *, though I think with enough tweaking you could also have something where the argument is &p->a. I have no problem for strlen to return 0 if it sees a toplevel object of size 1, but note that if it is extern, it already might be a problem in some cases: struct T { char a; char a2[]; } b; extern struct T c; void foo (int *p) { p[0] = strlen (b); p[1] = strlen (c); } If c's definition is struct T c = { ' ', "abcde" }; then the object doesn't have length of 1. But for subobjects and especially trying to derive anything from what kind of pointer you are called with just doesn't work, unless you do it in the FE only. Jakub ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-03 7:43 ` Jakub Jelinek @ 2018-08-04 20:52 ` Martin Sebor 2018-08-05 6:51 ` Bernd Edlinger ` (2 more replies) 0 siblings, 3 replies; 121+ messages in thread From: Martin Sebor @ 2018-08-04 20:52 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Richard Biener, Bernd Edlinger, Jeff Law, GCC Patches On 08/03/2018 01:43 AM, Jakub Jelinek wrote: > On Thu, Aug 02, 2018 at 09:59:13PM -0600, Martin Sebor wrote: >>> If I call this with foo (2, 1), do you still claim it is not valid C? >> >> String functions like strlen operate on character strings stored >> in character arrays. Calling strlen (&s[1]) is invalid because >> &s[1] is not the address of a character array. The fact that >> objects can be represented as arrays of bytes doesn't change >> that. The standard may be somewhat loose with words on this >> distinction but the intent certainly isn't for strlen to traverse >> arbitrary sequences of bytes that cross subobject boundaries. >> (That is the intent behind the raw memory functions, but >> the current text doesn't make the distinction clear.) > > But the standard doesn't say that right now. It does, in the restriction on multi-dimensional array accesses. Given the array 'char a[2][2];' it's only valid to access a[0][0] and a[0][1], and a[1][0], and a[1][1]. It's not valid to access a[2][0] or a[2][1], even though they happen to be located at the same addresses as a[1][0] and a[1][1]. There is no exception for distinct struct members. So in a struct { char a[2], b[2]; }, even though a and b are laid out the same way as char[2][2] would be, it's not valid to treat a as such. There is no distinction between array subscripting and pointer arithmetic, so it doesn't matter what form the access takes. Yes, the standard could be clearer. There probably even are ambiguities and contradictions (the authors of the Object Model proposal believe there are and are trying to clarify/remove them).
But the intent is clearly there. It's especially important for adjacent members of different types (say a char[8] followed by a function pointer. We definitely don't want writes to the array to be allowed to change the function pointer.) > Plus, at least from the middle-end POV, there is also the case of > placement new and stores changing the dynamic type of the object, > previously say a struct with two fields, then a placement new with a single > char array over it (the placement new will not survive in the middle-end, so > it will be just a memcpy or strcpy or some other byte copy over the original > object, and due to the CSE/SCCVN etc. of pointer to pointer conversions > being in the middle-end useless means you can see a pointer to the struct > with two fields rather than pointer to char array. There may be challenges in the middle-end, you would know much better than me. All I'm saying is that it's not valid to access [sub]objects by dereferencing pointers to other subobjects. All the examples in this discussion have been of that form. > > Consider e.g. > typedef __typeof__ (sizeof 0) size_t; > void *operator new (size_t, void *p) { return p; } > void *operator new[] (size_t, void *p) { return p; } > struct S { char a; char b[64]; }; > void baz (char *); > > size_t > foo (S *p) > { > baz (&p->a); > char *q = new (p) char [16]; > baz (q); > return __builtin_strlen (q); > } > > I don't think it is correct to say that strlen must be 0. In this testcase > the pointer passed to strlen is still S *, though I think with enough > tweaking you could also have something where the argument is &p->a. I think the problem here is changing the type of p->a. I'm not up on the latest C++ changes here but I think it's a known problem with the specification. 
A similar (known) problem also comes in the case of dynamically allocated objects:

  char *p = (char*)operator new (2);
  char *p1 = new (p) char ('a');
  char *p2 = new (p + 1) char ('\0');
  strlen (p1);

Is the strlen (p1) call valid when there's no string or array at p? There is a singleton char object that just happens to be followed by another singleton char object. It's not an array of two elements. Each is [an array of] one char.

This is a (specification) problem for sequence containers like vector where, strictly speaking, it's not valid to iterate over them because of the array restriction.

>
> I have no problem for strlen to return 0 if it sees a toplevel object of
> size 1, but note that if it is extern, it already might be a problem in some
> cases:
> struct T { char a; char a2[]; } b;
> extern struct T c;
> void foo (int *p) { p[0] = strlen (b); p[1] = strlen (c); }
> If c's definition is struct T c = { ' ', "abcde" };
> then the object doesn't have length of 1.

I'm assuming above you meant strlen(&b) and strlen(&c) (or equivalently, strlen(&b.a) and strlen(&c.a)). If so, it's the same problem. The strlen call is invalid unless b.a and c.a are nul.

Martin

^ permalink raw reply [flat|nested] 121+ messages in thread
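To make the subobject argument in this message concrete, here is a small C sketch. It is not from the thread: the struct name `pair` and the helpers `member_len` and `len_of_a` are hypothetical, chosen only for illustration. It measures a string strictly within the member array it lives in, so the scan never runs on into the adjacent member.

```c
#include <assert.h>
#include <stddef.h>

/* Two adjacent char arrays, like the struct { char a[2], b[2]; }
   mentioned earlier in this message.  The tag "pair" is made up.  */
struct pair { char a[2], b[2]; };

/* Hypothetical helper: count the characters before the first NUL but
   never read past SIZE bytes.  A result equal to SIZE means that no
   NUL was found inside the bounds.  */
static size_t member_len (const char *s, size_t size)
{
  assert (s != NULL);
  size_t n = 0;
  while (n < size && s[n] != '\0')
    ++n;
  return n;
}

/* Length of the string stored in P->a, bounded by sizeof P->a.  */
size_t len_of_a (const struct pair *p)
{
  return member_len (p->a, sizeof p->a);
}
```

A result of `sizeof p->a` signals that `a` holds no terminated string of its own, which is the situation where an unbounded strlen would have to cross into `b`.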
* Re: [PATCH] Make strlen range computations more conservative 2018-08-04 20:52 ` Martin Sebor @ 2018-08-05 6:51 ` Bernd Edlinger 2018-08-05 15:49 ` Jeff Law 2018-08-05 17:00 ` Jeff Law 2018-08-05 17:27 ` Richard Biener 2 siblings, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-08-05 6:51 UTC (permalink / raw) To: Martin Sebor, Jakub Jelinek; +Cc: Richard Biener, Jeff Law, GCC Patches On 08/04/18 22:52, Martin Sebor wrote: > On 08/03/2018 01:43 AM, Jakub Jelinek wrote: >> On Thu, Aug 02, 2018 at 09:59:13PM -0600, Martin Sebor wrote: >>>> If I call this with foo (2, 1), do you still claim it is not valid C? >>> >>> String functions like strlen operate on character strings stored >>> in character arrays. Calling strlen (&s[1]) is invalid because >>> &s[1] is not the address of a character array. The fact that >>> objects can be represented as arrays of bytes doesn't change >>> that. The standard may be somewhat loose with words on this >>> distinction but the intent certainly isn't for strlen to traverse >>> arbitrary sequences of bytes that cross subobject boundaries. >>> (That is the intent behind the raw memory functions, but >>> the current text doesn't make the distinction clear.) >> >> But the standard doesn't say that right now. > > It does, in the restriction on multi-dimensional array accesses. > Given the array 'char a[2][2];' it's only valid to access a[0][0] > and a[0][1], and a[1][0], and a[1][1]. It's not valid to access > a[2][0] or a[2][1], even though they happen to be located at > the same addresses as a[1][0] and a[1][1]. > > There is no exception for distinct struct members. So in > a struct { char a[2], b[2]; }, even though a and b and laid > out the same way as char[2][2] would be, it's not valid to > treat a as such. There is no distinction between array > subscripting and pointer arithmetic, so it doesn't matter > what form the access takes. > > Yes, the standard could be clearer. 
There probably even are > ambiguities and contradictions (the authors of the Object Model > proposal believe there are and are trying to clarify/remove > them). But the intent is clearly there. It's especially > important for adjacent members of different types (say a char[8] > followed by a function pointer. We definitely don't want writes > to the array to be allowed to change the function pointer.) > >> Plus, at least from the middle-end POV, there is also the case of >> placement new and stores changing the dynamic type of the object, >> previously say a struct with two fields, then a placement new with a single >> char array over it (the placement new will not survive in the middle-end, so >> it will be just a memcpy or strcpy or some other byte copy over the original >> object, and due to the CSE/SCCVN etc. of pointer to pointer conversions >> being in the middle-end useless means you can see a pointer to the struct >> with two fields rather than pointer to char array. > > There may be challenges in the middle-end, you would know much > better than me. All I'm saying is that it's not valid to access > [sub]objects by dereferencing pointers to other subobjects. All > the examples in this discussion have been of that form. > These examples do not aim to be valid C, they just point out limitations of the middle-end design, and a good deal of the problems are due to trying to do things that are not safe within the boundaries given by the middle-end design. Bernd. >> >> Consider e.g. >> typedef __typeof__ (sizeof 0) size_t; >> void *operator new (size_t, void *p) { return p; } >> void *operator new[] (size_t, void *p) { return p; } >> struct S { char a; char b[64]; }; >> void baz (char *); >> >> size_t >> foo (S *p) >> { >> baz (&p->a); >> char *q = new (p) char [16]; >> baz (q); >> return __builtin_strlen (q); >> } >> >> I don't think it is correct to say that strlen must be 0. 
In this testcase >> the pointer passed to strlen is still S *, though I think with enough >> tweaking you could also have something where the argument is &p->a. > > I think the problem here is changing the type of p->a. I'm > not up on the latest C++ changes here but I think it's a known > problem with the specification. A similar (known) problem also > comes in the case of dynamically allocated objects: > > char *p = (char*)operator new (2); > char *p1 = new (p) char ('a'); > char *p2 = new (p) char ('\0'); > strlen (p1); > > Is the strlen(p) call valid when there's no string or array > at p: there is a singlelton char object that just happens > to be followed by another singleton char object. It's not > an array of two elements. Each is [an array of] one char. > > This is a (specification) problem for sequence containers like > vector where strictly speaking, it's not valid to iterate over > them because of the array restriction. > >> >> I have no problem for strlen to return 0 if it sees a toplevel object of >> size 1, but note that if it is extern, it already might be a problem in some >> cases: >> struct T { char a; char a2[]; } b; >> extern struct T c; >> void foo (int *p) { p[0] = strlen (b); p[1] = strlen (c); } >> If c's definition is struct T c = { ' ', "abcde" }; >> then the object doesn't have length of 1. > > I'm assuming above you meant strlen(&b) and strlen(&c) (or > equivalently, strlen(&b.a) and strlen(&c.a). If so, it's > the same problem. The strlen call is invalid unless b.a and > c.a are nul. > > Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-05 6:51 ` Bernd Edlinger @ 2018-08-05 15:49 ` Jeff Law 2018-08-06 17:15 ` Martin Sebor 0 siblings, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-08-05 15:49 UTC (permalink / raw) To: Bernd Edlinger, Martin Sebor, Jakub Jelinek; +Cc: Richard Biener, GCC Patches On 08/05/2018 12:51 AM, Bernd Edlinger wrote: > On 08/04/18 22:52, Martin Sebor wrote: >> On 08/03/2018 01:43 AM, Jakub Jelinek wrote: >>> On Thu, Aug 02, 2018 at 09:59:13PM -0600, Martin Sebor wrote: >>>>> If I call this with foo (2, 1), do you still claim it is not valid C? >>>> >>>> String functions like strlen operate on character strings stored >>>> in character arrays. Calling strlen (&s[1]) is invalid because >>>> &s[1] is not the address of a character array. The fact that >>>> objects can be represented as arrays of bytes doesn't change >>>> that. The standard may be somewhat loose with words on this >>>> distinction but the intent certainly isn't for strlen to traverse >>>> arbitrary sequences of bytes that cross subobject boundaries. >>>> (That is the intent behind the raw memory functions, but >>>> the current text doesn't make the distinction clear.) >>> >>> But the standard doesn't say that right now. >> >> It does, in the restriction on multi-dimensional array accesses. >> Given the array 'char a[2][2];' it's only valid to access a[0][0] >> and a[0][1], and a[1][0], and a[1][1]. It's not valid to access >> a[2][0] or a[2][1], even though they happen to be located at >> the same addresses as a[1][0] and a[1][1]. >> >> There is no exception for distinct struct members. So in >> a struct { char a[2], b[2]; }, even though a and b and laid >> out the same way as char[2][2] would be, it's not valid to >> treat a as such. There is no distinction between array >> subscripting and pointer arithmetic, so it doesn't matter >> what form the access takes. >> >> Yes, the standard could be clearer. 
There probably even are >> ambiguities and contradictions (the authors of the Object Model >> proposal believe there are and are trying to clarify/remove >> them). But the intent is clearly there. It's especially >> important for adjacent members of different types (say a char[8] >> followed by a function pointer. We definitely don't want writes >> to the array to be allowed to change the function pointer.) >> >>> Plus, at least from the middle-end POV, there is also the case of >>> placement new and stores changing the dynamic type of the object, >>> previously say a struct with two fields, then a placement new with a single >>> char array over it (the placement new will not survive in the middle-end, so >>> it will be just a memcpy or strcpy or some other byte copy over the original >>> object, and due to the CSE/SCCVN etc. of pointer to pointer conversions >>> being in the middle-end useless means you can see a pointer to the struct >>> with two fields rather than pointer to char array. >> >> There may be challenges in the middle-end, you would know much >> better than me. All I'm saying is that it's not valid to access >> [sub]objects by dereferencing pointers to other subobjects. All >> the examples in this discussion have been of that form. >> > > These examples do not aim to be valid C, they just point out limitations > of the middle-end design, and a good deal of the problems are due > to trying to do things that are not safe within the boundaries given > by the middle-end design. I really think this is important -- and as such I think we need to move away from trying to describe scenarios in C because doing so keeps bringing us back to the "C doesn't allow XYZ" kinds of arguments when what we're really discussing are GIMPLE semantic issues. So examples should be GIMPLE. You might start with (possibly invalid) C code to generate the GIMPLE, but the actual discussion needs to be looking at GIMPLE. 
We might include the C code in case someone wants to look at things in a debugger, but bringing the focus to GIMPLE is really important here. jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-05 15:49 ` Jeff Law @ 2018-08-06 17:15 ` Martin Sebor 2018-08-06 17:40 ` Jeff Law 2018-08-06 22:39 ` Jeff Law 0 siblings, 2 replies; 121+ messages in thread From: Martin Sebor @ 2018-08-06 17:15 UTC (permalink / raw) To: Jeff Law, Bernd Edlinger, Jakub Jelinek; +Cc: Richard Biener, GCC Patches >> These examples do not aim to be valid C, they just point out limitations >> of the middle-end design, and a good deal of the problems are due >> to trying to do things that are not safe within the boundaries given >> by the middle-end design. > I really think this is important -- and as such I think we need to move > away from trying to describe scenarios in C because doing so keeps > bringing us back to the "C doesn't allow XYZ" kinds of arguments when > what we're really discussing are GIMPLE semantic issues. > > So examples should be GIMPLE. You might start with (possibly invalid) C > code to generate the GIMPLE, but the actual discussion needs to be > looking at GIMPLE. We might include the C code in case someone wants to > look at things in a debugger, but bringing the focus to GIMPLE is really > important here. I don't understand the goal of this exercise. Unless the GIMPLE code is the result of a valid test case (in some language GCC supports), what does it matter what it looks like? The basis of every single transformation done by a compiler is that the source code is correct. If it isn't then all bets are off. I'm no GIMPLE expert but even I can come up with any number of GIMPLE expressions that have undefined behavior. What would that prove? But let me try anyway. 
Here's a simplified (and gimplified) version of the test case that started this debate:

  struct S { char a[4], b; };

  f (struct S * p)
  {
    int D.1908;

    _1 = &p->a;
    _2 = __builtin_strlen (_1);   // strlen (p->a);
    D.1908 = (int) _2;
    return D.1908;
  }

and one involving a pointer:

  g (struct S * p)
  {
    int D.1910;
    char * q;

    q = &p->a;
    _1 = __builtin_strlen (q);
    D.1910 = (int) _1;
    return D.1910;
  }

and another one involving a pointer and strcpy and _FORTIFY_SOURCE:

  h (struct S * p)
  {
    int D.2208;
    char * q;

    q = &p->a;
    _1 = strcpy (q, "1234");
    _2 = (long int) _1;
    D.2208 = (int) _2;
    return D.2208;
  }

with strcpy defined as:

  __attribute__((artificial, gnu_inline, always_inline, leaf, nothrow))
  strcpy (char * restrict __dest, const char * restrict __src)
  {
    char * D.2210;

    _1 = __builtin_object_size (__dest, 1);
    D.2210 = __builtin___strcpy_chk (__dest, __src, _1);
    return D.2210;
  }

What does this show? AFAICS, all three functions are equivalent GIMPLE, yet I'm being told that the first one is different in some important detail from the second, and that even though it's the same as the third and even though it's good to have __strcpy_chk() abort in the third case, it's bad for the strlen() call to return a value constrained to [0, 3].

Would defining strlen like so

  __attribute__((artificial, gnu_inline, always_inline, leaf, nothrow))
  strlen (const char * __src)
  {
    char * D.2210;

    _1 = __builtin_object_size (__src, 1);
    D.2210 = __builtin___strlen_chk (__src, _1);
    return D.2210;
  }

and having __strlen_chk() abort if __src were longer than _1 be also bad?

(If not -- I sincerely hope that's the answer -- then I'll be happy to put together a patch for that. In fact, I think it would be useful to extend this to all string functions (i.e., have them all abort on reads past the end, just as they abort on writes).)

Martin

^ permalink raw reply [flat|nested] 121+ messages in thread
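The read-side check floated in this message could look roughly like the following in plain C. This is only a sketch under stated assumptions: `strlen_chk_sketch` and `NO_NUL_IN_BOUNDS` are made-up names, not existing GCC or libc interfaces, and the sketch reports failure through its return value where the proposal would abort. The object size is passed in explicitly here; in the proposal it would come from `__builtin_object_size (__src, 1)`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel: no terminating NUL inside the object.  */
#define NO_NUL_IN_BOUNDS ((size_t) -1)

/* Hypothetical bounded strlen: scan at most OBJSIZE bytes of S.
   Returns the string length if a NUL is found within the object,
   otherwise NO_NUL_IN_BOUNDS (the proposed check would abort).  */
size_t strlen_chk_sketch (const char *s, size_t objsize)
{
  assert (s != NULL);
  for (size_t n = 0; n < objsize; ++n)
    if (s[n] == '\0')
      return n;
  return NO_NUL_IN_BOUNDS;
}
```

For a 4-byte member such as `a` in `struct S` above, every successful result lies in [0, 3], which is the same range the strlen optimization assumes.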
* Re: [PATCH] Make strlen range computations more conservative 2018-08-06 17:15 ` Martin Sebor @ 2018-08-06 17:40 ` Jeff Law 2018-08-07 3:39 ` Martin Sebor 2018-08-06 22:39 ` Jeff Law 1 sibling, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-08-06 17:40 UTC (permalink / raw) To: Martin Sebor, Bernd Edlinger, Jakub Jelinek; +Cc: Richard Biener, GCC Patches On 08/06/2018 11:15 AM, Martin Sebor wrote: >>> These examples do not aim to be valid C, they just point out limitations >>> of the middle-end design, and a good deal of the problems are due >>> to trying to do things that are not safe within the boundaries given >>> by the middle-end design. >> I really think this is important -- and as such I think we need to move >> away from trying to describe scenarios in C because doing so keeps >> bringing us back to the "C doesn't allow XYZ" kinds of arguments when >> what we're really discussing are GIMPLE semantic issues. >> >> So examples should be GIMPLE. You might start with (possibly invalid) C >> code to generate the GIMPLE, but the actual discussion needs to be >> looking at GIMPLE. We might include the C code in case someone wants to >> look at things in a debugger, but bringing the focus to GIMPLE is really >> important here. > > I don't understand the goal of this exercise. Unless the GIMPLE > code is the result of a valid test case (in some language GCC > supports), what does it matter what it looks like? The basis of > every single transformation done by a compiler is that the source > code is correct. If it isn't then all bets are off. I'm no GIMPLE > expert but even I can come up with any number of GIMPLE expressions > that have undefined behavior. What would that prove? The GIMPLE IL is less restrictive than the original source language. The process of translation into GIMPLE and optimization can create situations in the GIMPLE IL that can't be validly represented in the original source language. 
Subobject crossing is one such case; there are certainly others. We have to handle these scenarios correctly. My favorite from a long time ago was the RTL loop optimizer creating a pointer well outside the bounds of an object. That pointer was then used in a reg+d addressing mode where the displacement brought the final effective address back into the bounds of the object. You can't validly do that in C/C++, but it was certainly valid RTL and it was useful to allow creation of such pointers which were outside the bounds of the object. Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-06 17:40 ` Jeff Law @ 2018-08-07 3:39 ` Martin Sebor 2018-08-07 5:45 ` Richard Biener 2018-08-07 15:32 ` Jeff Law 0 siblings, 2 replies; 121+ messages in thread From: Martin Sebor @ 2018-08-07 3:39 UTC (permalink / raw) To: Jeff Law, Bernd Edlinger, Jakub Jelinek; +Cc: Richard Biener, GCC Patches On 08/06/2018 11:40 AM, Jeff Law wrote: > On 08/06/2018 11:15 AM, Martin Sebor wrote: >>>> These examples do not aim to be valid C, they just point out limitations >>>> of the middle-end design, and a good deal of the problems are due >>>> to trying to do things that are not safe within the boundaries given >>>> by the middle-end design. >>> I really think this is important -- and as such I think we need to move >>> away from trying to describe scenarios in C because doing so keeps >>> bringing us back to the "C doesn't allow XYZ" kinds of arguments when >>> what we're really discussing are GIMPLE semantic issues. >>> >>> So examples should be GIMPLE. You might start with (possibly invalid) C >>> code to generate the GIMPLE, but the actual discussion needs to be >>> looking at GIMPLE. We might include the C code in case someone wants to >>> look at things in a debugger, but bringing the focus to GIMPLE is really >>> important here. >> >> I don't understand the goal of this exercise. Unless the GIMPLE >> code is the result of a valid test case (in some language GCC >> supports), what does it matter what it looks like? The basis of >> every single transformation done by a compiler is that the source >> code is correct. If it isn't then all bets are off. I'm no GIMPLE >> expert but even I can come up with any number of GIMPLE expressions >> that have undefined behavior. What would that prove? > The GIMPLE IL is less restrictive than the original source language. 
> The process of translation into GIMPLE and optimization can create > situations in the GIMPLE IL that can't be validly represented in the > original source language. Subobject crossing being one such case, there > are certainly others. We have to handle these scenarios correctly. Sure, but a valid C test case still needs to exist to show that such a transformation is possible. Until someone comes up with one it's all speculation. Under normal circumstances the burden of proof that there is a problem is on the reporter. In this case, the requirement has turned into one to prove a negative. Effectively, you are asking for a proof that there is no bug, either in the assumptions behind the strlen optimization, or somewhere else in GCC that would lead the optimization to invalidate a valid piece of code. That's impossible. Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-07 3:39 ` Martin Sebor @ 2018-08-07 5:45 ` Richard Biener 2018-08-07 15:02 ` Martin Sebor 2018-08-07 15:32 ` Jeff Law 1 sibling, 1 reply; 121+ messages in thread From: Richard Biener @ 2018-08-07 5:45 UTC (permalink / raw) To: Martin Sebor, Jeff Law, Bernd Edlinger, Jakub Jelinek; +Cc: GCC Patches On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote: >On 08/06/2018 11:40 AM, Jeff Law wrote: >> On 08/06/2018 11:15 AM, Martin Sebor wrote: >>>>> These examples do not aim to be valid C, they just point out >limitations >>>>> of the middle-end design, and a good deal of the problems are due >>>>> to trying to do things that are not safe within the boundaries >given >>>>> by the middle-end design. >>>> I really think this is important -- and as such I think we need to >move >>>> away from trying to describe scenarios in C because doing so keeps >>>> bringing us back to the "C doesn't allow XYZ" kinds of arguments >when >>>> what we're really discussing are GIMPLE semantic issues. >>>> >>>> So examples should be GIMPLE. You might start with (possibly >invalid) C >>>> code to generate the GIMPLE, but the actual discussion needs to be >>>> looking at GIMPLE. We might include the C code in case someone >wants to >>>> look at things in a debugger, but bringing the focus to GIMPLE is >really >>>> important here. >>> >>> I don't understand the goal of this exercise. Unless the GIMPLE >>> code is the result of a valid test case (in some language GCC >>> supports), what does it matter what it looks like? The basis of >>> every single transformation done by a compiler is that the source >>> code is correct. If it isn't then all bets are off. I'm no GIMPLE >>> expert but even I can come up with any number of GIMPLE expressions >>> that have undefined behavior. What would that prove? >> The GIMPLE IL is less restrictive than the original source language. 
>> The process of translation into GIMPLE and optimization can create >> situations in the GIMPLE IL that can't be validly represented in the >> original source language. Subobject crossing being one such case, >there >> are certainly others. We have to handle these scenarios correctly. > >Sure, but a valid C test case still needs to exist to show that >such a transformation is possible. Until someone comes up with >one it's all speculation. Jakub showed you one wrt CSE of addresses. Richard. >Under normal circumstances the burden of proof that there is >a problem is on the reporter. In this case, the requirement >has turned into one to prove a negative. Effectively, you >are asking for a proof that there is no bug, either in >the assumptions behind the strlen optimization, or somewhere >else in GCC that would lead the optimization to invalidate >a valid piece of code. That's impossible. > >Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-07 5:45 ` Richard Biener @ 2018-08-07 15:02 ` Martin Sebor 2018-08-07 15:33 ` Bernd Edlinger 0 siblings, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-08-07 15:02 UTC (permalink / raw) To: Richard Biener, Jeff Law, Bernd Edlinger, Jakub Jelinek; +Cc: GCC Patches On 08/06/2018 11:45 PM, Richard Biener wrote: > On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote: >> On 08/06/2018 11:40 AM, Jeff Law wrote: >>> On 08/06/2018 11:15 AM, Martin Sebor wrote: >>>>>> These examples do not aim to be valid C, they just point out >> limitations >>>>>> of the middle-end design, and a good deal of the problems are due >>>>>> to trying to do things that are not safe within the boundaries >> given >>>>>> by the middle-end design. >>>>> I really think this is important -- and as such I think we need to >> move >>>>> away from trying to describe scenarios in C because doing so keeps >>>>> bringing us back to the "C doesn't allow XYZ" kinds of arguments >> when >>>>> what we're really discussing are GIMPLE semantic issues. >>>>> >>>>> So examples should be GIMPLE. You might start with (possibly >> invalid) C >>>>> code to generate the GIMPLE, but the actual discussion needs to be >>>>> looking at GIMPLE. We might include the C code in case someone >> wants to >>>>> look at things in a debugger, but bringing the focus to GIMPLE is >> really >>>>> important here. >>>> >>>> I don't understand the goal of this exercise. Unless the GIMPLE >>>> code is the result of a valid test case (in some language GCC >>>> supports), what does it matter what it looks like? The basis of >>>> every single transformation done by a compiler is that the source >>>> code is correct. If it isn't then all bets are off. I'm no GIMPLE >>>> expert but even I can come up with any number of GIMPLE expressions >>>> that have undefined behavior. What would that prove? 
>>> The GIMPLE IL is less restrictive than the original source language. >>> The process of translation into GIMPLE and optimization can create >>> situations in the GIMPLE IL that can't be validly represented in the >>> original source language. Subobject crossing being one such case, >> there >>> are certainly others. We have to handle these scenarios correctly. >> >> Sure, but a valid C test case still needs to exist to show that >> such a transformation is possible. Until someone comes up with >> one it's all speculation. > > Jakub showed you one wrt CSE of addresses. Sorry, there have been so many examples I've lost track. Can you please copy it here or point to it in the archive? In any event, I would find it reasonable for the strlen optimization to be subject to the same constraints as the aggressive loop optimization. If there are valid test cases where the strlen optimization goes beyond that then let's throttle those. Doing more than that would be arbitrary and result in confusing inconsistencies (as the proposed patch does). For example, these two equivalent functions should continue to result in the same optimal code: extern char b[2][4]; void f (int i) { if (__builtin_strlen (b[i]) >= sizeof b[i]) __builtin_abort (); } void g (int i) { unsigned n = 0; while (b[i][n]) ++n; if (n >= sizeof b[i]) __builtin_abort (); } Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-07 15:02 ` Martin Sebor @ 2018-08-07 15:33 ` Bernd Edlinger 2018-08-07 16:31 ` Martin Sebor 0 siblings, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-08-07 15:33 UTC (permalink / raw) To: Martin Sebor, Richard Biener, Jeff Law, Jakub Jelinek; +Cc: GCC Patches On 08/07/18 17:02, Martin Sebor wrote: > On 08/06/2018 11:45 PM, Richard Biener wrote: >> On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote: >>> On 08/06/2018 11:40 AM, Jeff Law wrote: >>>> On 08/06/2018 11:15 AM, Martin Sebor wrote: >>>>>>> These examples do not aim to be valid C, they just point out >>> limitations >>>>>>> of the middle-end design, and a good deal of the problems are due >>>>>>> to trying to do things that are not safe within the boundaries >>> given >>>>>>> by the middle-end design. >>>>>> I really think this is important -- and as such I think we need to >>> move >>>>>> away from trying to describe scenarios in C because doing so keeps >>>>>> bringing us back to the "C doesn't allow XYZ" kinds of arguments >>> when >>>>>> what we're really discussing are GIMPLE semantic issues. >>>>>> >>>>>> So examples should be GIMPLE. You might start with (possibly >>> invalid) C >>>>>> code to generate the GIMPLE, but the actual discussion needs to be >>>>>> looking at GIMPLE. We might include the C code in case someone >>> wants to >>>>>> look at things in a debugger, but bringing the focus to GIMPLE is >>> really >>>>>> important here. >>>>> >>>>> I don't understand the goal of this exercise. Unless the GIMPLE >>>>> code is the result of a valid test case (in some language GCC >>>>> supports), what does it matter what it looks like? The basis of >>>>> every single transformation done by a compiler is that the source >>>>> code is correct. If it isn't then all bets are off. 
I'm no GIMPLE
>>>>> expert but even I can come up with any number of GIMPLE expressions
>>>>> that have undefined behavior. What would that prove?
>>>> The GIMPLE IL is less restrictive than the original source language.
>>>> The process of translation into GIMPLE and optimization can create
>>>> situations in the GIMPLE IL that can't be validly represented in the
>>>> original source language. Subobject crossing being one such case,
>>> there
>>>> are certainly others. We have to handle these scenarios correctly.
>>>
>>> Sure, but a valid C test case still needs to exist to show that
>>> such a transformation is possible. Until someone comes up with
>>> one it's all speculation.
>>
>> Jakub showed you one wrt CSE of addresses.
>
> Sorry, there have been so many examples I've lost track. Can
> you please copy it here or point to it in the archive?
>

This is based on Jakub's example here: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00260.html

$ cat y.cc
typedef __typeof__ (sizeof 0) size_t;
void *operator new (size_t, void *p) { return p; }
void *operator new[] (size_t, void *p) { return p; }
struct S { int x; unsigned char a[1]; char b[64]; };
void baz (char *);

size_t
foo (S *p)
{
  char *q = new ((char*)p->a) char [16];
  baz (q);
  size_t x = __builtin_strlen (q);
  if (x==0)
    __builtin_abort();
  return x;
}

$ gcc -O3 -S y.cc
$ cat y.s
.LFB2:
        .cfi_startproc
        subq $8, %rsp
        .cfi_def_cfa_offset 16
        addq $4, %rdi
        call _Z3bazPc
        call abort
        .cfi_endproc

I think this is not a correct optimization.

Bernd.

> In any event, I would find it reasonable for the strlen
> optimization to be subject to the same constraints as
> the aggressive loop optimization. If there are valid test
> cases where the strlen optimization goes beyond that then
> let's throttle those. Doing more than that would be
> arbitrary and result in confusing inconsistencies (as
> the proposed patch does).
For example, these two equivalent > functions should continue to result in the same optimal code: > > extern char b[2][4]; > > void f (int i) > { > if (__builtin_strlen (b[i]) >= sizeof b[i]) > __builtin_abort (); > } > > void g (int i) > { > unsigned n = 0; > while (b[i][n]) > ++n; > if (n >= sizeof b[i]) > __builtin_abort (); > } > > Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
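The `addq $4, %rdi` in the assembly from Bernd's message above corresponds to the offset of `a` inside `S` on x86_64. The C sketch below (helper names are hypothetical, and the offsets hold on common ABIs although padding is implementation-defined) illustrates the layout at issue: `a` has room for a single byte, so 0 is the only length that fits inside `a`, while a string stored at that address continues into `b`. The sketch deliberately uses the raw memory functions `memcpy` and `memchr` on a byte pointer to the whole object rather than strlen on the member.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as struct S in the y.cc example.  */
struct S { int x; unsigned char a[1]; char b[64]; };

size_t offset_of_a (void) { return offsetof (struct S, a); }
size_t offset_of_b (void) { return offsetof (struct S, b); }

/* Hypothetical helper: store STR at the byte offset of member a,
   working on the bytes of the whole object, then return the length
   found by a bounded raw-memory scan from that same offset.
   Returns (size_t) -1 if STR does not fit.  */
size_t store_and_measure (struct S *p, const char *str)
{
  assert (p != NULL && str != NULL);
  char *start = (char *) p + offsetof (struct S, a);
  size_t room = sizeof *p - offsetof (struct S, a);
  size_t len = strlen (str);
  if (len >= room)
    return (size_t) -1;
  memcpy (start, str, len + 1);
  const char *nul = memchr (start, '\0', room);
  return (size_t) (nul - start);
}
```

Storing "abc" this way leaves 'a' in `a[0]` and the rest of the string, including the terminator, in `b`, so the measured length is 3 rather than 0.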
* Re: [PATCH] Make strlen range computations more conservative 2018-08-07 15:33 ` Bernd Edlinger @ 2018-08-07 16:31 ` Martin Sebor 2018-08-07 17:46 ` Richard Biener 0 siblings, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-08-07 16:31 UTC (permalink / raw) To: Bernd Edlinger, Richard Biener, Jeff Law, Jakub Jelinek; +Cc: GCC Patches On 08/07/2018 09:33 AM, Bernd Edlinger wrote: > On 08/07/18 17:02, Martin Sebor wrote: >> On 08/06/2018 11:45 PM, Richard Biener wrote: >>> On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote: >>>> On 08/06/2018 11:40 AM, Jeff Law wrote: >>>>> On 08/06/2018 11:15 AM, Martin Sebor wrote: >>>>>>>> These examples do not aim to be valid C, they just point out >>>> limitations >>>>>>>> of the middle-end design, and a good deal of the problems are due >>>>>>>> to trying to do things that are not safe within the boundaries >>>> given >>>>>>>> by the middle-end design. >>>>>>> I really think this is important -- and as such I think we need to >>>> move >>>>>>> away from trying to describe scenarios in C because doing so keeps >>>>>>> bringing us back to the "C doesn't allow XYZ" kinds of arguments >>>> when >>>>>>> what we're really discussing are GIMPLE semantic issues. >>>>>>> >>>>>>> So examples should be GIMPLE. You might start with (possibly >>>> invalid) C >>>>>>> code to generate the GIMPLE, but the actual discussion needs to be >>>>>>> looking at GIMPLE. We might include the C code in case someone >>>> wants to >>>>>>> look at things in a debugger, but bringing the focus to GIMPLE is >>>> really >>>>>>> important here. >>>>>> >>>>>> I don't understand the goal of this exercise. Unless the GIMPLE >>>>>> code is the result of a valid test case (in some language GCC >>>>>> supports), what does it matter what it looks like? The basis of >>>>>> every single transformation done by a compiler is that the source >>>>>> code is correct. If it isn't then all bets are off. 
I'm no GIMPLE
>>>>>> expert but even I can come up with any number of GIMPLE expressions
>>>>>> that have undefined behavior. What would that prove?
>>>>> The GIMPLE IL is less restrictive than the original source language.
>>>>> The process of translation into GIMPLE and optimization can create
>>>>> situations in the GIMPLE IL that can't be validly represented in the
>>>>> original source language. Subobject crossing being one such case, there
>>>>> are certainly others. We have to handle these scenarios correctly.
>>>>
>>>> Sure, but a valid C test case still needs to exist to show that
>>>> such a transformation is possible. Until someone comes up with
>>>> one it's all speculation.
>>>
>>> Jakub showed you one wrt CSE of addresses.
>>
>> Sorry, there have been so many examples I've lost track. Can
>> you please copy it here or point to it in the archive?
>>
>
> This is based on Jakub's example here:
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00260.html
>
> $ cat y.cc
> typedef __typeof__ (sizeof 0) size_t;
> void *operator new (size_t, void *p) { return p; }
> void *operator new[] (size_t, void *p) { return p; }
> struct S { int x; unsigned char a[1]; char b[64]; };
> void baz (char *);
>
> size_t
> foo (S *p)
> {
>   char *q = new ((char*)p->a) char [16];
>   baz (q);
>   size_t x = __builtin_strlen (q);
>   if (x==0)
>     __builtin_abort();
>   return x;
> }
>
> $ gcc -O3 -S y.cc
> $ cat y.s
> .LFB2:
>         .cfi_startproc
>         subq    $8, %rsp
>         .cfi_def_cfa_offset 16
>         addq    $4, %rdi
>         call    _Z3bazPc
>         call    abort
>         .cfi_endproc
>
>
>
> I think this is not a correct optimization.

I see. This narrows it down to the placement new expression
exposing the type of the original object rather than that of
the newly constructed object. We end up with strlen (_2)
where _2 = &p_1(D)->a.

The aggressive loop optimization trigger in this case because
the access has been transformed to MEM[(char *)p_5(D) + 4B]
early on which obviates the structure of the accessed type.

This is the case that I think is worth solving -- ideally not
by removing the optimization but by preserving the conversion
to the type of the newly constructed object.

Martin
* Re: [PATCH] Make strlen range computations more conservative 2018-08-07 16:31 ` Martin Sebor @ 2018-08-07 17:46 ` Richard Biener 2018-08-08 15:51 ` Martin Sebor 0 siblings, 1 reply; 121+ messages in thread From: Richard Biener @ 2018-08-07 17:46 UTC (permalink / raw) To: Martin Sebor, Bernd Edlinger, Jeff Law, Jakub Jelinek; +Cc: GCC Patches On August 7, 2018 6:31:36 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote: >On 08/07/2018 09:33 AM, Bernd Edlinger wrote: >> On 08/07/18 17:02, Martin Sebor wrote: >>> On 08/06/2018 11:45 PM, Richard Biener wrote: >>>> On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor ><msebor@gmail.com> wrote: >>>>> On 08/06/2018 11:40 AM, Jeff Law wrote: >>>>>> On 08/06/2018 11:15 AM, Martin Sebor wrote: >>>>>>>>> These examples do not aim to be valid C, they just point out >>>>> limitations >>>>>>>>> of the middle-end design, and a good deal of the problems are >due >>>>>>>>> to trying to do things that are not safe within the boundaries >>>>> given >>>>>>>>> by the middle-end design. >>>>>>>> I really think this is important -- and as such I think we need >to >>>>> move >>>>>>>> away from trying to describe scenarios in C because doing so >keeps >>>>>>>> bringing us back to the "C doesn't allow XYZ" kinds of >arguments >>>>> when >>>>>>>> what we're really discussing are GIMPLE semantic issues. >>>>>>>> >>>>>>>> So examples should be GIMPLE. You might start with (possibly >>>>> invalid) C >>>>>>>> code to generate the GIMPLE, but the actual discussion needs to >be >>>>>>>> looking at GIMPLE. We might include the C code in case someone >>>>> wants to >>>>>>>> look at things in a debugger, but bringing the focus to GIMPLE >is >>>>> really >>>>>>>> important here. >>>>>>> >>>>>>> I don't understand the goal of this exercise. Unless the GIMPLE >>>>>>> code is the result of a valid test case (in some language GCC >>>>>>> supports), what does it matter what it looks like? 
The basis of >>>>>>> every single transformation done by a compiler is that the >source >>>>>>> code is correct. If it isn't then all bets are off. I'm no >GIMPLE >>>>>>> expert but even I can come up with any number of GIMPLE >expressions >>>>>>> that have undefined behavior. What would that prove? >>>>>> The GIMPLE IL is less restrictive than the original source >language. >>>>>> The process of translation into GIMPLE and optimization can >create >>>>>> situations in the GIMPLE IL that can't be validly represented in >the >>>>>> original source language. Subobject crossing being one such >case, >>>>> there >>>>>> are certainly others. We have to handle these scenarios >correctly. >>>>> >>>>> Sure, but a valid C test case still needs to exist to show that >>>>> such a transformation is possible. Until someone comes up with >>>>> one it's all speculation. >>>> >>>> Jakub showed you one wrt CSE of addresses. >>> >>> Sorry, there have been so many examples I've lost track. Can >>> you please copy it here or point to it in the archive? >>> >> >> This is based on Jakubs example here: >https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00260.html >> >> $ cat y.cc >> typedef __typeof__ (sizeof 0) size_t; >> void *operator new (size_t, void *p) { return p; } >> void *operator new[] (size_t, void *p) { return p; } >> struct S { int x; unsigned char a[1]; char b[64]; }; >> void baz (char *); >> >> size_t >> foo (S *p) >> { >> char *q = new ((char*)p->a) char [16]; >> baz (q); >> size_t x = __builtin_strlen (q); >> if (x==0) >> __builtin_abort(); >> return x; >> } >> >> $ gcc -O3 -S y.ccup >> $ cat y.s >> .LFB2: >> .cfi_startproc >> subq $8, %rsp >> .cfi_def_cfa_offset 16 >> addq $4, %rdi >> call _Z3bazPc >> call abort >> .cfi_endproc >> >> >> >> I think this is not a correct optimization. > >I see. This narrows it down to the placement new expression >exposing the type of the original object rather than that of >the newly constructed object. 
We end up with strlen (_2) >where _2 = &p_1(D)->a. > >The aggressive loop optimization trigger in this case because >the access has been transformed to MEM[(char *)p_5(D) + 4B] >early on which obviates the structure of the accessed type. > >This is the case that I think is worth solving -- ideally not >by removing the optimization but by preserving the conversion >to the type of the newly constructed object. Pointer types carry no information in GIMPLE. Richard. > >Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-07 17:46 ` Richard Biener @ 2018-08-08 15:51 ` Martin Sebor 2018-08-08 16:12 ` Bernd Edlinger 2018-08-08 17:19 ` Richard Biener 0 siblings, 2 replies; 121+ messages in thread From: Martin Sebor @ 2018-08-08 15:51 UTC (permalink / raw) To: Richard Biener, Bernd Edlinger, Jeff Law, Jakub Jelinek; +Cc: GCC Patches On 08/07/2018 11:46 AM, Richard Biener wrote: > On August 7, 2018 6:31:36 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote: >> On 08/07/2018 09:33 AM, Bernd Edlinger wrote: >>> On 08/07/18 17:02, Martin Sebor wrote: >>>> On 08/06/2018 11:45 PM, Richard Biener wrote: >>>>> On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor >> <msebor@gmail.com> wrote: >>>>>> On 08/06/2018 11:40 AM, Jeff Law wrote: >>>>>>> On 08/06/2018 11:15 AM, Martin Sebor wrote: >>>>>>>>>> These examples do not aim to be valid C, they just point out >>>>>> limitations >>>>>>>>>> of the middle-end design, and a good deal of the problems are >> due >>>>>>>>>> to trying to do things that are not safe within the boundaries >>>>>> given >>>>>>>>>> by the middle-end design. >>>>>>>>> I really think this is important -- and as such I think we need >> to >>>>>> move >>>>>>>>> away from trying to describe scenarios in C because doing so >> keeps >>>>>>>>> bringing us back to the "C doesn't allow XYZ" kinds of >> arguments >>>>>> when >>>>>>>>> what we're really discussing are GIMPLE semantic issues. >>>>>>>>> >>>>>>>>> So examples should be GIMPLE. You might start with (possibly >>>>>> invalid) C >>>>>>>>> code to generate the GIMPLE, but the actual discussion needs to >> be >>>>>>>>> looking at GIMPLE. We might include the C code in case someone >>>>>> wants to >>>>>>>>> look at things in a debugger, but bringing the focus to GIMPLE >> is >>>>>> really >>>>>>>>> important here. >>>>>>>> >>>>>>>> I don't understand the goal of this exercise. 
Unless the GIMPLE >>>>>>>> code is the result of a valid test case (in some language GCC >>>>>>>> supports), what does it matter what it looks like? The basis of >>>>>>>> every single transformation done by a compiler is that the >> source >>>>>>>> code is correct. If it isn't then all bets are off. I'm no >> GIMPLE >>>>>>>> expert but even I can come up with any number of GIMPLE >> expressions >>>>>>>> that have undefined behavior. What would that prove? >>>>>>> The GIMPLE IL is less restrictive than the original source >> language. >>>>>>> The process of translation into GIMPLE and optimization can >> create >>>>>>> situations in the GIMPLE IL that can't be validly represented in >> the >>>>>>> original source language. Subobject crossing being one such >> case, >>>>>> there >>>>>>> are certainly others. We have to handle these scenarios >> correctly. >>>>>> >>>>>> Sure, but a valid C test case still needs to exist to show that >>>>>> such a transformation is possible. Until someone comes up with >>>>>> one it's all speculation. >>>>> >>>>> Jakub showed you one wrt CSE of addresses. >>>> >>>> Sorry, there have been so many examples I've lost track. Can >>>> you please copy it here or point to it in the archive? 
>>>>
>>>
>>> This is based on Jakub's example here:
>>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00260.html
>>>
>>> $ cat y.cc
>>> typedef __typeof__ (sizeof 0) size_t;
>>> void *operator new (size_t, void *p) { return p; }
>>> void *operator new[] (size_t, void *p) { return p; }
>>> struct S { int x; unsigned char a[1]; char b[64]; };
>>> void baz (char *);
>>>
>>> size_t
>>> foo (S *p)
>>> {
>>>   char *q = new ((char*)p->a) char [16];
>>>   baz (q);
>>>   size_t x = __builtin_strlen (q);
>>>   if (x==0)
>>>     __builtin_abort();
>>>   return x;
>>> }
>>>
>>> $ gcc -O3 -S y.cc
>>> $ cat y.s
>>> .LFB2:
>>>         .cfi_startproc
>>>         subq    $8, %rsp
>>>         .cfi_def_cfa_offset 16
>>>         addq    $4, %rdi
>>>         call    _Z3bazPc
>>>         call    abort
>>>         .cfi_endproc
>>>
>>>
>>>
>>> I think this is not a correct optimization.
>>
>> I see. This narrows it down to the placement new expression
>> exposing the type of the original object rather than that of
>> the newly constructed object. We end up with strlen (_2)
>> where _2 = &p_1(D)->a.
>>
>> The aggressive loop optimization trigger in this case because
>> the access has been transformed to MEM[(char *)p_5(D) + 4B]
>> early on which obviates the structure of the accessed type.
>>
>> This is the case that I think is worth solving -- ideally not
>> by removing the optimization but by preserving the conversion
>> to the type of the newly constructed object.
>
> Pointer types carry no information in GIMPLE.

So what do you suggest as a solution?

The strlen optimization can be decoupled from warnings and
disabled, and the aggressive loop optimization can be disabled
altogether. But the same issue affects all string functions
with _FORTIFY_SOURCE=2. The modified example below aborts at
runtime (and gets diagnosed by -Wstringop-overflow).

GCC certainly needs to generate valid object code for valid
source code. But keeping track of object type information
is also important for correctness and security, as has been
done by _FORTIFY_SOURCE and by many middle-end warnings.
When accurate, it also benefits optimization.

What can we do to make it more reliable? Would annotating
placement new solve the problem? If not, what would?

Martin

#include <new>
#include <string.h>

struct S { int x; unsigned char a[1]; char b[64]; };

void foo (S *p)
{
  char *q = new ((char*)p->a) char [16];

  strcpy (q, "12345678");   // abort here
}

int main ()
{
  foo (new S);
}
* Re: [PATCH] Make strlen range computations more conservative
  2018-08-08 15:51 ` Martin Sebor
@ 2018-08-08 16:12   ` Bernd Edlinger
  2018-08-08 17:19   ` Richard Biener
  1 sibling, 0 replies; 121+ messages in thread
From: Bernd Edlinger @ 2018-08-08 16:12 UTC (permalink / raw)
  To: Martin Sebor, Richard Biener, Jeff Law, Jakub Jelinek; +Cc: GCC Patches

On 08/08/18 17:51, Martin Sebor wrote:
> On 08/07/2018 11:46 AM, Richard Biener wrote:
>>
>> Pointer types carry no information in GIMPLE.
>
> So what do you suggest as a solution?
>
> The strlen optimization can be decoupled from warnings and
> disabled, and the aggressive loop optimization can be disabled
> altogether. But the same issue affects all string functions
> with _FORTIFY_SOURCE=2. The modified example below aborts at
> runtime (and gets diagnosed by -Wstringop-overflow).
>
> GCC certainly needs to generate valid object code for valid
> source code. But keeping track of object type information
> is also important for correctness and security, as has been
> done by _FORTIFY_SOURCE and by many middle-end warnings.
> When accurate, it also benefits optimization.
>
> What can we do to make it more reliable? Would annotating
> placement new solve the problem? If not, what would?
>
> Martin
>
> #include <new>
> #include <string.h>
>
> struct S { int x; unsigned char a[1]; char b[64]; };
>
> void foo (S *p)
> {
>   char *q = new ((char*)p->a) char [16];
>
>   strcpy (q, "12345678");   // abort here
> }
>
> int main ()
> {
>   foo (new S);
> }
>

I quote Jakub's e-mail from 07/31/18 08:38:
"Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the standard
requires, imposes extra requirements. So from what this mode accepts or
rejects we shouldn't determine what is or isn't considered valid."

So just use _FORTIFY_SOURCE=1?

What would be a good improvement to _FORTIFY_SOURCE, though, is to
intercept strlen so that cases where the char buffer is not
zero-terminated are caught early.

Bernd.
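The difference between the two fortification levels discussed above comes down to which object-size mode is consulted. The sketch below is only an illustration: the helper names and the static object `s` are assumptions, and the statement that the fortified string wrappers use mode 1 only at _FORTIFY_SOURCE=2 reflects my understanding of glibc rather than anything stated in this thread.

```c
#include <stddef.h>

struct S { int x; unsigned char a[1]; char b[64]; };

/* Assumption: a static object, so the compiler can see its full size.  */
static struct S s;

/* Mode 0: bytes from s.a to the end of the whole enclosing object,
   or (size_t)-1 if the compiler cannot determine it.  */
size_t whole_object_size (void)
{
  return __builtin_object_size (s.a, 0);
}

/* Mode 1: bytes from s.a to the end of the closest enclosing
   subobject, here the one-element member array itself, or
   (size_t)-1 if the compiler cannot determine it.  */
size_t subobject_size (void)
{
  return __builtin_object_size (s.a, 1);
}
```

With mode 1 a copy of nine bytes into `s.a` looks like an overflow even though the enclosing object has room for it, which matches the abort in the strcpy example. With mode 0 only the size of the whole object matters.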
* Re: [PATCH] Make strlen range computations more conservative 2018-08-08 15:51 ` Martin Sebor 2018-08-08 16:12 ` Bernd Edlinger @ 2018-08-08 17:19 ` Richard Biener 1 sibling, 0 replies; 121+ messages in thread From: Richard Biener @ 2018-08-08 17:19 UTC (permalink / raw) To: Martin Sebor, Bernd Edlinger, Jeff Law, Jakub Jelinek; +Cc: GCC Patches On August 8, 2018 5:51:16 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote: >On 08/07/2018 11:46 AM, Richard Biener wrote: >> On August 7, 2018 6:31:36 PM GMT+02:00, Martin Sebor ><msebor@gmail.com> wrote: >>> On 08/07/2018 09:33 AM, Bernd Edlinger wrote: >>>> On 08/07/18 17:02, Martin Sebor wrote: >>>>> On 08/06/2018 11:45 PM, Richard Biener wrote: >>>>>> On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor >>> <msebor@gmail.com> wrote: >>>>>>> On 08/06/2018 11:40 AM, Jeff Law wrote: >>>>>>>> On 08/06/2018 11:15 AM, Martin Sebor wrote: >>>>>>>>>>> These examples do not aim to be valid C, they just point out >>>>>>> limitations >>>>>>>>>>> of the middle-end design, and a good deal of the problems >are >>> due >>>>>>>>>>> to trying to do things that are not safe within the >boundaries >>>>>>> given >>>>>>>>>>> by the middle-end design. >>>>>>>>>> I really think this is important -- and as such I think we >need >>> to >>>>>>> move >>>>>>>>>> away from trying to describe scenarios in C because doing so >>> keeps >>>>>>>>>> bringing us back to the "C doesn't allow XYZ" kinds of >>> arguments >>>>>>> when >>>>>>>>>> what we're really discussing are GIMPLE semantic issues. >>>>>>>>>> >>>>>>>>>> So examples should be GIMPLE. You might start with (possibly >>>>>>> invalid) C >>>>>>>>>> code to generate the GIMPLE, but the actual discussion needs >to >>> be >>>>>>>>>> looking at GIMPLE. We might include the C code in case >someone >>>>>>> wants to >>>>>>>>>> look at things in a debugger, but bringing the focus to >GIMPLE >>> is >>>>>>> really >>>>>>>>>> important here. 
>>>>>>>>> >>>>>>>>> I don't understand the goal of this exercise. Unless the >GIMPLE >>>>>>>>> code is the result of a valid test case (in some language GCC >>>>>>>>> supports), what does it matter what it looks like? The basis >of >>>>>>>>> every single transformation done by a compiler is that the >>> source >>>>>>>>> code is correct. If it isn't then all bets are off. I'm no >>> GIMPLE >>>>>>>>> expert but even I can come up with any number of GIMPLE >>> expressions >>>>>>>>> that have undefined behavior. What would that prove? >>>>>>>> The GIMPLE IL is less restrictive than the original source >>> language. >>>>>>>> The process of translation into GIMPLE and optimization can >>> create >>>>>>>> situations in the GIMPLE IL that can't be validly represented >in >>> the >>>>>>>> original source language. Subobject crossing being one such >>> case, >>>>>>> there >>>>>>>> are certainly others. We have to handle these scenarios >>> correctly. >>>>>>> >>>>>>> Sure, but a valid C test case still needs to exist to show that >>>>>>> such a transformation is possible. Until someone comes up with >>>>>>> one it's all speculation. >>>>>> >>>>>> Jakub showed you one wrt CSE of addresses. >>>>> >>>>> Sorry, there have been so many examples I've lost track. Can >>>>> you please copy it here or point to it in the archive? 
>>>>> >>>> >>>> This is based on Jakubs example here: >>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00260.html >>>> >>>> $ cat y.cc >>>> typedef __typeof__ (sizeof 0) size_t; >>>> void *operator new (size_t, void *p) { return p; } >>>> void *operator new[] (size_t, void *p) { return p; } >>>> struct S { int x; unsigned char a[1]; char b[64]; }; >>>> void baz (char *); >>>> >>>> size_t >>>> foo (S *p) >>>> { >>>> char *q = new ((char*)p->a) char [16]; >>>> baz (q); >>>> size_t x = __builtin_strlen (q); >>>> if (x==0) >>>> __builtin_abort(); >>>> return x; >>>> } >>>> >>>> $ gcc -O3 -S y.ccup >>>> $ cat y.s >>>> .LFB2: >>>> .cfi_startproc >>>> subq $8, %rsp >>>> .cfi_def_cfa_offset 16 >>>> addq $4, %rdi >>>> call _Z3bazPc >>>> call abort >>>> .cfi_endproc >>>> >>>> >>>> >>>> I think this is not a correct optimization. >>> >>> I see. This narrows it down to the placement new expression >>> exposing the type of the original object rather than that of >>> the newly constructed object. We end up with strlen (_2) >>> where _2 = &p_1(D)->a. >>> >>> The aggressive loop optimization trigger in this case because >>> the access has been transformed to MEM[(char *)p_5(D) + 4B] >>> early on which obviates the structure of the accessed type. >>> >>> This is the case that I think is worth solving -- ideally not >>> by removing the optimization but by preserving the conversion >>> to the type of the newly constructed object. >> >> Pointer types carry no information in GIMPLE. > >So what do you suggest as a solution? > >The strlen optimization can be decoupled from warnings and >disabled, and the aggressive loop optimization can be disabled >altogether. But the same issue affects all string functions >with _FORTIFY_SOURCE=2. The modified example below aborts at >runime (and gets diagnosed by -Wstringop-overflow). > >GCC certainly needs to generate valid object code for valid >source code. 
But keeping track of object type information
>is also important for correctness and security, as has been
>done by _FORTIFY_SOURCE and by many middle-end warnings.
>When accurate, it also benefits optimization.
>
>What can we do to make it more reliable? Would annotating
>placement new solve the problem? If not, what would?
>
>Martin
>
>#include <new>
>#include <string.h>
>
>struct S { int x; unsigned char a[1]; char b[64]; };
>
>void foo (S *p)
>{
>  char *q = new ((char*)p->a) char [16];
>
>  strcpy (q, "12345678");   // abort here

If fortify aborts here we have to fix it.

Annotating placement new does not help, because placement new is not
required: all that is necessary according to the standard is reuse of
storage. Object type information is stored at memory accesses in
GIMPLE. A possibility to get more of that would be properly typed
clobbers at object creation time (placement new). But certainly
object reuse without those needs to be treated conservatively.

Note we arrived at the current state of things after a few painful
transitions through non-working schemes, including having pointer
types with 'semantics' and 'annotating' placement new.

Richard.

>}
>
>int main ()
>{
>  foo (new S);
>}
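Richard's point that plain reuse of storage is enough, with no placement new to annotate, can be sketched in C with heap memory, which has no declared type of its own. The helper name `reuse_as_string` is hypothetical and the sketch makes no claim about how GCC represents this internally.

```c
#include <stdlib.h>
#include <string.h>

struct S { int x; unsigned char a[1]; char b[64]; };

/* Hypothetical helper: use heap storage first as a struct S, then
   reuse the very same bytes as a plain character buffer.  Returns
   the length of the string written in the second phase, or
   (size_t)-1 if the allocation fails.  */
size_t reuse_as_string (void)
{
  void *mem = malloc (sizeof (struct S));
  if (!mem)
    return (size_t) -1;

  /* Phase 1: the storage holds a struct S.  */
  struct S *p = mem;
  p->x = 42;
  p->a[0] = 0;

  /* Phase 2: the storage is reused as an array of char.  Nothing in
     the source marks the switch; the stores made by strcpy are all
     there is.  */
  char *q = mem;
  strcpy (q, "12345678");
  size_t n = strlen (q);

  free (mem);
  return n;
}
```

Nothing in the second phase mentions the struct, so a size derived from the member layout of `struct S` would be the wrong bound for the string stored there.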
* Re: [PATCH] Make strlen range computations more conservative 2018-08-07 3:39 ` Martin Sebor 2018-08-07 5:45 ` Richard Biener @ 2018-08-07 15:32 ` Jeff Law 1 sibling, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-08-07 15:32 UTC (permalink / raw) To: Martin Sebor, Bernd Edlinger, Jakub Jelinek; +Cc: Richard Biener, GCC Patches On 08/06/2018 09:38 PM, Martin Sebor wrote: > On 08/06/2018 11:40 AM, Jeff Law wrote: >> On 08/06/2018 11:15 AM, Martin Sebor wrote: >>>>> These examples do not aim to be valid C, they just point out >>>>> limitations >>>>> of the middle-end design, and a good deal of the problems are due >>>>> to trying to do things that are not safe within the boundaries given >>>>> by the middle-end design. >>>> I really think this is important -- and as such I think we need to move >>>> away from trying to describe scenarios in C because doing so keeps >>>> bringing us back to the "C doesn't allow XYZ" kinds of arguments when >>>> what we're really discussing are GIMPLE semantic issues. >>>> >>>> So examples should be GIMPLE. You might start with (possibly >>>> invalid) C >>>> code to generate the GIMPLE, but the actual discussion needs to be >>>> looking at GIMPLE. We might include the C code in case someone >>>> wants to >>>> look at things in a debugger, but bringing the focus to GIMPLE is >>>> really >>>> important here. >>> >>> I don't understand the goal of this exercise. Unless the GIMPLE >>> code is the result of a valid test case (in some language GCC >>> supports), what does it matter what it looks like? The basis of >>> every single transformation done by a compiler is that the source >>> code is correct. If it isn't then all bets are off. I'm no GIMPLE >>> expert but even I can come up with any number of GIMPLE expressions >>> that have undefined behavior. What would that prove? >> The GIMPLE IL is less restrictive than the original source language. 
>> The process of translation into GIMPLE and optimization can create
>> situations in the GIMPLE IL that can't be validly represented in the
>> original source language. Subobject crossing being one such case, there
>> are certainly others. We have to handle these scenarios correctly.
>
> Sure, but a valid C test case still needs to exist to show that
> such a transformation is possible. Until someone comes up with
> one it's all speculation.

No, not at all. The defined semantics in this space come from actually
bumping against these problems in the past, which is why the semantics
are defined the way they are.

>
> Under normal circumstances the burden of proof that there is
> a problem is on the reporter. In this case, the requirement
> has turned into one to prove a negative. Effectively, you
> are asking for a proof that there is no bug, either in
> the assumptions behind the strlen optimization, or somewhere
> else in GCC that would lead the optimization to invalidate
> a valid piece of code. That's impossible.

I disagree strongly. We have *defined* a set of semantics in GIMPLE
based on the language lowering processes and needs of the optimizers.
For anything which transforms the IL, you must adhere to the semantics
of GIMPLE. It's that simple.

I am sympathetic to the desire to use C semantics to get better refined
ranges, but that's just wrong for anything which impacts code
generation. This discussion doesn't seem to be moving beyond that
basic point, which is concerning.

It really feels like we should be moving towards how we avoid violating
GIMPLE semantics for codegen/opt issues while still getting good
warnings.

Jeff
* Re: [PATCH] Make strlen range computations more conservative 2018-08-06 17:15 ` Martin Sebor 2018-08-06 17:40 ` Jeff Law @ 2018-08-06 22:39 ` Jeff Law 1 sibling, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-08-06 22:39 UTC (permalink / raw) To: Martin Sebor, Bernd Edlinger, Jakub Jelinek; +Cc: Richard Biener, GCC Patches On 08/06/2018 11:15 AM, Martin Sebor wrote: >>> These examples do not aim to be valid C, they just point out limitations >>> of the middle-end design, and a good deal of the problems are due >>> to trying to do things that are not safe within the boundaries given >>> by the middle-end design. >> I really think this is important -- and as such I think we need to move >> away from trying to describe scenarios in C because doing so keeps >> bringing us back to the "C doesn't allow XYZ" kinds of arguments when >> what we're really discussing are GIMPLE semantic issues. >> >> So examples should be GIMPLE. You might start with (possibly invalid) C >> code to generate the GIMPLE, but the actual discussion needs to be >> looking at GIMPLE. We might include the C code in case someone wants to >> look at things in a debugger, but bringing the focus to GIMPLE is really >> important here. > > I don't understand the goal of this exercise. Unless the GIMPLE > code is the result of a valid test case (in some language GCC > supports), what does it matter what it looks like? The basis of > every single transformation done by a compiler is that the source > code is correct. If it isn't then all bets are off. I'm no GIMPLE > expert but even I can come up with any number of GIMPLE expressions > that have undefined behavior. What would that prove? > > But let me try anyway. 
Here's a simplified (and gimplified) version
> of the test case that started this debate:
>
>   struct S { char a[4], b; };
>
>   f (struct S * p)
>   {
>     int D.1908;
>
>     _1 = &p->a;
>     _2 = __builtin_strlen (_1);  // strlen (p->a);
>     D.1908 = (int) _2;
>     return D.1908;
>   }

You need to include the declaration of _1 so that we can see its type.
Assuming you're just calling strlen (p->a), its type will be:

  char[4] * _1;

Which provides you with useful type information. If I'm wrong on this
one, I'm sure Jakub & Richi will chime in.

>
> and one involving a pointer:
>
>   g (struct S * p)
>   {
>     int D.1910;
>     char * q;
>
>     q = &p->a;
>     _1 = __builtin_strlen (q);
>     D.1910 = (int) _1;
>     return D.1910;
>   }

In this case the pointer passed is a char *, so you know nothing. It
seems a bit silly since you can see that q = &p->a, but that's how it
works. It all comes down to how information is lost as we go through
the pipeline. It may not matter in this specific example, but it
matters in the general case.

>
> and another one involving a pointer and strcpy and
> _FORTIFY_SOURCE:
>
>   h (struct S * p)
>   {
>     int D.2208;
>     char * q;
>
>     q = &p->a;
>     _1 = strcpy (q, "1234");
>     _2 = (long int) _1;
>     D.2208 = (int) _2;
>     return D.2208;
>   }
>
> with strcpy defined as:
>
>   __attribute__((artificial, gnu_inline, always_inline, leaf, nothrow))
>   strcpy (char * restrict __dest, const char * restrict __src)
>   {
>     char * D.2210;
>
>     _1 = __builtin_object_size (__dest, 1);
>     D.2210 = __builtin___strcpy_chk (__dest, __src, _1);
>     return D.2210;
>   }
>
> What does this show?
>
> AFAICS, all three functions are equivalent GIMPLE, yet I'm being
> told that the first one is different in some important detail from
> the second, and that even though it's the same as the third and
> even though it's good to have __strcpy_chk() abort in the third
> case it's bad for the strlen() call to return a value constrained
> to [0, 3]. Would defining strlen like so

Note that _b_o_s (__builtin_object_size) is supposed to honor the
somewhat wonky GIMPLE semantics in this space. If it doesn't that'd be
a bug. Assuming it does honor those semantics, then the right things
will happen.

Note that the loss of information as we go through the pipeline may
mean that _b_o_s could return "don't know" which is -1. It may also
return the largest of multiple objects that the pointer points to. So
you won't get an error from the fortification system in those cases
where semantics between C and GIMPLE differ.

>
>   __attribute__((artificial, gnu_inline, always_inline, leaf, nothrow))
>   strlen (const char * __src)
>   {
>     char * D.2210;
>
>     _1 = __builtin_object_size (__src, 1);
>     D.2210 = __builtin___strlen_chk (__src, _1);
>     return D.2210;
>   }
>
> and having __strlen_chk() abort if __src were longer than _1
> be also bad? (If not -- I sincerely hope that's the answer --
> then I'll be happy to put together a patch for that. In fact,
> I think it would be useful to extend this to all string
> functions (i.e., have them all abort on reads past the end,
> just as they abort on writes).)

So the key here is _b_o_s should be honoring the semantics of GIMPLE.
It can/will return "don't know" or sizes potentially larger than the
object in some cases.

Jeff

>
> Martin
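The `__strlen_chk` idea exists in this thread only as the GIMPLE sketch above. A user-level analogue might look like the following; `strlen_checked` and its explicit size parameter are hypothetical, and treating `(size_t)-1` as "size unknown" follows Jeff's description of what `__builtin_object_size` may return.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of the __strlen_chk idea: return the length
   of the string at `s`, but abort if no terminating NUL is found
   within the first `objsize` bytes.  An `objsize` of (size_t)-1
   means "size unknown", in which case no check is possible.  */
size_t strlen_checked (const char *s, size_t objsize)
{
  if (objsize == (size_t) -1)
    return strlen (s);

  /* memchr examines at most objsize bytes starting at s.  */
  const char *nul = memchr (s, '\0', objsize);
  if (!nul)
    abort ();   /* The read would run past the end of the object.  */

  return (size_t) (nul - s);
}
```

A wrapper in the style of the strcpy definition quoted above could pass `__builtin_object_size (s, 1)` as the second argument. Whenever that builtin answers "don't know" or reports a larger enclosing object, the check weakens accordingly and never produces a false abort.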
* Re: [PATCH] Make strlen range computations more conservative 2018-08-04 20:52 ` Martin Sebor 2018-08-05 6:51 ` Bernd Edlinger @ 2018-08-05 17:00 ` Jeff Law 2018-08-05 17:27 ` Richard Biener 2 siblings, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-08-05 17:00 UTC (permalink / raw) To: Martin Sebor, Jakub Jelinek; +Cc: Richard Biener, Bernd Edlinger, GCC Patches On 08/04/2018 02:52 PM, Martin Sebor wrote: > On 08/03/2018 01:43 AM, Jakub Jelinek wrote: >> On Thu, Aug 02, 2018 at 09:59:13PM -0600, Martin Sebor wrote: >>>> If I call this with foo (2, 1), do you still claim it is not valid C? >>> >>> String functions like strlen operate on character strings stored >>> in character arrays. Calling strlen (&s[1]) is invalid because >>> &s[1] is not the address of a character array. The fact that >>> objects can be represented as arrays of bytes doesn't change >>> that. The standard may be somewhat loose with words on this >>> distinction but the intent certainly isn't for strlen to traverse >>> arbitrary sequences of bytes that cross subobject boundaries. >>> (That is the intent behind the raw memory functions, but >>> the current text doesn't make the distinction clear.) >> >> But the standard doesn't say that right now. > > It does, in the restriction on multi-dimensional array accesses. > Given the array 'char a[2][2];' it's only valid to access a[0][0] > and a[0][1], and a[1][0], and a[1][1]. It's not valid to access > a[2][0] or a[2][1], even though they happen to be located at > the same addresses as a[1][0] and a[1][1]. > > There is no exception for distinct struct members. So in > a struct { char a[2], b[2]; }, even though a and b and laid > out the same way as char[2][2] would be, it's not valid to > treat a as such. There is no distinction between array > subscripting and pointer arithmetic, so it doesn't matter > what form the access takes. Understood WRT what the C language says. Let's bring it back to GIMPLE though. 
In GIMPLE we allow some crossing of subobject boundaries, as explained
earlier in the thread. It's not ideal, but that's the way things are.

So with that in mind, I think we need to reevaluate some of the
assumptions we're making in this code.

Jeff
* Re: [PATCH] Make strlen range computations more conservative
  2018-08-04 20:52 ` Martin Sebor
  2018-08-05  6:51   ` Bernd Edlinger
  2018-08-05 17:00   ` Jeff Law
@ 2018-08-05 17:27   ` Richard Biener
  2018-08-06 15:36     ` Martin Sebor
  2 siblings, 1 reply; 121+ messages in thread
From: Richard Biener @ 2018-08-05 17:27 UTC (permalink / raw)
  To: Martin Sebor, Jakub Jelinek; +Cc: Bernd Edlinger, Jeff Law, GCC Patches

On August 4, 2018 10:52:02 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote:
>On 08/03/2018 01:43 AM, Jakub Jelinek wrote:
>> On Thu, Aug 02, 2018 at 09:59:13PM -0600, Martin Sebor wrote:
>>>> If I call this with foo (2, 1), do you still claim it is not valid C?
>>>
>>> String functions like strlen operate on character strings stored
>>> in character arrays. Calling strlen (&s[1]) is invalid because
>>> &s[1] is not the address of a character array. The fact that
>>> objects can be represented as arrays of bytes doesn't change
>>> that. The standard may be somewhat loose with words on this
>>> distinction but the intent certainly isn't for strlen to traverse
>>> arbitrary sequences of bytes that cross subobject boundaries.
>>> (That is the intent behind the raw memory functions, but
>>> the current text doesn't make the distinction clear.)
>>
>> But the standard doesn't say that right now.
>
>It does, in the restriction on multi-dimensional array accesses.
>Given the array 'char a[2][2];' it's only valid to access a[0][0]
>and a[0][1], and a[1][0], and a[1][1]. It's not valid to access
>a[0][2] or a[0][3], even though they happen to be located at
>the same addresses as a[1][0] and a[1][1].
>
>There is no exception for distinct struct members. So in
>a struct { char a[2], b[2]; }, even though a and b are laid
>out the same way as char[2][2] would be, it's not valid to
>treat a as such. There is no distinction between array
>subscripting and pointer arithmetic, so it doesn't matter
>what form the access takes.

What does the standard say about comparing &s.a[2] and &s.b[0], and
what does that mean when you consider converting those to uintptr_t
and back and then accessing the data pointed to? Points-to analysis
considers the first pointer to point to both subobjects while the
second only to the second.

(Just pointing out another, maybe itself inconsistent, aspect of how
GIMPLE handles subobjects in points-to analysis.)

Richard.

>Yes, the standard could be clearer. There probably even are
>ambiguities and contradictions (the authors of the Object Model
>proposal believe there are and are trying to clarify/remove
>them). But the intent is clearly there. It's especially
>important for adjacent members of different types (say a char[8]
>followed by a function pointer. We definitely don't want writes
>to the array to be allowed to change the function pointer.)
>
>> Plus, at least from the middle-end POV, there is also the case of
>> placement new and stores changing the dynamic type of the object,
>> previously say a struct with two fields, then a placement new with a single
>> char array over it (the placement new will not survive in the middle-end, so
>> it will be just a memcpy or strcpy or some other byte copy over the original
>> object, and due to the CSE/SCCVN etc. of pointer to pointer conversions
>> being in the middle-end useless means you can see a pointer to the struct
>> with two fields rather than pointer to char array.
>
>There may be challenges in the middle-end, you would know much
>better than me. All I'm saying is that it's not valid to access
>[sub]objects by dereferencing pointers to other subobjects. All
>the examples in this discussion have been of that form.
>
>>
>> Consider e.g.
>> typedef __typeof__ (sizeof 0) size_t; >> void *operator new (size_t, void *p) { return p; } >> void *operator new[] (size_t, void *p) { return p; } >> struct S { char a; char b[64]; }; >> void baz (char *); >> >> size_t >> foo (S *p) >> { >> baz (&p->a); >> char *q = new (p) char [16]; >> baz (q); >> return __builtin_strlen (q); >> } >> >> I don't think it is correct to say that strlen must be 0. In this >testcase >> the pointer passed to strlen is still S *, though I think with enough >> tweaking you could also have something where the argument is &p->a. > >I think the problem here is changing the type of p->a. I'm >not up on the latest C++ changes here but I think it's a known >problem with the specification. A similar (known) problem also >comes in the case of dynamically allocated objects: > > char *p = (char*)operator new (2); > char *p1 = new (p) char ('a'); > char *p2 = new (p) char ('\0'); > strlen (p1); > >Is the strlen(p) call valid when there's no string or array >at p: there is a singlelton char object that just happens >to be followed by another singleton char object. It's not >an array of two elements. Each is [an array of] one char. > >This is a (specification) problem for sequence containers like >vector where strictly speaking, it's not valid to iterate over >them because of the array restriction. > >> >> I have no problem for strlen to return 0 if it sees a toplevel object >of >> size 1, but note that if it is extern, it already might be a problem >in some >> cases: >> struct T { char a; char a2[]; } b; >> extern struct T c; >> void foo (int *p) { p[0] = strlen (b); p[1] = strlen (c); } >> If c's definition is struct T c = { ' ', "abcde" }; >> then the object doesn't have length of 1. > >I'm assuming above you meant strlen(&b) and strlen(&c) (or >equivalently, strlen(&b.a) and strlen(&c.a). If so, it's >the same problem. The strlen call is invalid unless b.a and >c.a are nul. > >Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative
From: Martin Sebor @ 2018-08-06 15:36 UTC
To: Richard Biener, Jakub Jelinek; +Cc: Bernd Edlinger, Jeff Law, GCC Patches

On 08/05/2018 11:27 AM, Richard Biener wrote:
> On August 4, 2018 10:52:02 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote:
>> On 08/03/2018 01:43 AM, Jakub Jelinek wrote:
>>> On Thu, Aug 02, 2018 at 09:59:13PM -0600, Martin Sebor wrote:
>>>>> If I call this with foo (2, 1), do you still claim it is not valid C?
>>>>
>>>> String functions like strlen operate on character strings stored
>>>> in character arrays. Calling strlen (&s[1]) is invalid because
>>>> &s[1] is not the address of a character array. The fact that
>>>> objects can be represented as arrays of bytes doesn't change
>>>> that. The standard may be somewhat loose with words on this
>>>> distinction but the intent certainly isn't for strlen to traverse
>>>> arbitrary sequences of bytes that cross subobject boundaries.
>>>> (That is the intent behind the raw memory functions, but
>>>> the current text doesn't make the distinction clear.)
>>>
>>> But the standard doesn't say that right now.
>>
>> It does, in the restriction on multi-dimensional array accesses.
>> Given the array 'char a[2][2];' it's only valid to access a[0][0]
>> and a[0][1], and a[1][0], and a[1][1]. It's not valid to access
>> a[0][2] or a[0][3], even though they happen to be located at
>> the same addresses as a[1][0] and a[1][1].
>>
>> There is no exception for distinct struct members. So in
>> a struct { char a[2], b[2]; }, even though a and b are laid
>> out the same way as char[2][2] would be, it's not valid to
>> treat a as such. There is no distinction between array
>> subscripting and pointer arithmetic, so it doesn't matter
>> what form the access takes.
>
> What does the standard say about comparing &s.a[2] and &s.b[0], and
> what does that mean when you consider converting those to uintptr_t
> and back and then accessing the data pointed to?
>
> Points-to analysis considers the first pointer to point to both
> subobjects while the second only to the second.
>
> (Just pointing out another area, maybe inconsistent in itself, within
> GIMPLE: the handling of subobjects in points-to analysis.)

The text (since C99) says that such pointers compare equal.
This doesn't imply that it's intended to be valid to access
the adjacent object using the past-the-end pointer. Making
this clear is one of the main goals of the (evolving) provenance
proposal. Converting to uintptr_t isn't meant to change that
either (the provenance is preserved through such conversions).

Martin
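
The pointer question raised in this exchange can be made concrete with a small sketch. It assumes a typical ABI in which `struct S { char a[2], b[2]; }` has no padding between its members (the standard does not promise this), and the function names are hypothetical, chosen only for illustration. The sketch compares addresses as integers and never dereferences the past-the-end pointer, because whether such an access is valid is exactly what is disputed above.

```c
#include <assert.h>   /* only needed by callers that check the results */
#include <stddef.h>
#include <stdint.h>

/* Layout from the discussion: two adjacent char arrays.  */
struct S { char a[2], b[2]; };

/* Return 1 if the one-past-the-end address of s->a equals the address
   of s->b[0] once both are converted to uintptr_t.  Forming &s->a[2]
   is allowed; it is not dereferenced here.  */
int
past_end_equals_next (const struct S *s)
{
  uintptr_t end_of_a = (uintptr_t) &s->a[2];
  uintptr_t start_of_b = (uintptr_t) &s->b[0];
  return end_of_a == start_of_b;
}

/* Return 1 if member b starts exactly where member a ends, i.e. the
   struct is laid out like char[2][2].  Assumed here, not guaranteed
   by the standard.  */
int
members_are_adjacent (void)
{
  return offsetof (struct S, b)
	 == offsetof (struct S, a) + sizeof (((struct S *) 0)->a);
}
```

On common targets both functions can be expected to return 1. That the two addresses coincide says nothing about whether the first pointer may be used to read or write `b`, which is the point Martin makes about provenance.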
* Re: [PATCH] Make strlen range computations more conservative
From: Martin Sebor @ 2018-08-02 3:13 UTC
To: Richard Biener; +Cc: Jakub Jelinek, Bernd Edlinger, Jeff Law, GCC Patches

On 08/01/2018 01:19 AM, Richard Biener wrote:
> On Tue, 31 Jul 2018, Martin Sebor wrote:
>
>> On 07/31/2018 09:48 AM, Jakub Jelinek wrote:
>>> On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote:
>>>> On 07/31/2018 12:38 AM, Jakub Jelinek wrote:
>>>>> On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote:
>>>>>> Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past
>>>>>> the end of subobjects by string functions. With _FORTIFY_SOURCE=2
>>>>>> it calls abort. This is the default on popular distributions,
>>>>>
>>>>> Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the
>>>>> standard requires, imposes extra requirements. So from what this
>>>>> mode accepts or rejects we shouldn't determine what is or isn't
>>>>> considered valid.
>>>>
>>>> I'm not sure what the additional requirements are but the ones
>>>> I am referring to are the enforcing of struct member boundaries.
>>>> This is in line with the standard requirements of not accessing
>>>> [sub]objects via pointers derived from other [sub]objects.
>>>
>>> In the middle-end the distinction between what was originally a reference
>>> to subobjects and what was a reference to objects is quickly lost
>>> (whether through SCCVN or other optimizations).
>>> We've run into this many times with the __builtin_object_size already.
>>> So, if e.g.
>>> struct S { char a[3]; char b[5]; } s = { "abc", "defg" };
>>> ...
>>> strlen ((char *) &s) is well defined but
>>> strlen (s.a) is not in C, for the middle-end you might not figure out which
>>> one is which.
>>
>> Yes, I'm aware of the middle-end transformation to MEM_REF
>> -- it's one of the reasons why detecting invalid accesses
>> by the middle end warnings, including -Warray-bounds,
>> -Wformat-overflow, -Wsprintf-overflow, and even -Wrestrict,
>> is less than perfect.
>>
>> But is strlen(s.a) also meant to be well-defined in the middle
>> end (with the semantics of computing the length of "abcdefg"?)
>
> Yes.
>
>> And if so, what makes it well defined?
>
> The fact that strlen takes a char * argument and thus inline-expansion
> of a trivial implementation like
>
>   int len = 0;
>   for (; *p; ++p)
>     ++len;
>
> will have
>
>   p = &s.a;
>
> and the middle-end doesn't reconstruct s.a[..] from the pointer
> access.
>
>>
>> Certainly not every "strlen" has these semantics. For example,
>> this open-coded one doesn't:
>>
>>   int len = 0;
>>   for (int i = 0; s.a[i]; ++i)
>>     ++len;
>>
>> It computes 2 (with no warning for the out-of-bounds access).
>
> Yes.

If that's not a problem then why is it one when strlen() does
the same thing? Presumably the answer is: "because here
the access is via array indexing and in strlen via pointer
dereferences." (But in C there is no difference between
the two. Also see below.)

>> So if the standard doesn't guarantee it and different kinds
>> of accesses behave differently, how do we explain what "works"
>> and what doesn't without relying on GCC implementation details?
>
> In the middle-end accesses via pointers - accesses where the
> access path is not visible in the access itself - are not
> constrained by the "access" path of how the pointer was built.

I have seen and I think shown in this discussion examples
where this is not so. For instance:

  struct S { char a[1], b[1]; };

  void f (struct S *s, int i)
  {
    char *p = &s->a[i];
    char *q = &s->b[0];

    char x = *p;
    *q = 11;

    if (x != *p)           // folded to false
      __builtin_abort ();  // eliminated
  }

Is this a bug? (I hope not.)

>> If we can't then the only language we have in common with users
>> is the standard. (This, by the way, is what the C memory model
>> group is trying to address -- the language or feature that's
>> missing from the standard that says when, if ever, these things
>> might be valid.)
>
> Well, you simply have to not compare apples and oranges,
> a strlen implementation that isn't a strlen implementation
> and strlen.

As I'm sure you know, the C standard doesn't differentiate
between the semantics of array subscript expressions and
pointer dereferencing. They both mean the same thing.
(Nothing prevents an implementation from defining strlen
as a macro that expands into a loop using array indices
for array arguments.)

But this, I suspect, might be behind the disagreement. You
seem to think in terms of GIMPLE and GCC internals, and have
a clear idea in your head what's meant to be valid and what
isn't. I suspect only a few GCC developers think this way.
Most of the rest of us think in terms of the language
specification. Not just because that's the contract between
programmers and the compiler, but also because it's the only
specification available (the GCC internals manual doesn't go
into nearly enough detail to even hint at what the answers
to some of these questions might be).

Martin
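
The disagreement in this message is about which object a length computation may range over. A minimal sketch, assuming the `struct S` layout from Jakub's example and a hypothetical helper named `bounded_len`, shows one way user code can avoid depending on the answer: pass the bound explicitly. `memchr` examines at most the `n` bytes it is given, so the caller states whether the bound is the member or the enclosing object. This is an illustration only, not something proposed in the thread.

```c
#include <assert.h>   /* only needed by callers that check the results */
#include <stddef.h>
#include <string.h>

/* Layout from the thread: a is completely filled and has no
   terminating NUL of its own; b holds "defg" plus its NUL.  */
struct S { char a[3]; char b[5]; };

static const struct S s = { { 'a', 'b', 'c' }, "defg" };

/* Hypothetical helper: length of the string starting at p, looking
   at no more than n bytes.  Returns n if no NUL is found.  */
size_t
bounded_len (const char *p, size_t n)
{
  const char *nul = memchr (p, '\0', n);
  return nul ? (size_t) (nul - p) : n;
}

/* Bound taken from the member: the search stays inside s.a.  */
size_t
len_within_member (void)
{
  return bounded_len (s.a, sizeof s.a);
}

/* Bound taken from the enclosing object, reached through a pointer
   to the whole struct converted to a character pointer.  */
size_t
len_within_object (void)
{
  return bounded_len ((const char *) &s, sizeof s);
}
```

With this layout `len_within_member ()` yields 3 and `len_within_object ()` yields 7, the two candidate answers discussed above for `strlen (s.a)`. Neither call leaves the decision to how the compiler models subobject boundaries.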
* Re: [PATCH] Make strlen range computations more conservative
From: Bernd Edlinger @ 2018-08-02 10:22 UTC
To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Jeff Law, Martin Sebor

[-- Attachment #1: Type: text/plain, Size: 1795 bytes --]

Hi,

this is an update of the patch to prevent unsafe optimizations due to
strlen range assuming always zero-terminated char arrays.

Since the previous version I no longer try to locate the outermost
char array, but just bail out if there is a typecast. That means,
supposing we have a 2-dimensional array, char a[x][y], strlen (s.a[x])
may assume a[x] is zero-terminated, if the optimization is enabled.
strlen ((char*)s.a) involves a type cast and should assume nothing.

Additionally, due to the discussion, I came to the conclusion that this
strlen range optimization should only be used when enabled with
-fassume-zero-terminated-char-arrays or -Ofast.

Note that I included the test case with the removed assertion as
gcc/testsuite/gcc.dg/strlenopt-57.c

Initially it would only miscompile with -Ofast, but with r263018
aka PR tree-optimization/86043, PR tree-optimization/86042
it was miscompiled with -O3 as well.
I located the reason in gcc/tree-ssa-strlen.c (get_min_string_length),
which did not check if the string constant is in fact zero terminated:

@@ -3192,7 +3194,9 @@ get_min_string_length (tree rhs, bool *full_string
       && TREE_READONLY (rhs))
     rhs = DECL_INITIAL (rhs);

-  if (rhs && TREE_CODE (rhs) == STRING_CST)
+  if (rhs && TREE_CODE (rhs) == STRING_CST
+      && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)),
+                           TREE_STRING_LENGTH (rhs)) >= 0)
     {
       *full_string_p = true;
       return strlen (TREE_STRING_POINTER (rhs));

Fortunately this also shows a way to narrow strlen return value
expectations when we are able to positively prove that a string must
be zero terminated.

Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?

Thanks
Bernd.

[-- Attachment #2: changelog-range-strlen-v3.txt --]
[-- Type: text/plain, Size: 1573 bytes --]

gcc:
2018-08-01  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        * common.opt: Add new optimization option
        -fassume-zero-terminated-char-arrays.
        * opts.c (default_options): Enable
        -fassume-zero-terminated-char-arrays with -Ofast.
        * gimple-fold.c (get_inner_char_array_unless_typecast): Helper
        function for strlen range estimations.
        (get_range_strlen): Use get_inner_char_array_unless_typecast.
        * gimple-fold.h (get_inner_char_array_unless_typecast): Declare.
        * tree-ssa-strlen.c (maybe_set_strlen_range): Likewise.
        (get_min_string_length): Avoid not NUL terminated string literals.
        * doc/invoke.texi: Document -fassume-zero-terminated-char-arrays.

testsuite:
2018-08-01  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        * gcc.dg/tree-ssa/builtin-snprintf-warn-1.c: Add
        -fassume-zero-terminated-char-arrays.
        * gcc.dg/tree-ssa/builtin-snprintf-warn-2.c: Likewise.
        * gcc.dg/tree-ssa/builtin-snprintf-warn-3.c: Likewise.
        * gcc.dg/tree-ssa/builtin-sprintf-warn-14.c: Likewise.
        * gcc.dg/tree-ssa/builtin-sprintf-warn-2.c: Likewise.
        * gcc.dg/tree-ssa/pr79376.c: Likewise.
        * gcc.dg/Wstringop-overflow-5.c: Likewise.
        * gcc.dg/Wstringop-truncation-3.c: Likewise.
* gcc.dg/Wstringop-truncation.c: Likewise. * gcc.dg/pr79538.c: Likewise. * gcc.dg/pr83373.c: Likewise. * gcc.dg/strlenopt-36.c: Likewise. * gcc.dg/strlenopt-40.c: Adjust test expectations. * gcc.dg/strlenopt-45.c: Likewise. * gcc.dg/strlenopt-48.c: Likewise. * gcc.dg/strlenopt-51.c: Likewise. * gcc.dg/strlenopt-55.c: New test. * gcc.dg/strlenopt-56.c: New test. * gcc.dg/strlenopt-57.c: New test. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: patch-range-strlen-v3.diff --] [-- Type: text/x-patch; name="patch-range-strlen-v3.diff", Size: 29376 bytes --] Index: gcc/common.opt =================================================================== --- gcc/common.opt (revision 263029) +++ gcc/common.opt (working copy) @@ -1025,6 +1025,10 @@ fsanitize-undefined-trap-on-error Common Driver Report Var(flag_sanitize_undefined_trap_on_error) Init(0) Use trap instead of a library function for undefined behavior sanitization. +fassume-zero-terminated-char-arrays +Common Var(flag_assume_zero_terminated_char_arrays) Optimization Init(0) +Optimize under the assumption that char arrays must always be zero terminated. + fasynchronous-unwind-tables Common Report Var(flag_asynchronous_unwind_tables) Optimization Generate unwind tables that are exact at each instruction boundary. Index: gcc/gimple-fold.c =================================================================== --- gcc/gimple-fold.c (revision 263029) +++ gcc/gimple-fold.c (working copy) @@ -1257,7 +1257,45 @@ gimple_fold_builtin_memset (gimple_stmt_iterator * return true; } +/* Obtain the inner char array for strlen range estimations. + Return NULL if ARG is not a char array, or if the inner reference + chain goes through a type cast. */ +tree +get_inner_char_array_unless_typecast (tree arg) +{ + if (!flag_assume_zero_terminated_char_arrays) + return NULL_TREE; + + /* We handle arrays of integer types. 
*/ + if (TREE_CODE (TREE_TYPE (arg)) != ARRAY_TYPE + || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != INTEGER_TYPE + || TYPE_MODE (TREE_TYPE (TREE_TYPE (arg))) != TYPE_MODE (char_type_node) + || TYPE_PRECISION (TREE_TYPE (TREE_TYPE (arg))) + != TYPE_PRECISION (char_type_node)) + return NULL_TREE; + + tree base = arg; + while (TREE_CODE (base) == ARRAY_REF + || TREE_CODE (base) == ARRAY_RANGE_REF + || TREE_CODE (base) == COMPONENT_REF) + base = TREE_OPERAND (base, 0); + + /* If this looks like a type cast don't assume anything. */ + if ((TREE_CODE (base) == MEM_REF + && (! integer_zerop (TREE_OPERAND (base, 1)) + || TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0)))) + != TYPE_MAIN_VARIANT (TREE_TYPE (base)))) + || TREE_CODE (base) == VIEW_CONVERT_EXPR + /* Or other stuff that would be handled by get_inner_reference. */ + || TREE_CODE (base) == BIT_FIELD_REF + || TREE_CODE (base) == REALPART_EXPR + || TREE_CODE (base) == IMAGPART_EXPR) + return NULL_TREE; + + return arg; +} + /* Obtain the minimum and maximum string length or minimum and maximum value of ARG in LENGTH[0] and LENGTH[1], respectively. If ARG is an SSA name variable, follow its use-def chains. When @@ -1310,8 +1348,8 @@ get_range_strlen (tree arg, tree length[2], bitmap member. */ tree idx = TREE_OPERAND (op, 1); - arg = TREE_OPERAND (op, 0); - tree optype = TREE_TYPE (arg); + op = TREE_OPERAND (op, 0); + tree optype = TREE_TYPE (op); if (tree dom = TYPE_DOMAIN (optype)) if (tree bound = TYPE_MAX_VALUE (dom)) if (TREE_CODE (bound) == INTEGER_CST @@ -1339,19 +1377,13 @@ get_range_strlen (tree arg, tree length[2], bitmap if (TREE_CODE (arg) == ARRAY_REF) { - tree type = TREE_TYPE (TREE_OPERAND (arg, 0)); + arg = get_inner_char_array_unless_typecast (arg); + if (!arg) + return false; - /* Determine the "innermost" array type. 
*/ - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); + tree type = TREE_TYPE (arg); - /* Avoid arrays of pointers. */ - tree eltype = TREE_TYPE (type); - if (TREE_CODE (type) != ARRAY_TYPE - || !INTEGRAL_TYPE_P (eltype)) - return false; - + /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); if (!val || integer_zerop (val)) return false; @@ -1362,15 +1394,17 @@ get_range_strlen (tree arg, tree length[2], bitmap the array could have zero length. */ *minlen = ssize_int (0); - if (TREE_CODE (TREE_OPERAND (arg, 0)) == COMPONENT_REF - && type == TREE_TYPE (TREE_OPERAND (arg, 0)) - && array_at_struct_end_p (TREE_OPERAND (arg, 0))) + if (TREE_CODE (arg) == COMPONENT_REF + && type == TREE_TYPE (arg) + && array_at_struct_end_p (arg)) *flexp = true; } - else if (TREE_CODE (arg) == COMPONENT_REF - && (TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 1))) - == ARRAY_TYPE)) + else if (TREE_CODE (arg) == COMPONENT_REF) { + arg = get_inner_char_array_unless_typecast (arg); + if (!arg) + return false; + /* Use the type of the member array to determine the upper bound on the length of the array. This may be overly optimistic if the array itself isn't NUL-terminated and @@ -1386,10 +1420,6 @@ get_range_strlen (tree arg, tree length[2], bitmap tree type = TREE_TYPE (arg); - while (TREE_CODE (type) == ARRAY_TYPE - && TREE_CODE (TREE_TYPE (type)) == ARRAY_TYPE) - type = TREE_TYPE (type); - /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); if (!val || integer_zerop (val)) @@ -1400,8 +1430,7 @@ get_range_strlen (tree arg, tree length[2], bitmap the array could have zero length. 
*/ *minlen = ssize_int (0); } - - if (VAR_P (arg)) + else if (VAR_P (arg) && flag_assume_zero_terminated_char_arrays) { tree type = TREE_TYPE (arg); if (POINTER_TYPE_P (type)) @@ -1409,13 +1438,20 @@ get_range_strlen (tree arg, tree length[2], bitmap if (TREE_CODE (type) == ARRAY_TYPE) { + /* We handle arrays of integer types. */ + if (TREE_CODE (TREE_TYPE (type)) != INTEGER_TYPE + || TYPE_MODE (TREE_TYPE (type)) + != TYPE_MODE (char_type_node) + || TYPE_PRECISION (TREE_TYPE (type)) + != TYPE_PRECISION (char_type_node)) + return false; + + /* Fail when the array bound is unknown or zero. */ val = TYPE_SIZE_UNIT (type); - if (!val - || TREE_CODE (val) != INTEGER_CST - || integer_zerop (val)) + if (!val || integer_zerop (val)) return false; - val = wide_int_to_tree (TREE_TYPE (val), - wi::sub (wi::to_wide (val), 1)); + val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, + integer_one_node); /* Set the minimum size to zero since the string in the array could have zero length. */ *minlen = ssize_int (0); Index: gcc/gimple-fold.h =================================================================== --- gcc/gimple-fold.h (revision 263029) +++ gcc/gimple-fold.h (working copy) @@ -61,6 +61,7 @@ extern bool gimple_fold_builtin_snprintf (gimple_s extern bool arith_code_with_undefined_signed_overflow (tree_code); extern gimple_seq rewrite_to_defined_overflow (gimple *); extern void replace_call_with_value (gimple_stmt_iterator *, tree); +extern tree get_inner_char_array_unless_typecast (tree); /* gimple_build, functionally matching fold_buildN, outputs stmts int the provided sequence, matching and simplifying them on-the-fly. Index: gcc/opts.c =================================================================== --- gcc/opts.c (revision 263029) +++ gcc/opts.c (working copy) @@ -547,6 +547,7 @@ static const struct default_options default_option /* -Ofast adds optimizations to -O3. 
*/ { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 }, + { OPT_LEVELS_FAST, OPT_fassume_zero_terminated_char_arrays, NULL, 1 }, { OPT_LEVELS_NONE, 0, NULL, 0 } }; Index: gcc/tree-ssa-strlen.c =================================================================== --- gcc/tree-ssa-strlen.c (revision 263029) +++ gcc/tree-ssa-strlen.c (working copy) @@ -1149,11 +1149,15 @@ maybe_set_strlen_range (tree lhs, tree src, tree b if (TREE_CODE (src) == ADDR_EXPR) { + src = TREE_OPERAND (src, 0); + + src = get_inner_char_array_unless_typecast (src); + + if (!src) + ; /* The last array member of a struct can be bigger than its size suggests if it's treated as a poor-man's flexible array member. */ - src = TREE_OPERAND (src, 0); - bool src_is_array = TREE_CODE (TREE_TYPE (src)) == ARRAY_TYPE; - if (src_is_array && !array_at_struct_end_p (src)) + else if (!array_at_struct_end_p (src)) { tree type = TREE_TYPE (src); if (tree size = TYPE_SIZE_UNIT (type)) @@ -1170,8 +1174,6 @@ maybe_set_strlen_range (tree lhs, tree src, tree b } else { - if (TREE_CODE (src) == COMPONENT_REF && !src_is_array) - src = TREE_OPERAND (src, 1); if (DECL_P (src)) { /* Handle the unlikely case of strlen (&c) where c is some @@ -3192,7 +3194,9 @@ get_min_string_length (tree rhs, bool *full_string && TREE_READONLY (rhs)) rhs = DECL_INITIAL (rhs); - if (rhs && TREE_CODE (rhs) == STRING_CST) + if (rhs && TREE_CODE (rhs) == STRING_CST + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)), + TREE_STRING_LENGTH (rhs)) >= 0) { *full_string_p = true; return strlen (TREE_STRING_POINTER (rhs)); Index: gcc/doc/invoke.texi =================================================================== --- gcc/doc/invoke.texi (revision 263045) +++ gcc/doc/invoke.texi (working copy) @@ -387,7 +387,8 @@ Objective-C and Objective-C++ Dialects}. 
-falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol --fassociative-math -fauto-profile -fauto-profile[=@var{path}] @gol +-fassociative-math -fassume-zero-terminated-char-arrays @gol +-fauto-profile -fauto-profile[=@var{path}] @gol -fauto-inc-dec -fbranch-probabilities @gol -fbranch-target-load-optimize -fbranch-target-load-optimize2 @gol -fbtr-bb-exclusive -fcaller-saves @gol @@ -9938,6 +9939,17 @@ is automatically enabled when both @option{-fno-si The default is @option{-fno-associative-math}. +@item -fassume-zero-terminated-char-arrays +@opindex fassume-zero-terminated-char-arrays + +Optimize under the assumption that char arrays must always be zero +terminated. This may have an effect on code that uses strlen to +check the string length, for instance in assertions. Under certain +conditions such checks can be optimized away. This option is enabled +by default at optimization level @option{-Ofast}. + +The default is @option{-fno-assume-zero-terminated-char-arrays}. 
+ @item -freciprocal-math @opindex freciprocal-math Index: gcc/testsuite/gcc.dg/Wstringop-overflow-5.c =================================================================== --- gcc/testsuite/gcc.dg/Wstringop-overflow-5.c (revision 263029) +++ gcc/testsuite/gcc.dg/Wstringop-overflow-5.c (working copy) @@ -1,6 +1,6 @@ /* PR tree-optimization/85259 - Missing -Wstringop-overflow= since r256683 { dg-do compile } - { dg-options "-O2 -Wstringop-overflow" } */ + { dg-options "-O2 -Wstringop-overflow -fassume-zero-terminated-char-arrays" } */ extern char* strcpy (char*, const char*); extern char* strcat (char*, const char*); Index: gcc/testsuite/gcc.dg/Wstringop-truncation-3.c =================================================================== --- gcc/testsuite/gcc.dg/Wstringop-truncation-3.c (revision 263029) +++ gcc/testsuite/gcc.dg/Wstringop-truncation-3.c (working copy) @@ -1,6 +1,6 @@ /* PR c/85931 - -Wsizeof-pointer-memaccess for strncpy with size of source { dg-do compile } - { dg-options "-O2 -Wall -Wstringop-truncation -ftrack-macro-expansion=0" } */ + { dg-options "-O2 -Wall -Wstringop-truncation -fassume-zero-terminated-char-arrays -ftrack-macro-expansion=0" } */ typedef __SIZE_TYPE__ size_t; Index: gcc/testsuite/gcc.dg/Wstringop-truncation.c =================================================================== --- gcc/testsuite/gcc.dg/Wstringop-truncation.c (revision 263029) +++ gcc/testsuite/gcc.dg/Wstringop-truncation.c (working copy) @@ -1,7 +1,7 @@ /* PR tree-optimization/84468 - Inconsistent -Wstringop-truncation warnings with -O2 { dg-do compile } - { dg-options "-O2 -Wstringop-truncation -ftrack-macro-expansion=0 -g" } */ + { dg-options "-O2 -Wstringop-truncation -fassume-zero-terminated-char-arrays -ftrack-macro-expansion=0 -g" } */ #define strncpy __builtin_strncpy Index: gcc/testsuite/gcc.dg/pr79538.c =================================================================== --- gcc/testsuite/gcc.dg/pr79538.c (revision 263029) +++ gcc/testsuite/gcc.dg/pr79538.c 
(working copy) @@ -1,6 +1,6 @@ /* PR middle-end/79538 - missing -Wformat-overflow with %s and non-member array arguments { dg-do compile } - { dg-options "-O2 -Wformat-overflow" } */ + { dg-options "-O2 -Wformat-overflow -fassume-zero-terminated-char-arrays" } */ char a3[3]; char a4[4]; Index: gcc/testsuite/gcc.dg/pr83373.c =================================================================== --- gcc/testsuite/gcc.dg/pr83373.c (revision 263029) +++ gcc/testsuite/gcc.dg/pr83373.c (working copy) @@ -1,6 +1,6 @@ /* PR middle-end/83373 - False positive reported by -Wstringop-overflow { dg-do compile } - { dg-options "-O2 -Wstringop-overflow" } */ + { dg-options "-O2 -Wstringop-overflow -fassume-zero-terminated-char-arrays" } */ typedef __SIZE_TYPE__ size_t; Index: gcc/testsuite/gcc.dg/strlenopt-36.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-36.c (revision 263029) +++ gcc/testsuite/gcc.dg/strlenopt-36.c (working copy) @@ -1,7 +1,7 @@ /* PR tree-optimization/78450 - strlen(s) return value can be assumed to be less than the size of s { dg-do compile } - { dg-options "-O2 -fdump-tree-optimized" } */ + { dg-options "-O2 -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #include "strlenopt.h" Index: gcc/testsuite/gcc.dg/strlenopt-40.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-40.c (revision 263029) +++ gcc/testsuite/gcc.dg/strlenopt-40.c (working copy) @@ -1,7 +1,7 @@ /* PR tree-optimization/83671 - fix for false positive reported by -Wstringop-overflow does not work with inlining { dg-do compile } - { dg-options "-O1 -fdump-tree-optimized" } */ + { dg-options "-O1 -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #include "strlenopt.h" @@ -219,10 +219,15 @@ void elim_member_arrays_ptr (struct MemArrays0 *ma ELIM_TRUE (strlen (ma0->a5_7[0]) < 7); ELIM_TRUE (strlen (ma0[0].a5_7[0]) < 7); +#if 0 + /* This is transformed into 
strlen ((const char *) &(ma0 + 64)->a5_7[0]) + which looks like a type cast and fails the check in + get_inner_char_array_unless_typecast. */ ELIM_TRUE (strlen (ma0[1].a5_7[0]) < 7); ELIM_TRUE (strlen (ma0[1].a5_7[4]) < 7); ELIM_TRUE (strlen (ma0[9].a5_7[0]) < 7); ELIM_TRUE (strlen (ma0[9].a5_7[4]) < 7); +#endif ELIM_TRUE (strlen (ma0->a3) < sizeof ma0->a3); ELIM_TRUE (strlen (ma0->a5) < sizeof ma0->a5); Index: gcc/testsuite/gcc.dg/strlenopt-45.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-45.c (revision 263029) +++ gcc/testsuite/gcc.dg/strlenopt-45.c (working copy) @@ -2,7 +2,7 @@ Test to verify that strnlen built-in expansion works correctly in the absence of tree strlen optimization. { dg-do compile } - { dg-options "-O2 -Wall -fdump-tree-optimized" } */ + { dg-options "-O2 -Wall -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #include "strlenopt.h" @@ -43,7 +43,6 @@ extern size_t strnlen (const char *, size_t); else \ FAIL (made_in_false_branch) -extern char c; extern char a1[1]; extern char a3[3]; extern char a5[5]; @@ -52,18 +51,6 @@ extern char ax[]; void elim_strnlen_arr_cst (void) { - /* The length of a string stored in a one-element array must be zero. - The result reported by strnlen() for such an array can be non-zero - only when the bound is equal to 1 (in which case the result must - be one). 
*/ - ELIM (strnlen (&c, 0) == 0); - ELIM (strnlen (&c, 1) < 2); - ELIM (strnlen (&c, 2) == 0); - ELIM (strnlen (&c, 9) == 0); - ELIM (strnlen (&c, PTRDIFF_MAX) == 0); - ELIM (strnlen (&c, SIZE_MAX) == 0); - ELIM (strnlen (&c, -1) == 0); - ELIM (strnlen (a1, 0) == 0); ELIM (strnlen (a1, 1) < 2); ELIM (strnlen (a1, 2) == 0); @@ -99,17 +86,18 @@ void elim_strnlen_arr_cst (void) ELIM (strnlen (a3_7[2], SIZE_MAX) < 8); ELIM (strnlen (a3_7[2], -1) < 8); - ELIM (strnlen ((char*)a3_7, 0) == 0); - ELIM (strnlen ((char*)a3_7, 1) < 2); - ELIM (strnlen ((char*)a3_7, 2) < 3); - ELIM (strnlen ((char*)a3_7, 3) < 4); - ELIM (strnlen ((char*)a3_7, 9) < 10); - ELIM (strnlen ((char*)a3_7, 19) < 20); - ELIM (strnlen ((char*)a3_7, 21) < 22); - ELIM (strnlen ((char*)a3_7, 23) < 22); - ELIM (strnlen ((char*)a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)a3_7, -1) < 22); + ELIM (strnlen ((char*)a3_7[0], 0) == 0); + ELIM (strnlen ((char*)a3_7[0], 1) < 2); + ELIM (strnlen ((char*)a3_7[0], 2) < 3); + ELIM (strnlen ((char*)a3_7[0], 3) < 4); + ELIM (strnlen ((char*)a3_7[0], 7) < 8); + ELIM (strnlen ((char*)a3_7[0], 9) < 7); + ELIM (strnlen ((char*)a3_7[0], 19) < 7); + ELIM (strnlen ((char*)a3_7[0], 21) < 7); + ELIM (strnlen ((char*)a3_7[0], 23) < 7); + ELIM (strnlen ((char*)a3_7[0], PTRDIFF_MAX) < 7); + ELIM (strnlen ((char*)a3_7[0], SIZE_MAX) < 7); + ELIM (strnlen ((char*)a3_7[0], -1) < 7); ELIM (strnlen (ax, 0) == 0); ELIM (strnlen (ax, 1) < 2); @@ -122,7 +110,6 @@ void elim_strnlen_arr_cst (void) struct MemArrays { - char c; char a0[0]; char a1[1]; char a3[3]; @@ -133,13 +120,6 @@ struct MemArrays void elim_strnlen_memarr_cst (struct MemArrays *p, int i) { - ELIM (strnlen (&p->c, 0) == 0); - ELIM (strnlen (&p->c, 1) < 2); - ELIM (strnlen (&p->c, 9) == 0); - ELIM (strnlen (&p->c, PTRDIFF_MAX) == 0); - ELIM (strnlen (&p->c, SIZE_MAX) == 0); - ELIM (strnlen (&p->c, -1) == 0); - /* Other accesses to internal zero-length arrays are undefined. 
*/ ELIM (strnlen (p->a0, 0) == 0); @@ -154,19 +134,19 @@ void elim_strnlen_memarr_cst (struct MemArrays *p, ELIM (strnlen (p->a3, 1) < 2); ELIM (strnlen (p->a3, 2) < 3); ELIM (strnlen (p->a3, 3) < 4); - ELIM (strnlen (p->a3, 9) < 4); - ELIM (strnlen (p->a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p->a3, SIZE_MAX) < 4); - ELIM (strnlen (p->a3, -1) < 4); + ELIM (strnlen (p->a3, 9) < 3); + ELIM (strnlen (p->a3, PTRDIFF_MAX) < 3); + ELIM (strnlen (p->a3, SIZE_MAX) < 3); + ELIM (strnlen (p->a3, -1) < 3); ELIM (strnlen (p[i].a3, 0) == 0); ELIM (strnlen (p[i].a3, 1) < 2); ELIM (strnlen (p[i].a3, 2) < 3); ELIM (strnlen (p[i].a3, 3) < 4); - ELIM (strnlen (p[i].a3, 9) < 4); - ELIM (strnlen (p[i].a3, PTRDIFF_MAX) < 4); - ELIM (strnlen (p[i].a3, SIZE_MAX) < 4); - ELIM (strnlen (p[i].a3, -1) < 4); + ELIM (strnlen (p[i].a3, 9) < 3); + ELIM (strnlen (p[i].a3, PTRDIFF_MAX) < 3); + ELIM (strnlen (p[i].a3, SIZE_MAX) < 3); + ELIM (strnlen (p[i].a3, -1) < 3); ELIM (strnlen (p->a3_7[0], 0) == 0); ELIM (strnlen (p->a3_7[0], 1) < 2); @@ -203,17 +183,18 @@ void elim_strnlen_memarr_cst (struct MemArrays *p, ELIM (strnlen (p->a3_7[i], 19) < 20); #endif - ELIM (strnlen ((char*)p->a3_7, 0) == 0); - ELIM (strnlen ((char*)p->a3_7, 1) < 2); - ELIM (strnlen ((char*)p->a3_7, 2) < 3); - ELIM (strnlen ((char*)p->a3_7, 3) < 4); - ELIM (strnlen ((char*)p->a3_7, 9) < 10); - ELIM (strnlen ((char*)p->a3_7, 19) < 20); - ELIM (strnlen ((char*)p->a3_7, 21) < 22); - ELIM (strnlen ((char*)p->a3_7, 23) < 22); - ELIM (strnlen ((char*)p->a3_7, PTRDIFF_MAX) < 22); - ELIM (strnlen ((char*)p->a3_7, SIZE_MAX) < 22); - ELIM (strnlen ((char*)p->a3_7, -1) < 22); + ELIM (strnlen ((char*)p->a3_7[0], 0) == 0); + ELIM (strnlen ((char*)p->a3_7[0], 1) < 2); + ELIM (strnlen ((char*)p->a3_7[0], 2) < 3); + ELIM (strnlen ((char*)p->a3_7[0], 3) < 4); + ELIM (strnlen ((char*)p->a3_7[0], 7) < 8); + ELIM (strnlen ((char*)p->a3_7[0], 9) < 7); + ELIM (strnlen ((char*)p->a3_7[0], 19) < 7); + ELIM (strnlen ((char*)p->a3_7[0], 21) < 7); + 
ELIM (strnlen ((char*)p->a3_7[0], 23) < 7); + ELIM (strnlen ((char*)p->a3_7[0], PTRDIFF_MAX) < 7); + ELIM (strnlen ((char*)p->a3_7[0], SIZE_MAX) < 7); + ELIM (strnlen ((char*)p->a3_7[0], -1) < 7); ELIM (strnlen (p->ax, 0) == 0); ELIM (strnlen (p->ax, 1) < 2); @@ -290,9 +271,6 @@ void elim_strnlen_range (char *s) void keep_strnlen_arr_cst (void) { - KEEP (strnlen (&c, 1) == 0); - KEEP (strnlen (&c, 1) == 1); - KEEP (strnlen (a1, 1) == 0); KEEP (strnlen (a1, 1) == 1); @@ -301,7 +279,6 @@ void keep_strnlen_arr_cst (void) struct FlexArrays { - char c; char a0[0]; /* Access to internal zero-length arrays are undefined. */ char a1[1]; }; @@ -308,9 +285,6 @@ struct FlexArrays void keep_strnlen_memarr_cst (struct FlexArrays *p) { - KEEP (strnlen (&p->c, 1) == 0); - KEEP (strnlen (&p->c, 1) == 1); - #if 0 /* Accesses to internal zero-length arrays are undefined so avoid exercising them. */ @@ -331,5 +305,5 @@ void keep_strnlen_memarr_cst (struct FlexArrays *p /* { dg-final { scan-tree-dump-times "call_in_true_branch_not_eliminated_" 0 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 13 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 13 "optimized" } } */ + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 9 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 9 "optimized" } } */ Index: gcc/testsuite/gcc.dg/strlenopt-48.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-48.c (revision 263029) +++ gcc/testsuite/gcc.dg/strlenopt-48.c (working copy) @@ -3,7 +3,7 @@ Verify that strlen() calls with one-character array elements of multidimensional arrays are still folded. 
{ dg-do compile } - { dg-options "-O2 -Wall -fdump-tree-optimized" } */ + { dg-options "-O2 -Wall -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #include "strlenopt.h" Index: gcc/testsuite/gcc.dg/strlenopt-51.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-51.c (revision 263029) +++ gcc/testsuite/gcc.dg/strlenopt-51.c (working copy) @@ -101,7 +101,7 @@ void test_keep_a9_9 (int i) { #undef T #define T(I) \ - KEEP (strlen (&a9_9[i][I][0]) > (1 + I) % 9); \ + KEEP (strlen (&a9_9[i][I][0]) > (0 + I) % 9); \ KEEP (strlen (&a9_9[i][I][1]) > (1 + I) % 9); \ KEEP (strlen (&a9_9[i][I][2]) > (2 + I) % 9); \ KEEP (strlen (&a9_9[i][I][3]) > (3 + I) % 9); \ @@ -115,7 +115,7 @@ void test_keep_a9_9 (int i) } /* { dg-final { scan-tree-dump-times "strlen" 72 "gimple" } } - { dg-final { scan-tree-dump-times "strlen" 63 "optimized" } } + { dg-final { scan-tree-dump-times "strlen" 72 "optimized" } } - { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 72 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 81 "optimized" } } */ Index: gcc/testsuite/gcc.dg/strlenopt-55.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-55.c (revision 0) +++ gcc/testsuite/gcc.dg/strlenopt-55.c (working copy) @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized" } */ + +typedef char A[6]; +typedef char B[2][3]; + +A a; + +void test (void) +{ + B* b = (B*) a; + if (__builtin_strlen ((*b)[0]) > 2) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump-times "__builtin_strlen" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_abort" 1 "optimized" } } */ Index: gcc/testsuite/gcc.dg/strlenopt-56.c 
=================================================================== --- gcc/testsuite/gcc.dg/strlenopt-56.c (revision 0) +++ gcc/testsuite/gcc.dg/strlenopt-56.c (working copy) @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized" } */ + +typedef char B[2][3]; + +B b; + +void test (void) +{ + if (__builtin_strlen (b[0]) > 2) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump-not "__builtin_strlen" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "__builtin_abort" "optimized" } } */ Index: gcc/testsuite/gcc.dg/strlenopt-57.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-57.c (revision 0) +++ gcc/testsuite/gcc.dg/strlenopt-57.c (working copy) @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +#define assert(x) do { if (!(x)) __builtin_abort (); } while (0) +extern int system (const char *); +static int fun (char *p) +{ + char buf[16]; + + assert (__builtin_strlen (p) < 4); + + __builtin_sprintf (buf, "echo %s - %s", p, p); + return system (buf); +} + +void test (void) +{ + char b[2] = "ab"; + fun (b); +} + +/* { dg-final { scan-tree-dump-times "__builtin_strlen" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_abort" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_sprintf" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "system" 1 "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-warn-1.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-warn-1.c (revision 263029) +++ gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-warn-1.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Wformat -Wformat-truncation=1 -ftrack-macro-expansion=0" } */ +/* { dg-options "-O2 -Wformat -Wformat-truncation=1 -fassume-zero-terminated-char-arrays -ftrack-macro-expansion=0" } 
*/ typedef struct { Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-warn-2.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-warn-2.c (revision 263029) +++ gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-warn-2.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Wformat -Wformat-truncation=2 -ftrack-macro-expansion=0" } */ +/* { dg-options "-O2 -Wformat -Wformat-truncation=2 -fassume-zero-terminated-char-arrays -ftrack-macro-expansion=0" } */ typedef struct { Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-warn-3.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-warn-3.c (revision 263029) +++ gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-warn-3.c (working copy) @@ -1,6 +1,6 @@ /* PR middle-end/79448 - unhelpful -Wformat-truncation=2 warning { dg-do compile } - { dg-options "-O2 -Wformat -Wformat-truncation=2 -ftrack-macro-expansion=0" } + { dg-options "-O2 -Wformat -Wformat-truncation=2 -fassume-zero-terminated-char-arrays -ftrack-macro-expansion=0" } { dg-require-effective-target ptr32plus } */ typedef __SIZE_TYPE__ size_t; Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-14.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-14.c (revision 263029) +++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-14.c (working copy) @@ -1,7 +1,7 @@ /* PR middle-end/79376 - wrong lower bound with %s and non-constant strings in -Wformat-overflow { dg-do compile } - { dg-options "-O2 -Wall -Wformat-overflow=1 -ftrack-macro-expansion=0" } */ + { dg-options "-O2 -Wall -Wformat-overflow=1 -fassume-zero-terminated-char-arrays -ftrack-macro-expansion=0" } */ typedef __SIZE_TYPE__ size_t; Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-2.c =================================================================== --- 
gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-2.c (revision 263029) +++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-2.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-Wformat -Wformat-overflow=2 -ftrack-macro-expansion=0" } */ +/* { dg-options "-Wformat -Wformat-overflow=2 -fassume-zero-terminated-char-arrays -ftrack-macro-expansion=0" } */ /* When debugging, define LINE to the line number of the test case to exercise and avoid exercising any of the others. The buffer and objsize macros Index: gcc/testsuite/gcc.dg/tree-ssa/pr79376.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/pr79376.c (revision 263029) +++ gcc/testsuite/gcc.dg/tree-ssa/pr79376.c (working copy) @@ -1,7 +1,7 @@ /* PR tree-optimization/79376 - wrong lower bound with %s and non-constant strings in -Wformat-overflow { dg-do compile } - { dg-options "-O2 -fdump-tree-optimized" } */ + { dg-options "-O2 -fassume-zero-terminated-char-arrays -fdump-tree-optimized" } */ #define CAT(s, n) s ## n #define FAIL(line) CAT (failure_on_line_, line) ^ permalink raw reply [flat|nested] 121+ messages in thread
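The new test strlenopt-55.c above views a char[6] object through a pointer to char[2][3] and expects the strlen call and the abort call to survive. The following is a minimal sketch of why, assuming the enclosing object holds the string "abcde". The helper names row_limit and object_length are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef char A[6];
typedef char B[2][3];

/* Longest string one row of B could hold if the row had to contain
   its own terminating NUL.  */
static size_t row_limit (void)
{
  B rows;
  return sizeof rows[0] - 1;
}

/* Length of the string stored in the enclosing object, measured through
   a plain char pointer so that no row bound is implied.  SIZE is the
   size of the enclosing object, which must contain a NUL.  */
static size_t object_length (const char *obj, size_t size)
{
  assert (memchr (obj, 0, size) != NULL);
  return strlen (obj);
}
```

With A a = "abcde", object_length (a, sizeof a) exceeds row_limit (), so a comparison such as strlen ((*b)[0]) > 2 can only be folded to false if the compiler trusts the bound of the casted-to type.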
* Re: [PATCH] Make strlen range computations more conservative 2018-08-02 10:22 ` Bernd Edlinger @ 2018-08-02 15:42 ` Martin Sebor 2018-08-02 17:00 ` Martin Sebor 2018-08-09 5:36 ` Jeff Law 0 siblings, 2 replies; 121+ messages in thread From: Martin Sebor @ 2018-08-02 15:42 UTC (permalink / raw) To: Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Jeff Law On 08/02/2018 04:22 AM, Bernd Edlinger wrote: > Hi, > > this is an update of the patch to prevent unsafe optimizations due to strlen range assuming > always zero-terminated char arrays. > > Since the previous version I do no longer try to locate the outermost char array, > but just bail out if there is a typecast, that means, supposed we have a 2-dimensional > array, char a[x][y], strlen (s.a[x]) may assume a[x] is zero-terminated, if the optimization > is enabled. strlen ((char*)s.a) involves a type cast and should assume nothing. > > Additionally due to the discussion, I came to the conclusion that this strlen range optimization > should only be used when enabled with -fassume-zero-terminated-char-arrays or -Ofast. > > Note that I included the test case with the removed assertion as > gcc/testsuite/gcc.dg/strlenopt-57.c > > Initially it would only miscompile with -Ofast, but with r263018 aka > PR tree-optimization/86043, PR tree-optimization/86042 it was miscompiled with -O3 as well. 
> > I located the reason in gcc/tree-ssa-strlen.c (get_min_string_length) which did not > check if the string constant is in fact zero terminated: > > @@ -3192,7 +3194,9 @@ get_min_string_length (tree rhs, bool *full_string > && TREE_READONLY (rhs)) > rhs = DECL_INITIAL (rhs); > > - if (rhs && TREE_CODE (rhs) == STRING_CST) > + if (rhs && TREE_CODE (rhs) == STRING_CST > + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)), > + TREE_STRING_LENGTH (rhs)) >= 0) > { > *full_string_p = true; > return strlen (TREE_STRING_POINTER (rhs)); > > Fortunately this also shows a way how to narrow strlen return value expectations when > we are able to positively prove that a string must be zero terminated. > > > > Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > Is it OK for trunk? The warning bits are definitely not okay by me. The purpose of the warnings (-W{format,sprintf}-{overflow,truncation}) is to detect buffer overflows. When a warning doesn't have access to string length information for dynamically created strings (like the strlen pass does) it uses array sizes as a proxy. This is useful both to detect possible buffer overflows and to prevent false positives for overflows that cannot happen in correctly written programs. By changing the logic to not consider the bounds of the array the warnings will become prone to many false positives as is evident from your changes to the tests (you had to explicitly enable the new option). In effect, the patch undoes a couple of years' worth of fine tuning of the warnings to strike a balance between true and false positives. That's completely unacceptable to me, and, frankly, proposals along these lines are exceedingly disruptive. Beyond that, I don't have a problem with adding a new option but I continue to strongly disagree with disabling the strlen optimization by default. It implies that by default GCC targets invalid code.
If there is consensus that some new option is necessary, the right setting is on by default, along with warnings for programs that pass non-terminated arrays to string functions. I have already submitted a patch to that effect for strlen and const arrays and am working on extending it to other built-in functions and dynamically created strings (when I have a chance between defending my past work). Either way, if a new option is introduced, the array bound computation for sprintf (and all other buffer overflow warnings, likely including -Wrestrict) will need to be decoupled from the option so the current behavior is preserved. As I have also said, the area of the provenance of subobject vs enclosing object pointers is under discussion in WG14 and the C Object Model study group. The informal consensus so far is to maintain the provenance of subobjects and introduce some mechanism to make it possible to extend the provenance to the enclosing object. This would be a pervasive change, not one just limited to strings. So if introducing some new option at this stage is thought to be necessary this should be taken into consideration. Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-02 15:42 ` Martin Sebor @ 2018-08-02 17:00 ` Martin Sebor 2018-08-02 18:15 ` Bernd Edlinger 2018-08-02 18:20 ` Jakub Jelinek 2018-08-09 5:36 ` Jeff Law 1 sibling, 2 replies; 121+ messages in thread From: Martin Sebor @ 2018-08-02 17:00 UTC (permalink / raw) To: Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Jeff Law As an alternate approach I have been thinking about, if there is a strong feeling that allowing strlen to iterate past the subobject boundary is necessary (I don't believe it is.) Rather than indiscriminately expanding the provenance of the subobject regardless of what members follow it in the enclosing structure, only consider doing that if the next member is an array of the same type. E.g., struct S { char a[4], b[3], c[2], d; }; extern struct S *p; strlen (p->a); // consider p->a's bounds to be char[9] I.e., treat p->a, p->b, and p->c as one array but exclude from it d because it's not an array. (This wouldn't solve the warning problem below -- a separate computation would still be necessary to determine the tighter bound of the member itself.) On 08/02/2018 09:42 AM, Martin Sebor wrote: > On 08/02/2018 04:22 AM, Bernd Edlinger wrote: >> Hi, >> >> this is an update of the patch to prevent unsafe optimizations due to >> strlen range assuming >> always zero-terminated char arrays. >> >> Since the previous version I do no longer try to locate the outermost >> char array, >> but just bail out if there is a typecast, that means, supposed we have >> a 2-dimensional >> array, char a[x][y], strlen (s.a[x]) may assume a[x] is >> zero-terminated, if the optimization >> is enabled. strlen ((char*)s.a) involves a type cast and should >> assume nothing. >> >> Additionally due to the discussion, I came to the conclusion that this >> strlen range optimization >> should only be used when enabled with >> -fassume-zero-terminated-char-arrays or -Ofast. 
>> >> Note that I included the test case with the removed assertion as >> gcc/testsuite/gcc.dg/strlenopt-57.c >> >> Initially it would only miscompile with -Ofast, but with r263018 aka >> PR tree-optimization/86043, PR tree-optimization/86042 it was >> miscompiled with -O3 as well. >> >> I located the reason in gcc/tree-ssa-strlen.c (get_min_string_length) >> which did not >> check if the string constant is in fact zero terminated: >> >> @@ -3192,7 +3194,9 @@ get_min_string_length (tree rhs, bool *full_string >> && TREE_READONLY (rhs)) >> rhs = DECL_INITIAL (rhs); >> >> - if (rhs && TREE_CODE (rhs) == STRING_CST) >> + if (rhs && TREE_CODE (rhs) == STRING_CST >> + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)), >> + TREE_STRING_LENGTH (rhs)) >= 0) >> { >> *full_string_p = true; >> return strlen (TREE_STRING_POINTER (rhs)); >> >> Fortunately this also shows a way how to narrow strlen return value >> expectations when >> we are able to positively prove that a string must be zero terminated. >> >> >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >> Is it OK for trunk? > > The warning bits are definitely not okay by me. The purpose > of the warnings (-W{format,sprintf}-{overflow,truncation} is > to detect buffer overflows. When a warning doesn't have access > to string length information for dynamically created strings > (like the strlen pass does) it uses array sizes as a proxy. > This is useful both to detect possible buffer overflows and > to prevent false positives for overflows that cannot happen > in correctly written programs. > > By changing the logic to not consider the bounds of the array > the warnkngs will become prone to many false positives as is > evident from your changes to the tests (you had to explicitly > enable the new option). > > In effect, the patch undoes a couple of years worth of fine > tuning of the warnings to strike a balance between true and > false positives. 
That's completely unacceptable to me, and, > frankly, proposals along these lines are exceedingly > disruptive. > > Beyond that, I don't have a problem with adding a new option > but I continue to strongly disagree with disabling the strlen > optimization by default. It implies that by default GCC > targets invalid code. If there is consensus that some new > option is necessary, the right setting is on by default, > along with warnings for programs that pass non-terminated > arrays to string functions. I have already submitted a patch > that effect for strlen and const arrays and am working on > extending it to other built-in functions and dynamically > created strings (when I have a chance between defending > my past work). > > Either way, if a new option is introduced, the array bound > computation for sprintf (and all other buffer overflow > warnings, likely including -Wrestrict) will need to be > decoupled from the option so the current behavior is > preserved. > > As I have also said, the area of the provenance of subobject > vs enclosing object pointers is under discussion in WG14 and > the C Object Model study group. The informal consensus so > far is to maintain the provenance of subobjects and introduce > some mechanism to make it possible to extend the provenance > to the enclosing object. This would be a pervasive change, > not one just limited to strings. So if introducing some new > option at this stage is thought to be necessary this should > be taken into consideration. > > Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
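The struct S proposal in the message above distinguishes two bounds for p->a: the size of the member itself and the size of the run of adjacent char arrays a, b and c. A small sketch of that arithmetic follows. It assumes there is no padding between the char array members, which holds on common ABIs but is not guaranteed by the C standard, so the code checks it. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct S { char a[4], b[3], c[2], d; };

/* Size of member M of struct S without needing an object.  */
#define MEMBER_SIZE(m) sizeof (((struct S *) 0)->m)

/* Bound of the member alone: the tighter bound that the warnings
   would still have to compute separately.  */
static size_t member_bound (void)
{
  return MEMBER_SIZE (a);
}

/* Bound of the run of adjacent char arrays a, b, c.  The member d is
   excluded because it is not an array.  */
static size_t merged_bound (void)
{
  size_t sum = MEMBER_SIZE (a) + MEMBER_SIZE (b) + MEMBER_SIZE (c);
  /* Check the no-padding assumption instead of relying on it silently.  */
  assert (offsetof (struct S, d) - offsetof (struct S, a) == sum);
  return sum;
}
```

Under that assumption member_bound () is 4 and merged_bound () is 9, matching the char[9] mentioned in the proposal.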
* Re: [PATCH] Make strlen range computations more conservative 2018-08-02 17:00 ` Martin Sebor @ 2018-08-02 18:15 ` Bernd Edlinger 2018-08-03 3:06 ` Martin Sebor 2018-08-02 18:20 ` Jakub Jelinek 1 sibling, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-08-02 18:15 UTC (permalink / raw) To: Martin Sebor, GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Jeff Law On 08/02/18 19:00, Martin Sebor wrote: > As an alternate approach I have been thinking about, if > there is a strong feeling that allowing strlen to iterate > past the subobject boundary is necessary (I don't believe > it is.) > > Rather than indiscriminately expanding the provenance of > the subobject regardless of what members follow it in > the enclosing structure, only consider doing that if > the next member is an array of the same type. E.g., > > struct S { char a[4], b[3], c[2], d; }; > extern struct S *p; > > strlen (p->a); // consider p->a's bounds to be char[9] > No, initially I thought in the same direction, but looking at the way how X-server is broken, I realized that will probably not be sufficient. Maybe it would be good to have one set of optimistic range infos that follow the standards, and can be used to control the warnings, and another set of pessimistic range infos, that control optimizations. Consider extern char garbage[10]; char x[10]; memcpy(x, garbage, 10); x[9] = 0; strlen(x) will be 0..9 no matter what was in garbage. That information can be safely used for optimizations. But if we have extern char garbage[10]; char x[10]; memcpy(x, garbage, 10); char *y = x; if (strlen(y) < 10) <= may not be removed pessimistically char z[10]; sprintf(z, "%s", y); <= omitting warning would be okay optimistically. It is not really easy to do but possible. In the moment I would like to concentrate exclusively on wrong code issues and not new warnings, even some regressions on the warnings look acceptable to me. Bernd. 
> I.e., treat p->a, p->b, and p->c as one array but exclude > from it d because it's not an array. > > (This wouldn't solve the warning problem below -- a separate > computation would still be necessary to determine the tighter > bound of the member itself.) > > On 08/02/2018 09:42 AM, Martin Sebor wrote: >> On 08/02/2018 04:22 AM, Bernd Edlinger wrote: >>> Hi, >>> >>> this is an update of the patch to prevent unsafe optimizations due to >>> strlen range assuming >>> always zero-terminated char arrays. >>> >>> Since the previous version I do no longer try to locate the outermost >>> char array, >>> but just bail out if there is a typecast, that means, supposed we have >>> a 2-dimensional >>> array, char a[x][y], strlen (s.a[x]) may assume a[x] is >>> zero-terminated, if the optimization >>> is enabled. strlen ((char*)s.a) involves a type cast and should >>> assume nothing. >>> >>> Additionally due to the discussion, I came to the conclusion that this >>> strlen range optimization >>> should only be used when enabled with >>> -fassume-zero-terminated-char-arrays or -Ofast. >>> >>> Note that I included the test case with the removed assertion as >>> gcc/testsuite/gcc.dg/strlenopt-57.c >>> >>> Initially it would only miscompile with -Ofast, but with r263018 aka >>> PR tree-optimization/86043, PR tree-optimization/86042 it was >>> miscompiled with -O3 as well. 
>>> >>> I located the reason in gcc/tree-ssa-strlen.c (get_min_string_length) >>> which did not >>> check if the string constant is in fact zero terminated: >>> >>> @@ -3192,7 +3194,9 @@ get_min_string_length (tree rhs, bool *full_string >>> && TREE_READONLY (rhs)) >>> rhs = DECL_INITIAL (rhs); >>> >>> - if (rhs && TREE_CODE (rhs) == STRING_CST) >>> + if (rhs && TREE_CODE (rhs) == STRING_CST >>> + && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)), >>> + TREE_STRING_LENGTH (rhs)) >= 0) >>> { >>> *full_string_p = true; >>> return strlen (TREE_STRING_POINTER (rhs)); >>> >>> Fortunately this also shows a way how to narrow strlen return value >>> expectations when >>> we are able to positively prove that a string must be zero terminated. >>> >>> >>> >>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >>> Is it OK for trunk? >> >> The warning bits are definitely not okay by me. The purpose >> of the warnings (-W{format,sprintf}-{overflow,truncation} is >> to detect buffer overflows. When a warning doesn't have access >> to string length information for dynamically created strings >> (like the strlen pass does) it uses array sizes as a proxy. >> This is useful both to detect possible buffer overflows and >> to prevent false positives for overflows that cannot happen >> in correctly written programs. >> >> By changing the logic to not consider the bounds of the array >> the warnkngs will become prone to many false positives as is >> evident from your changes to the tests (you had to explicitly >> enable the new option). >> >> In effect, the patch undoes a couple of years worth of fine >> tuning of the warnings to strike a balance between true and >> false positives. That's completely unacceptable to me, and, >> frankly, proposals along these lines are exceedingly >> disruptive. >> >> Beyond that, I don't have a problem with adding a new option >> but I continue to strongly disagree with disabling the strlen >> optimization by default. 
It implies that by default GCC >> targets invalid code. If there is consensus that some new >> option is necessary, the right setting is on by default, >> along with warnings for programs that pass non-terminated >> arrays to string functions. I have already submitted a patch >> that effect for strlen and const arrays and am working on >> extending it to other built-in functions and dynamically >> created strings (when I have a chance between defending >> my past work). >> >> Either way, if a new option is introduced, the array bound >> computation for sprintf (and all other buffer overflow >> warnings, likely including -Wrestrict) will need to be >> decoupled from the option so the current behavior is >> preserved. >> >> As I have also said, the area of the provenance of subobject >> vs enclosing object pointers is under discussion in WG14 and >> the C Object Model study group. The informal consensus so >> far is to maintain the provenance of subobjects and introduce >> some mechanism to make it possible to extend the provenance >> to the enclosing object. This would be a pervasive change, >> not one just limited to strings. So if introducing some new >> option at this stage is thought to be necessary this should >> be taken into consideration. >> >> Martin > ^ permalink raw reply [flat|nested] 121+ messages in thread
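The first example in the message above (copy ten arbitrary bytes, then store a zero into x[9]) can be written as a runnable sketch. The function name terminated_copy_length is hypothetical. The sketch only illustrates the case where the 0..9 range is safe regardless of the source contents, because the terminator is stored explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy 10 bytes of arbitrary data, then force a terminator into the
   last element.  The result is in the range 0..9 whatever SRC holds.  */
static size_t terminated_copy_length (const char *src)
{
  char x[10];
  assert (src != NULL);
  memcpy (x, src, sizeof x);
  x[9] = 0;
  return strlen (x);
}
```

The second example in the message differs only in that no terminator is stored, which is why the range there depends on the source actually being a string.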
* Re: [PATCH] Make strlen range computations more conservative 2018-08-02 18:15 ` Bernd Edlinger @ 2018-08-03 3:06 ` Martin Sebor 0 siblings, 0 replies; 121+ messages in thread From: Martin Sebor @ 2018-08-03 3:06 UTC (permalink / raw) To: Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Jeff Law On 08/02/2018 12:15 PM, Bernd Edlinger wrote: > On 08/02/18 19:00, Martin Sebor wrote: >> As an alternate approach I have been thinking about, if >> there is a strong feeling that allowing strlen to iterate >> past the subobject boundary is necessary (I don't believe >> it is.) >> >> Rather than indiscriminately expanding the provenance of >> the subobject regardless of what members follow it in >> the enclosing structure, only consider doing that if >> the next member is an array of the same type. E.g., >> >> struct S { char a[4], b[3], c[2], d; }; >> extern struct S *p; >> >> strlen (p->a); // consider p->a's bounds to be char[9] >> > > No, initially I thought in the same direction, > but looking at the way how X-server is broken, > I realized that will probably not be sufficient. > > Maybe it would be good to have one set of optimistic range infos > that follow the standards, and can be used to control the warnings, > and another set of pessimistic range infos, that control optimizations. > > Consider > extern char garbage[10]; > char x[10]; > memcpy(x, garbage, 10); > x[9] = 0; > > strlen(x) will be 0..9 no matter what was in garbage. > That information can be safely used for optimizations. > > But if we have > extern char garbage[10]; > char x[10]; > memcpy(x, garbage, 10); > char *y = x; > if (strlen(y) < 10) <= may not be removed pessimistically This example involves a whole object. There should be no question that running off the end of a full object is undefined and a bug. It's far preferable to avoid such bugs. Emitting a library call that can return an arbitrary value or crash is the least secure and least user-friendly solution. 
> char z[10]; > sprintf(z, "%s", y); <= omitting warning would be okay optimistically. > > It is not really easy to do but possible. I see three cases here: 1) y points to a constant array whose initializer we know; 2) y points to an array with contents (length) known to the strlen pass; 3) we don't know what y points to or its contents. (1) can be handled easily by extending the approach in my patch for bug 86552 to other built-ins. I have a patch that does that for sprintf. (2) can be handled by extending the strlen pass to also track the sizes of array objects into which it tracks stores. I'm working on a patch for GCC 9. (3) can only be handled by adding an even stricter setting for -Wformat-overflow by warning for all %s arguments of unknown length. I haven't considered this yet. > > In the moment I would like to concentrate exclusively on wrong code issues The only wrong code is in programs that run off the end of unterminated buffers. Those are the ones I believe we should focus on helping detect and prevent. Warnings are the first step. Enhancing _FORTIFY_SOURCE and the sanitizers to also detect such reads would be another solution (in addition to warnings). Emitting traps (say under an option) would be yet another. But changing GCC to silently accept such programs is, in my opinion, the worst possible approach. Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
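The message above argues for detecting reads past unterminated arrays rather than accepting them. As a user-level illustration only, and not a description of what _FORTIFY_SOURCE or the sanitizers do, the sketch below uses a hypothetical helper, bounded_length, that reports a missing terminator instead of reading past the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the length of the string in BUF, or
   (size_t) -1 if no NUL occurs within the first SIZE bytes.  It never
   reads beyond BUF + SIZE.  */
static size_t bounded_length (const char *buf, size_t size)
{
  const char *nul;
  assert (buf != NULL);
  nul = memchr (buf, 0, size);
  return nul ? (size_t) (nul - buf) : (size_t) -1;
}
```

For an array such as char b[2] = { 'a', 'b' }, which has no room for a terminator, bounded_length (b, sizeof b) returns (size_t) -1, whereas a plain strlen (b) would run off the end of the object.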
* Re: [PATCH] Make strlen range computations more conservative 2018-08-02 17:00 ` Martin Sebor 2018-08-02 18:15 ` Bernd Edlinger @ 2018-08-02 18:20 ` Jakub Jelinek 2018-08-03 3:24 ` Martin Sebor 1 sibling, 1 reply; 121+ messages in thread From: Jakub Jelinek @ 2018-08-02 18:20 UTC (permalink / raw) To: Martin Sebor; +Cc: Bernd Edlinger, GCC Patches, Richard Biener, Jeff Law On Thu, Aug 02, 2018 at 11:00:32AM -0600, Martin Sebor wrote: > As an alternate approach I have been thinking about, if > there is a strong feeling that allowing strlen to iterate > past the subobject boundary is necessary (I don't believe > it is.) > > Rather than indiscriminately expanding the provenance of > the subobject regardless of what members follow it in > the enclosing structure, only consider doing that if > the next member is an array of the same type. E.g., > > struct S { char a[4], b[3], c[2], d; }; > extern struct S *p; > > strlen (p->a); // consider p->a's bounds to be char[9] See the mail with testcases where the middle-end doesn't distinguish between p->a and (char *) p, unless you want to warn or optimize in the FEs or extremely early in the lowering passes, that isn't going to work. Jakub ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-02 18:20 ` Jakub Jelinek @ 2018-08-03 3:24 ` Martin Sebor 0 siblings, 0 replies; 121+ messages in thread From: Martin Sebor @ 2018-08-03 3:24 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Bernd Edlinger, GCC Patches, Richard Biener, Jeff Law On 08/02/2018 12:20 PM, Jakub Jelinek wrote: > On Thu, Aug 02, 2018 at 11:00:32AM -0600, Martin Sebor wrote: >> As an alternate approach I have been thinking about, if >> there is a strong feeling that allowing strlen to iterate >> past the subobject boundary is necessary (I don't believe >> it is.) >> >> Rather than indiscriminately expanding the provenance of >> the subobject regardless of what members follow it in >> the enclosing structure, only consider doing that if >> the next member is an array of the same type. E.g., >> >> struct S { char a[4], b[3], c[2], d; }; >> extern struct S *p; >> >> strlen (p->a); // consider p->a's bounds to be char[9] > > See the mail with testcases where the middle-end doesn't distinguish > between p->a and (char *) p, unless you want to warn or optimize > in the FEs or extremely early in the lowering passes, that isn't going to > work. When the object structure is lost (as in MEM_REF) the middle-end (specifically strlen) already considers the whole object, so that wouldn't change. The only impact would be on the cases where the middle end currently does consider the subobject: instead of taking just its size, it would consider the size of the subsequent members. For the case of strlen, that would be a simple change to get_range_strlen(). Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
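[Editorial illustration, not part of the original message.] The widening idea described above can be sketched in plain C. This is only a rough model of the arithmetic, not GCC code: the macro `MEMBER_SIZE` and the two helper functions are hypothetical names introduced here, and the sketch simply adds up the sizes of the consecutive `char` array members `a`, `b` and `c`, stopping at `d` because it is a plain `char` rather than an array.

```c
#include <stddef.h>

/* The structure from the message above.  */
struct S { char a[4], b[3], c[2], d; };

/* Hypothetical helper: size of a member without needing an object.
   The operand of sizeof is not evaluated, so the null pointer is
   never dereferenced.  */
#define MEMBER_SIZE(type, member) sizeof (((type *) 0)->member)

/* Bound for p->a when only the member's own array type is used.  */
static size_t
subobject_bound (void)
{
  return MEMBER_SIZE (struct S, a);
}

/* Bound for p->a when it is widened across the following members
   that are also char arrays (a, b and c, but not the scalar d).  */
static size_t
widened_bound (void)
{
  return MEMBER_SIZE (struct S, a)
         + MEMBER_SIZE (struct S, b)
         + MEMBER_SIZE (struct S, c);
}
```

Under this model the narrow bound is the 4 bytes of `a`, while the widened bound is 4 + 3 + 2 = 9 bytes, which matches the `char[9]` mentioned in the quoted example.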
* Re: [PATCH] Make strlen range computations more conservative 2018-08-02 15:42 ` Martin Sebor 2018-08-02 17:00 ` Martin Sebor @ 2018-08-09 5:36 ` Jeff Law 2018-08-10 16:56 ` Martin Sebor 1 sibling, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-08-09 5:36 UTC (permalink / raw) To: Martin Sebor, Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek On 08/02/2018 09:42 AM, Martin Sebor wrote: > The warning bits are definitely not okay by me. The purpose > of the warnings (-W{format,sprintf}-{overflow,truncation} is > to detect buffer overflows. When a warning doesn't have access > to string length information for dynamically created strings > (like the strlen pass does) it uses array sizes as a proxy. > This is useful both to detect possible buffer overflows and > to prevent false positives for overflows that cannot happen > in correctly written programs. So how much of this falling-back to array sizes as a proxy would become unnecessary if sprintf had access to the strlen pass as an analysis module? As you know we've been kicking that around and from my investigations that doesn't really look hard to do. Encapsulate the data structures in a class, break up the statement handling into analysis and optimization and we should be good to go. I'm hoping to start prototyping this week. If we think that has a reasonable chance to eliminate the array-size fallback, then that seems like the most promising path forward. > > By changing the logic to not consider the bounds of the array > the warnkngs will become prone to many false positives as is > evident from your changes to the tests (you had to explicitly > enable the new option). ISTM that a reasonable goal to shoot for is to be able to run the existing warning tests without the array-size fallback once we've got a string length analysis module. Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-09 5:36 ` Jeff Law @ 2018-08-10 16:56 ` Martin Sebor 2018-08-15 4:39 ` Jeff Law 0 siblings, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-08-10 16:56 UTC (permalink / raw) To: Jeff Law, Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek On 08/08/2018 11:36 PM, Jeff Law wrote: > On 08/02/2018 09:42 AM, Martin Sebor wrote: > >> The warning bits are definitely not okay by me. The purpose >> of the warnings (-W{format,sprintf}-{overflow,truncation} is >> to detect buffer overflows. When a warning doesn't have access >> to string length information for dynamically created strings >> (like the strlen pass does) it uses array sizes as a proxy. >> This is useful both to detect possible buffer overflows and >> to prevent false positives for overflows that cannot happen >> in correctly written programs. > So how much of this falling-back to array sizes as a proxy would become > unnecessary if sprintf had access to the strlen pass as an analysis module? > > As you know we've been kicking that around and from my investigations > that doesn't really look hard to do. Encapsulate the data structures in > a class, break up the statement handling into analysis and optimization > and we should be good to go. I'm hoping to start prototyping this week. > > If we think that has a reasonable chance to eliminate the array-size > fallback, then that seems like the most promising path forward. We discussed this idea this morning so let me respond here and reiterate the answer. Using the strlen data will help detect buffer overflow where the array size isn't available but it cannot replace the array size heuristic. 
Here's a simple example:

  struct S { char a[8]; };

  char d[8];
  void f (struct S *s, int i)
  {
    sprintf (d, "%s-%i", s[i].a, i);
  }

We don't know the length of s->a but we do know that it can be up to 7 bytes long (assuming it's nul-terminated of course) so we know the sprintf call can overflow. Conversely, if the size of the destination is increased to 20 the sprintf call cannot overflow so the diagnostic can be avoided.

Removing the array size heuristic would force us to either give up on diagnosing the first case or issue false positives for the second case. I think the second alternative would make the warning too noisy to be useful.

The strlen pass will help detect buffer overflows in cases where the array size isn't known (e.g., with dynamically allocated buffers) but where the length of the string stored in the array is known. It will also help avoid false positives in cases where the string stored in an array of known size is known to be too short to cause an overflow. For instance here:

  struct S { char a[8]; };

  char d[8];
  void f (struct S *s, int i)
  {
    if (strlen (s->a) < 4 && i >= 0 && i < 100)
      sprintf (d, "%s-%i", s->a, i);
  }

Martin
^ permalink raw reply [flat|nested] 121+ messages in thread
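[Editorial illustration, not part of the original message.] The byte counts behind the two examples above can be worked out with a small sketch. It assumes a 32-bit `int` (so `"%i"` prints at most 11 characters, for `INT_MIN`); the helper names `max_strlen_in_array`, `int_width` and `worst_case_bytes` are hypothetical and do not correspond to anything in GCC's sprintf pass.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Longest nul-terminated string that fits in a char array of SIZE bytes.  */
static size_t
max_strlen_in_array (size_t size)
{
  return size ? size - 1 : 0;
}

/* Number of characters "%i" produces for VALUE.  snprintf with a null
   buffer and zero size only computes the length (C99).  */
static size_t
int_width (int value)
{
  return (size_t) snprintf (NULL, 0, "%i", value);
}

/* Worst-case number of bytes, including the terminating nul, written by
   sprintf (d, "%s-%i", s, i) when strlen (s) <= MAXLEN and LO <= i <= HI.
   The widest decimal output in a range occurs at one of its endpoints.  */
static size_t
worst_case_bytes (size_t maxlen, int lo, int hi)
{
  size_t wlo = int_width (lo);
  size_t whi = int_width (hi);
  size_t width = wlo > whi ? wlo : whi;
  return maxlen + 1 /* '-' */ + width + 1 /* nul */;
}
```

With `char a[8]` and an unconstrained `i`, `worst_case_bytes (max_strlen_in_array (8), INT_MIN, INT_MAX)` gives 7 + 1 + 11 + 1 = 20 bytes, which exceeds `char d[8]` but fits a 20-byte destination. With the guard from the second example, `worst_case_bytes (3, 0, 99)` gives 3 + 1 + 2 + 1 = 7 bytes, which fits in `d`.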
* Re: [PATCH] Make strlen range computations more conservative 2018-08-10 16:56 ` Martin Sebor @ 2018-08-15 4:39 ` Jeff Law 2018-08-20 10:12 ` Richard Biener 0 siblings, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-08-15 4:39 UTC (permalink / raw) To: Martin Sebor, Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek On 08/10/2018 10:56 AM, Martin Sebor wrote: > On 08/08/2018 11:36 PM, Jeff Law wrote: >> On 08/02/2018 09:42 AM, Martin Sebor wrote: >> >>> The warning bits are definitely not okay by me. The purpose >>> of the warnings (-W{format,sprintf}-{overflow,truncation} is >>> to detect buffer overflows. When a warning doesn't have access >>> to string length information for dynamically created strings >>> (like the strlen pass does) it uses array sizes as a proxy. >>> This is useful both to detect possible buffer overflows and >>> to prevent false positives for overflows that cannot happen >>> in correctly written programs. >> So how much of this falling-back to array sizes as a proxy would become >> unnecessary if sprintf had access to the strlen pass as an analysis >> module? >> >> As you know we've been kicking that around and from my investigations >> that doesn't really look hard to do. Encapsulate the data structures in >> a class, break up the statement handling into analysis and optimization >> and we should be good to go. I'm hoping to start prototyping this week. >> >> If we think that has a reasonable chance to eliminate the array-size >> fallback, then that seems like the most promising path forward. > > We discussed this idea this morning so let me respond here and > reiterate the answer. Using the strlen data will help detect > buffer overflow where the array size isn't available but it > cannot replace the array size heuristic. 
Here's a simple > example: > >  struct S { char a[8]; }; > >  char d[8]; >  void f (struct S *s, int i) >  { >    sprintf (d, "%s-%i", s[i].a, i); >  } > > We don't know the length of s->a but we do know that it can > be up to 7 bytes long (assuming it's nul-terminated of course) > so we know the sprintf call can overflow. Conversely, if > the size of the destination is increased to 20 the sprintf > call cannot overflow so the diagnostic can be avoided. > > Removing the array size heuristic would force us to either give > up on diagnosing the first case or issue false positives for > the second case. I think the second alternative would make > the warning too noisy to be useful. > > The strlen pass will help detect buffer overflows in cases > where the array size isn't known (e.g., with dynamically > allocated buffers) but where the length of the string store > in the array is known. It will also help avoid false positives > in cases where the string stored in an array of known size is > known to be too short to cause an overflow. For instance here: > >  struct S { char a[8]; }; > >  char d[8]; >  void f (struct S *s, int i) >  { >    if (strlen (s->a) < 4 && i >= 0 && i < 100) >      sprintf (d, "%s-%i", s->a, i); >  } ACK. Thanks for explaining things here too. I can't speak for others, but seeing examples along with the explanation is easier for me to absorb. For Bernd and others -- after kicking things around a bit with Martin, what we're recommending is that compute_string_length we compute the bounds using both GIMPLE and C semantics and return both. Anything which influences code generation or optimization must use the GIMPLE semantics. Warnings may use the C semantics in an effort to improve preciseness. Martin has some other stuff to flush out of his queue, then he'll be focused on the changes to compute_string_length. Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-15 4:39 ` Jeff Law @ 2018-08-20 10:12 ` Richard Biener 2018-08-20 10:23 ` Bernd Edlinger 0 siblings, 1 reply; 121+ messages in thread From: Richard Biener @ 2018-08-20 10:12 UTC (permalink / raw) To: Jeff Law Cc: Martin Sebor, Bernd Edlinger, GCC Patches, Richard Guenther, Jakub Jelinek On Wed, Aug 15, 2018 at 6:39 AM Jeff Law <law@redhat.com> wrote: > > On 08/10/2018 10:56 AM, Martin Sebor wrote: > > On 08/08/2018 11:36 PM, Jeff Law wrote: > >> On 08/02/2018 09:42 AM, Martin Sebor wrote: > >> > >>> The warning bits are definitely not okay by me. The purpose > >>> of the warnings (-W{format,sprintf}-{overflow,truncation} is > >>> to detect buffer overflows. When a warning doesn't have access > >>> to string length information for dynamically created strings > >>> (like the strlen pass does) it uses array sizes as a proxy. > >>> This is useful both to detect possible buffer overflows and > >>> to prevent false positives for overflows that cannot happen > >>> in correctly written programs. > >> So how much of this falling-back to array sizes as a proxy would become > >> unnecessary if sprintf had access to the strlen pass as an analysis > >> module? > >> > >> As you know we've been kicking that around and from my investigations > >> that doesn't really look hard to do. Encapsulate the data structures in > >> a class, break up the statement handling into analysis and optimization > >> and we should be good to go. I'm hoping to start prototyping this week. > >> > >> If we think that has a reasonable chance to eliminate the array-size > >> fallback, then that seems like the most promising path forward. > > > > We discussed this idea this morning so let me respond here and > > reiterate the answer. Using the strlen data will help detect > > buffer overflow where the array size isn't available but it > > cannot replace the array size heuristic. 
Here's a simple > > example: > > > > struct S { char a[8]; }; > > > > char d[8]; > > void f (struct S *s, int i) > > { > > sprintf (d, "%s-%i", s[i].a, i); > > } > > > > We don't know the length of s->a but we do know that it can > > be up to 7 bytes long (assuming it's nul-terminated of course) > > so we know the sprintf call can overflow. Conversely, if > > the size of the destination is increased to 20 the sprintf > > call cannot overflow so the diagnostic can be avoided. > > > > Removing the array size heuristic would force us to either give > > up on diagnosing the first case or issue false positives for > > the second case. I think the second alternative would make > > the warning too noisy to be useful. > > > > The strlen pass will help detect buffer overflows in cases > > where the array size isn't known (e.g., with dynamically > > allocated buffers) but where the length of the string store > > in the array is known. It will also help avoid false positives > > in cases where the string stored in an array of known size is > > known to be too short to cause an overflow. For instance here: > > > > struct S { char a[8]; }; > > > > char d[8]; > > void f (struct S *s, int i) > > { > > if (strlen (s->a) < 4 && i >= 0 && i < 100) > > sprintf (d, "%s-%i", s->a, i); > > } > ACK. Thanks for explaining things here too. I can't speak for others, > but seeing examples along with the explanation is easier for me to absorb. > > For Bernd and others -- after kicking things around a bit with Martin, > what we're recommending is that compute_string_length we compute the > bounds using both GIMPLE and C semantics and return both. But you can't do this because GIMPLE did transforms that are not valid in C, thus you can't interpret the GIMPLE IL as "C", you can only interpret it as GIMPLE. What you'd do is return GIMPLE semantics length and "foobar" semantics length which doesn't match the original source. 
> > Anything which influences code generation or optimization must use the > GIMPLE semantics. Warnings may use the C semantics in an effort to > improve preciseness. > > Martin has some other stuff to flush out of his queue, then he'll be > focused on the changes to compute_string_length. > Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-20 10:12 ` Richard Biener @ 2018-08-20 10:23 ` Bernd Edlinger 2018-08-20 14:26 ` Jeff Law 0 siblings, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-08-20 10:23 UTC (permalink / raw) To: Richard Biener, Jeff Law Cc: Martin Sebor, GCC Patches, Richard Guenther, Jakub Jelinek On 08/20/18 12:12, Richard Biener wrote: > On Wed, Aug 15, 2018 at 6:39 AM Jeff Law <law@redhat.com> wrote: >> >> On 08/10/2018 10:56 AM, Martin Sebor wrote: >>> On 08/08/2018 11:36 PM, Jeff Law wrote: >>>> On 08/02/2018 09:42 AM, Martin Sebor wrote: >>>> >>>>> The warning bits are definitely not okay by me. The purpose >>>>> of the warnings (-W{format,sprintf}-{overflow,truncation} is >>>>> to detect buffer overflows. When a warning doesn't have access >>>>> to string length information for dynamically created strings >>>>> (like the strlen pass does) it uses array sizes as a proxy. >>>>> This is useful both to detect possible buffer overflows and >>>>> to prevent false positives for overflows that cannot happen >>>>> in correctly written programs. >>>> So how much of this falling-back to array sizes as a proxy would become >>>> unnecessary if sprintf had access to the strlen pass as an analysis >>>> module? >>>> >>>> As you know we've been kicking that around and from my investigations >>>> that doesn't really look hard to do. Encapsulate the data structures in >>>> a class, break up the statement handling into analysis and optimization >>>> and we should be good to go. I'm hoping to start prototyping this week. >>>> >>>> If we think that has a reasonable chance to eliminate the array-size >>>> fallback, then that seems like the most promising path forward. >>> >>> We discussed this idea this morning so let me respond here and >>> reiterate the answer. Using the strlen data will help detect >>> buffer overflow where the array size isn't available but it >>> cannot replace the array size heuristic. 
Here's a simple >>> example: >>> >>> struct S { char a[8]; }; >>> >>> char d[8]; >>> void f (struct S *s, int i) >>> { >>> sprintf (d, "%s-%i", s[i].a, i); >>> } >>> >>> We don't know the length of s->a but we do know that it can >>> be up to 7 bytes long (assuming it's nul-terminated of course) >>> so we know the sprintf call can overflow. Conversely, if >>> the size of the destination is increased to 20 the sprintf >>> call cannot overflow so the diagnostic can be avoided. >>> >>> Removing the array size heuristic would force us to either give >>> up on diagnosing the first case or issue false positives for >>> the second case. I think the second alternative would make >>> the warning too noisy to be useful. >>> >>> The strlen pass will help detect buffer overflows in cases >>> where the array size isn't known (e.g., with dynamically >>> allocated buffers) but where the length of the string store >>> in the array is known. It will also help avoid false positives >>> in cases where the string stored in an array of known size is >>> known to be too short to cause an overflow. For instance here: >>> >>> struct S { char a[8]; }; >>> >>> char d[8]; >>> void f (struct S *s, int i) >>> { >>> if (strlen (s->a) < 4 && i >= 0 && i < 100) >>> sprintf (d, "%s-%i", s->a, i); >>> } >> ACK. Thanks for explaining things here too. I can't speak for others, >> but seeing examples along with the explanation is easier for me to absorb. >> >> For Bernd and others -- after kicking things around a bit with Martin, >> what we're recommending is that compute_string_length we compute the >> bounds using both GIMPLE and C semantics and return both. > > But you can't do this because GIMPLE did transforms that are not valid in > C, thus you can't interpret the GIMPLE IL as "C", you can only interpret > it as GIMPLE. What you'd do is return GIMPLE semantics length > and "foobar" semantics length which doesn't match the original source. 
> If I understood that suggestion right, it means, we live with some false positive or missing warnings due to those transformations. That means, get_range_strlen with the 2-parameter overload is used for warnings only. And it returns most of the time a correct range info, that is good enough for warnings. The 4-parameter overload when called with strict=true, returns only range info that are based on hard facts, this range info does not use those unsafe type infos, but it can be safely used as input to the VRP machinery. Bernd. >> >> Anything which influences code generation or optimization must use the >> GIMPLE semantics. Warnings may use the C semantics in an effort to >> improve preciseness. >> >> Martin has some other stuff to flush out of his queue, then he'll be >> focused on the changes to compute_string_length. >> Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-20 10:23 ` Bernd Edlinger @ 2018-08-20 14:26 ` Jeff Law 2018-08-20 15:16 ` Bernd Edlinger 0 siblings, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-08-20 14:26 UTC (permalink / raw) To: Bernd Edlinger, Richard Biener Cc: Martin Sebor, GCC Patches, Richard Guenther, Jakub Jelinek On 08/20/2018 04:23 AM, Bernd Edlinger wrote: > On 08/20/18 12:12, Richard Biener wrote: >> On Wed, Aug 15, 2018 at 6:39 AM Jeff Law <law@redhat.com> wrote: >>> >>> On 08/10/2018 10:56 AM, Martin Sebor wrote: >>>> On 08/08/2018 11:36 PM, Jeff Law wrote: >>>>> On 08/02/2018 09:42 AM, Martin Sebor wrote: >>>>> >>>>>> The warning bits are definitely not okay by me. The purpose >>>>>> of the warnings (-W{format,sprintf}-{overflow,truncation} is >>>>>> to detect buffer overflows. When a warning doesn't have access >>>>>> to string length information for dynamically created strings >>>>>> (like the strlen pass does) it uses array sizes as a proxy. >>>>>> This is useful both to detect possible buffer overflows and >>>>>> to prevent false positives for overflows that cannot happen >>>>>> in correctly written programs. >>>>> So how much of this falling-back to array sizes as a proxy would become >>>>> unnecessary if sprintf had access to the strlen pass as an analysis >>>>> module? >>>>> >>>>> As you know we've been kicking that around and from my investigations >>>>> that doesn't really look hard to do. Encapsulate the data structures in >>>>> a class, break up the statement handling into analysis and optimization >>>>> and we should be good to go. I'm hoping to start prototyping this week. >>>>> >>>>> If we think that has a reasonable chance to eliminate the array-size >>>>> fallback, then that seems like the most promising path forward. >>>> >>>> We discussed this idea this morning so let me respond here and >>>> reiterate the answer. 
Using the strlen data will help detect >>>> buffer overflow where the array size isn't available but it >>>> cannot replace the array size heuristic. Here's a simple >>>> example: >>>> >>>> struct S { char a[8]; }; >>>> >>>> char d[8]; >>>> void f (struct S *s, int i) >>>> { >>>> sprintf (d, "%s-%i", s[i].a, i); >>>> } >>>> >>>> We don't know the length of s->a but we do know that it can >>>> be up to 7 bytes long (assuming it's nul-terminated of course) >>>> so we know the sprintf call can overflow. Conversely, if >>>> the size of the destination is increased to 20 the sprintf >>>> call cannot overflow so the diagnostic can be avoided. >>>> >>>> Removing the array size heuristic would force us to either give >>>> up on diagnosing the first case or issue false positives for >>>> the second case. I think the second alternative would make >>>> the warning too noisy to be useful. >>>> >>>> The strlen pass will help detect buffer overflows in cases >>>> where the array size isn't known (e.g., with dynamically >>>> allocated buffers) but where the length of the string store >>>> in the array is known. It will also help avoid false positives >>>> in cases where the string stored in an array of known size is >>>> known to be too short to cause an overflow. For instance here: >>>> >>>> struct S { char a[8]; }; >>>> >>>> char d[8]; >>>> void f (struct S *s, int i) >>>> { >>>> if (strlen (s->a) < 4 && i >= 0 && i < 100) >>>> sprintf (d, "%s-%i", s->a, i); >>>> } >>> ACK. Thanks for explaining things here too. I can't speak for others, >>> but seeing examples along with the explanation is easier for me to absorb. >>> >>> For Bernd and others -- after kicking things around a bit with Martin, >>> what we're recommending is that compute_string_length we compute the >>> bounds using both GIMPLE and C semantics and return both. 
>> >> But you can't do this because GIMPLE did transforms that are not valid in >> C, thus you can't interpret the GIMPLE IL as "C", you can only interpret >> it as GIMPLE. What you'd do is return GIMPLE semantics length >> and "foobar" semantics length which doesn't match the original source. >> > > If I understood that suggestion right, it means, we > live with some false positive or missing warnings due to those transformations. > That means, get_range_strlen with the 2-parameter overload is used > for warnings only. And it returns most of the time a correct range info, > that is good enough for warnings. Correct. 99.9% of the time using the ranges implied by the array types is better for the warning code. So the idea is to return two ranges. One which uses GIMPLE semantics for code generation and optimization purposes, the other which uses ranges implied by the array types for warning purposes. Martin suggested that we always compute and return both rather than having a boolean argument to select between the behavior. Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
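[Editorial illustration, not part of the original message.] The "return two ranges" idea can be pictured with a small sketch. Everything here is hypothetical: `struct T`, the object `t`, `struct strlen_bounds` and `bounds_for_t_a` are invented for illustration and are not GCC's interface. The sketch uses a declared object so that the conservative bound can be taken from the whole object, and it assumes the compiler inserts no padding between the two `char` arrays.

```c
#include <stddef.h>

/* Hypothetical structure and object used only for this sketch.  */
struct T { char a[8]; char b[8]; };
static struct T t;

/* Hypothetical pair of upper bounds on strlen (t.a).  */
struct strlen_bounds
{
  size_t warn_max;     /* implied by the member's array type; warnings only */
  size_t codegen_max;  /* implied by the enclosing object; for optimization */
};

static struct strlen_bounds
bounds_for_t_a (void)
{
  struct strlen_bounds b;
  /* A string confined to t.a is at most sizeof t.a - 1 characters.  */
  b.warn_max = sizeof t.a - 1;
  /* A string starting at t.a cannot extend past the end of t itself.  */
  b.codegen_max = sizeof t - offsetof (struct T, a) - 1;
  return b;
}
```

Here the warning bound is 7 and the conservative bound is 15: a diagnostic could reason with the former, while anything that changes generated code would be limited to the latter.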
* Re: [PATCH] Make strlen range computations more conservative 2018-08-20 14:26 ` Jeff Law @ 2018-08-20 15:16 ` Bernd Edlinger 2018-08-20 20:42 ` Martin Sebor 2018-08-21 21:58 ` Jeff Law 0 siblings, 2 replies; 121+ messages in thread From: Bernd Edlinger @ 2018-08-20 15:16 UTC (permalink / raw) To: Jeff Law, Richard Biener Cc: Martin Sebor, GCC Patches, Richard Guenther, Jakub Jelinek On 08/20/18 16:26, Jeff Law wrote: > On 08/20/2018 04:23 AM, Bernd Edlinger wrote: >> On 08/20/18 12:12, Richard Biener wrote: >>> On Wed, Aug 15, 2018 at 6:39 AM Jeff Law <law@redhat.com> wrote: >>>> >>>> On 08/10/2018 10:56 AM, Martin Sebor wrote: >>>>> On 08/08/2018 11:36 PM, Jeff Law wrote: >>>>>> On 08/02/2018 09:42 AM, Martin Sebor wrote: >>>>>> >>>>>>> The warning bits are definitely not okay by me. The purpose >>>>>>> of the warnings (-W{format,sprintf}-{overflow,truncation} is >>>>>>> to detect buffer overflows. When a warning doesn't have access >>>>>>> to string length information for dynamically created strings >>>>>>> (like the strlen pass does) it uses array sizes as a proxy. >>>>>>> This is useful both to detect possible buffer overflows and >>>>>>> to prevent false positives for overflows that cannot happen >>>>>>> in correctly written programs. >>>>>> So how much of this falling-back to array sizes as a proxy would become >>>>>> unnecessary if sprintf had access to the strlen pass as an analysis >>>>>> module? >>>>>> >>>>>> As you know we've been kicking that around and from my investigations >>>>>> that doesn't really look hard to do. Encapsulate the data structures in >>>>>> a class, break up the statement handling into analysis and optimization >>>>>> and we should be good to go. I'm hoping to start prototyping this week. >>>>>> >>>>>> If we think that has a reasonable chance to eliminate the array-size >>>>>> fallback, then that seems like the most promising path forward. 
>>>>> >>>>> We discussed this idea this morning so let me respond here and >>>>> reiterate the answer. Using the strlen data will help detect >>>>> buffer overflow where the array size isn't available but it >>>>> cannot replace the array size heuristic. Here's a simple >>>>> example: >>>>> >>>>> struct S { char a[8]; }; >>>>> >>>>> char d[8]; >>>>> void f (struct S *s, int i) >>>>> { >>>>> sprintf (d, "%s-%i", s[i].a, i); >>>>> } >>>>> >>>>> We don't know the length of s->a but we do know that it can >>>>> be up to 7 bytes long (assuming it's nul-terminated of course) >>>>> so we know the sprintf call can overflow. Conversely, if >>>>> the size of the destination is increased to 20 the sprintf >>>>> call cannot overflow so the diagnostic can be avoided. >>>>> >>>>> Removing the array size heuristic would force us to either give >>>>> up on diagnosing the first case or issue false positives for >>>>> the second case. I think the second alternative would make >>>>> the warning too noisy to be useful. >>>>> >>>>> The strlen pass will help detect buffer overflows in cases >>>>> where the array size isn't known (e.g., with dynamically >>>>> allocated buffers) but where the length of the string store >>>>> in the array is known. It will also help avoid false positives >>>>> in cases where the string stored in an array of known size is >>>>> known to be too short to cause an overflow. For instance here: >>>>> >>>>> struct S { char a[8]; }; >>>>> >>>>> char d[8]; >>>>> void f (struct S *s, int i) >>>>> { >>>>> if (strlen (s->a) < 4 && i >= 0 && i < 100) >>>>> sprintf (d, "%s-%i", s->a, i); >>>>> } >>>> ACK. Thanks for explaining things here too. I can't speak for others, >>>> but seeing examples along with the explanation is easier for me to absorb. >>>> >>>> For Bernd and others -- after kicking things around a bit with Martin, >>>> what we're recommending is that compute_string_length we compute the >>>> bounds using both GIMPLE and C semantics and return both. 
>>> >>> But you can't do this because GIMPLE did transforms that are not valid in >>> C, thus you can't interpret the GIMPLE IL as "C", you can only interpret >>> it as GIMPLE. What you'd do is return GIMPLE semantics length >>> and "foobar" semantics length which doesn't match the original source. >>> >> >> If I understood that suggestion right, it means, we >> live with some false positive or missing warnings due to those transformations. >> That means, get_range_strlen with the 2-parameter overload is used >> for warnings only. And it returns most of the time a correct range info, >> that is good enough for warnings. > Correct. 99.9% of the time using the ranges implied by the array types > is better for the warning code. So the idea is to return two ranges. > One which uses GIMPLE semantics for code generation and optimization > purposes, the other which uses ranges implied by the array types for > warning purposes. > > Martin suggested that we always compute and return both rather than > having a boolean argument to select between the behavior. > > Okay, but there is already the "strict" parameter: /* Determine the minimum and maximum value or string length that ARG refers to and store each in the first two elements of MINMAXLEN. For expressions that point to strings of unknown lengths that are character arrays, use the upper bound of the array as the maximum length. For example, given an expression like 'x ? array : "xyz"' and array declared as 'char array[8]', MINMAXLEN[0] will be set to 0 and MINMAXLEN[1] to 7, the longest string that could be stored in array. Return true if the range of the string lengths has been obtained from the upper bound of an array at the end of a struct. Such an array may hold a string that's longer than its upper bound due to it being used as a poor-man's flexible array member. 
STRICT is true if it will handle PHIs and COND_EXPRs conservatively and false if PHIs and COND_EXPRs are to be handled optimistically, if we can determine string length minimum and maximum; it will use the minimum from the ones where it can be determined. STRICT false should be only used for warning code. ELTSIZE is 1 for normal single byte character strings, and 2 or 4 for wide character strings. ELTSIZE is by default 1. */

bool
get_range_strlen (tree arg, tree minmaxlen[2], unsigned eltsize, bool strict)

I found out that there is only one call with strict = true: gimple_fold_builtin_strlen, and that is the only one that needs GIMPLE semantics, because it does:

  if (minlen == maxlen)
    {
      lenrange[0] = force_gimple_operand_gsi (gsi, lenrange[0], true, NULL,
                                              true, GSI_SAME_STMT);
      replace_call_with_value (gsi, lenrange[0]);
      return true;
    }

  if (tree lhs = gimple_call_lhs (stmt))
    if (TREE_CODE (lhs) == SSA_NAME
        && INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
      set_range_info (lhs, VR_RANGE, minlen, maxlen);

All other places are dealing with warnings, and have no use for GIMPLE semantics.

You are not suggesting that set_range_info handle two sets of range info, right?

I must admit that the interface of the 7-parameter overload of get_range_strlen is really a bit ugly, and it should at least be renamed so that it is not mixed up with the 4-parameter overload....

Bernd.
^ permalink raw reply [flat|nested] 121+ messages in thread
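[Editorial illustration, not part of the original message.] The example in the quoted comment, `x ? array : "xyz"` with `char array[8]`, can be modelled as taking the union of two length ranges. This is a simplified sketch of that one documented case, not the logic of get_range_strlen; `struct len_range` and the three helpers are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical closed range [min, max] of possible string lengths.  */
struct len_range { size_t min, max; };

/* Range for a string of unknown length stored in a char array of
   SIZE bytes: anywhere from empty up to SIZE - 1 characters.  */
static struct len_range
range_from_array (size_t size)
{
  struct len_range r = { 0, size ? size - 1 : 0 };
  return r;
}

/* Range for a string whose length LEN is known exactly.  */
static struct len_range
range_from_known_length (size_t len)
{
  struct len_range r = { len, len };
  return r;
}

/* Smallest range covering both A and B, as needed when either operand
   of a conditional expression may be selected.  */
static struct len_range
range_union (struct len_range a, struct len_range b)
{
  struct len_range r;
  r.min = a.min < b.min ? a.min : b.min;
  r.max = a.max > b.max ? a.max : b.max;
  return r;
}
```

Combining `range_from_array (8)`, which is [0, 7], with `range_from_known_length (sizeof "xyz" - 1)`, which is [3, 3], yields [0, 7], the same MINMAXLEN[0] = 0 and MINMAXLEN[1] = 7 that the comment gives.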
* Re: [PATCH] Make strlen range computations more conservative 2018-08-20 15:16 ` Bernd Edlinger @ 2018-08-20 20:42 ` Martin Sebor 2018-08-20 21:31 ` Bernd Edlinger 2018-08-21 21:58 ` Jeff Law 1 sibling, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-08-20 20:42 UTC (permalink / raw) To: Bernd Edlinger, Jeff Law, Richard Biener Cc: GCC Patches, Richard Guenther, Jakub Jelinek On 08/20/2018 09:15 AM, Bernd Edlinger wrote: > On 08/20/18 16:26, Jeff Law wrote: >> On 08/20/2018 04:23 AM, Bernd Edlinger wrote: >>> On 08/20/18 12:12, Richard Biener wrote: >>>> On Wed, Aug 15, 2018 at 6:39 AM Jeff Law <law@redhat.com> wrote: >>>>> >>>>> On 08/10/2018 10:56 AM, Martin Sebor wrote: >>>>>> On 08/08/2018 11:36 PM, Jeff Law wrote: >>>>>>> On 08/02/2018 09:42 AM, Martin Sebor wrote: >>>>>>> >>>>>>>> The warning bits are definitely not okay by me. The purpose >>>>>>>> of the warnings (-W{format,sprintf}-{overflow,truncation} is >>>>>>>> to detect buffer overflows. When a warning doesn't have access >>>>>>>> to string length information for dynamically created strings >>>>>>>> (like the strlen pass does) it uses array sizes as a proxy. >>>>>>>> This is useful both to detect possible buffer overflows and >>>>>>>> to prevent false positives for overflows that cannot happen >>>>>>>> in correctly written programs. >>>>>>> So how much of this falling-back to array sizes as a proxy would become >>>>>>> unnecessary if sprintf had access to the strlen pass as an analysis >>>>>>> module? >>>>>>> >>>>>>> As you know we've been kicking that around and from my investigations >>>>>>> that doesn't really look hard to do. Encapsulate the data structures in >>>>>>> a class, break up the statement handling into analysis and optimization >>>>>>> and we should be good to go. I'm hoping to start prototyping this week. >>>>>>> >>>>>>> If we think that has a reasonable chance to eliminate the array-size >>>>>>> fallback, then that seems like the most promising path forward. 
>>>>>> >>>>>> We discussed this idea this morning so let me respond here and >>>>>> reiterate the answer. Using the strlen data will help detect >>>>>> buffer overflow where the array size isn't available but it >>>>>> cannot replace the array size heuristic. Here's a simple >>>>>> example: >>>>>> >>>>>> struct S { char a[8]; }; >>>>>> >>>>>> char d[8]; >>>>>> void f (struct S *s, int i) >>>>>> { >>>>>> sprintf (d, "%s-%i", s[i].a, i); >>>>>> } >>>>>> >>>>>> We don't know the length of s->a but we do know that it can >>>>>> be up to 7 bytes long (assuming it's nul-terminated of course) >>>>>> so we know the sprintf call can overflow. Conversely, if >>>>>> the size of the destination is increased to 20 the sprintf >>>>>> call cannot overflow so the diagnostic can be avoided. >>>>>> >>>>>> Removing the array size heuristic would force us to either give >>>>>> up on diagnosing the first case or issue false positives for >>>>>> the second case. I think the second alternative would make >>>>>> the warning too noisy to be useful. >>>>>> >>>>>> The strlen pass will help detect buffer overflows in cases >>>>>> where the array size isn't known (e.g., with dynamically >>>>>> allocated buffers) but where the length of the string store >>>>>> in the array is known. It will also help avoid false positives >>>>>> in cases where the string stored in an array of known size is >>>>>> known to be too short to cause an overflow. For instance here: >>>>>> >>>>>> struct S { char a[8]; }; >>>>>> >>>>>> char d[8]; >>>>>> void f (struct S *s, int i) >>>>>> { >>>>>> if (strlen (s->a) < 4 && i >= 0 && i < 100) >>>>>> sprintf (d, "%s-%i", s->a, i); >>>>>> } >>>>> ACK. Thanks for explaining things here too. I can't speak for others, >>>>> but seeing examples along with the explanation is easier for me to absorb. 
>>>>> >>>>> For Bernd and others -- after kicking things around a bit with Martin, >>>>> what we're recommending is that compute_string_length we compute the >>>>> bounds using both GIMPLE and C semantics and return both. >>>> >>>> But you can't do this because GIMPLE did transforms that are not valid in >>>> C, thus you can't interpret the GIMPLE IL as "C", you can only interpret >>>> it as GIMPLE. What you'd do is return GIMPLE semantics length >>>> and "foobar" semantics length which doesn't match the original source. >>>> >>> >>> If I understood that suggestion right, it means, we >>> live with some false positive or missing warnings due to those transformations. >>> That means, get_range_strlen with the 2-parameter overload is used >>> for warnings only. And it returns most of the time a correct range info, >>> that is good enough for warnings. >> Correct. 99.9% of the time using the ranges implied by the array types >> is better for the warning code. So the idea is to return two ranges. >> One which uses GIMPLE semantics for code generation and optimization >> purposes, the other which uses ranges implied by the array types for >> warning purposes. >> >> Martin suggested that we always compute and return both rather than >> having a boolean argument to select between the behavior. >> >> > > Okay, but there is already the "strict" parameter: > > /* Determine the minimum and maximum value or string length that ARG > refers to and store each in the first two elements of MINMAXLEN. > For expressions that point to strings of unknown lengths that are > character arrays, use the upper bound of the array as the maximum > length. For example, given an expression like 'x ? array : "xyz"' > and array declared as 'char array[8]', MINMAXLEN[0] will be set > to 0 and MINMAXLEN[1] to 7, the longest string that could be > stored in array. > Return true if the range of the string lengths has been obtained > from the upper bound of an array at the end of a struct. 
Such
> an array may hold a string that's longer than its upper bound
> due to it being used as a poor-man's flexible array member.
>
> STRICT is true if it will handle PHIs and COND_EXPRs conservatively
> and false if PHIs and COND_EXPRs are to be handled optimistically,
> if we can determine string length minimum and maximum; it will use
> the minimum from the ones where it can be determined.
> STRICT false should be only used for warning code.
>
> ELTSIZE is 1 for normal single byte character strings, and 2 or
> 4 for wide character strings.  ELTSIZE is by default 1.  */
>
> bool
> get_range_strlen (tree arg, tree minmaxlen[2], unsigned eltsize, bool strict)
>
>
> I found out that there is only one call with strict = true: gimple_fold_builtin_strlen
> and that is the only one that needs GIMPLE semantics,
> because it does:
>
>   if (minlen == maxlen)
>     {
>       lenrange[0] = force_gimple_operand_gsi (gsi, lenrange[0], true, NULL,
>                                               true, GSI_SAME_STMT);
>       replace_call_with_value (gsi, lenrange[0]);
>       return true;
>     }
>
>   if (tree lhs = gimple_call_lhs (stmt))
>     if (TREE_CODE (lhs) == SSA_NAME
>         && INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
>       set_range_info (lhs, VR_RANGE, minlen, maxlen);
>
>
>
> All other places are dealing with warnings, and have no use for GIMPLE semantics.
>
> You do not suggest to have set_range_info to handle two sets of range info, right?

The idea is to extend get_range_strlen() to take an array of three
elements as MINMAXLEN:

   bool
   get_range_strlen (tree arg, tree minmaxlen[3], unsigned eltsize,
                     bool strict)

The first two elements would correspond to the strict setting used
for optimization and the last one to the opposite for warnings.
That way only one call would be needed by clients like sprintf
that need both.  That should also eliminate the need for the STRICT
argument.

Martin

^ permalink raw reply [flat|nested] 121+ messages in thread
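For readers following the quoted sprintf example, the arithmetic behind the array-size heuristic can be checked with a small standalone C sketch. This is not GCC code: worst_case_bytes is a hypothetical helper, and it assumes s->a holds the longest nul-terminated string that char[8] allows (7 characters).

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

struct S { char a[8]; };

/* Hypothetical helper (not part of GCC): number of bytes, including the
   terminating nul, that sprintf (d, "%s-%i", s->a, i) would write when
   s->a holds the longest string its array type allows.  */
static int
worst_case_bytes (int i)
{
  struct S s;

  /* Longest nul-terminated string that fits in char[8]: 7 characters.  */
  memset (s.a, 'x', sizeof s.a - 1);
  s.a[sizeof s.a - 1] = '\0';

  /* snprintf with a null buffer and size 0 only computes the length
     of the would-be output, excluding the terminating nul.  */
  return snprintf (NULL, 0, "%s-%i", s.a, i) + 1;
}
```

With a 32-bit int the worst case is 7 + 1 + 11 + 1 = 20 bytes, which is consistent with the destination size of 20 mentioned in the quoted text. With d[8], even i == 0 already needs 10 bytes.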
* Re: [PATCH] Make strlen range computations more conservative 2018-08-20 20:42 ` Martin Sebor @ 2018-08-20 21:31 ` Bernd Edlinger 2018-08-21 2:43 ` Martin Sebor 0 siblings, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-08-20 21:31 UTC (permalink / raw) To: Martin Sebor, Jeff Law, Richard Biener Cc: GCC Patches, Richard Guenther, Jakub Jelinek On 08/20/18 22:42, Martin Sebor wrote: > On 08/20/2018 09:15 AM, Bernd Edlinger wrote: >> On 08/20/18 16:26, Jeff Law wrote: >>> On 08/20/2018 04:23 AM, Bernd Edlinger wrote: >>>> On 08/20/18 12:12, Richard Biener wrote: >>>>> On Wed, Aug 15, 2018 at 6:39 AM Jeff Law <law@redhat.com> wrote: >>>>>> >>>>>> On 08/10/2018 10:56 AM, Martin Sebor wrote: >>>>>>> On 08/08/2018 11:36 PM, Jeff Law wrote: >>>>>>>> On 08/02/2018 09:42 AM, Martin Sebor wrote: >>>>>>>> >>>>>>>>> The warning bits are definitely not okay by me. The purpose >>>>>>>>> of the warnings (-W{format,sprintf}-{overflow,truncation} is >>>>>>>>> to detect buffer overflows. When a warning doesn't have access >>>>>>>>> to string length information for dynamically created strings >>>>>>>>> (like the strlen pass does) it uses array sizes as a proxy. >>>>>>>>> This is useful both to detect possible buffer overflows and >>>>>>>>> to prevent false positives for overflows that cannot happen >>>>>>>>> in correctly written programs. >>>>>>>> So how much of this falling-back to array sizes as a proxy would become >>>>>>>> unnecessary if sprintf had access to the strlen pass as an analysis >>>>>>>> module? >>>>>>>> >>>>>>>> As you know we've been kicking that around and from my investigations >>>>>>>> that doesn't really look hard to do. Encapsulate the data structures in >>>>>>>> a class, break up the statement handling into analysis and optimization >>>>>>>> and we should be good to go. I'm hoping to start prototyping this week. 
>>>>>>>> >>>>>>>> If we think that has a reasonable chance to eliminate the array-size >>>>>>>> fallback, then that seems like the most promising path forward. >>>>>>> >>>>>>> We discussed this idea this morning so let me respond here and >>>>>>> reiterate the answer. Using the strlen data will help detect >>>>>>> buffer overflow where the array size isn't available but it >>>>>>> cannot replace the array size heuristic. Here's a simple >>>>>>> example: >>>>>>> >>>>>>> struct S { char a[8]; }; >>>>>>> >>>>>>> char d[8]; >>>>>>> void f (struct S *s, int i) >>>>>>> { >>>>>>> sprintf (d, "%s-%i", s[i].a, i); >>>>>>> } >>>>>>> >>>>>>> We don't know the length of s->a but we do know that it can >>>>>>> be up to 7 bytes long (assuming it's nul-terminated of course) >>>>>>> so we know the sprintf call can overflow. Conversely, if >>>>>>> the size of the destination is increased to 20 the sprintf >>>>>>> call cannot overflow so the diagnostic can be avoided. >>>>>>> >>>>>>> Removing the array size heuristic would force us to either give >>>>>>> up on diagnosing the first case or issue false positives for >>>>>>> the second case. I think the second alternative would make >>>>>>> the warning too noisy to be useful. >>>>>>> >>>>>>> The strlen pass will help detect buffer overflows in cases >>>>>>> where the array size isn't known (e.g., with dynamically >>>>>>> allocated buffers) but where the length of the string store >>>>>>> in the array is known. It will also help avoid false positives >>>>>>> in cases where the string stored in an array of known size is >>>>>>> known to be too short to cause an overflow. For instance here: >>>>>>> >>>>>>> struct S { char a[8]; }; >>>>>>> >>>>>>> char d[8]; >>>>>>> void f (struct S *s, int i) >>>>>>> { >>>>>>> if (strlen (s->a) < 4 && i >= 0 && i < 100) >>>>>>> sprintf (d, "%s-%i", s->a, i); >>>>>>> } >>>>>> ACK. Thanks for explaining things here too. 
I can't speak for others, >>>>>> but seeing examples along with the explanation is easier for me to absorb. >>>>>> >>>>>> For Bernd and others -- after kicking things around a bit with Martin, >>>>>> what we're recommending is that compute_string_length we compute the >>>>>> bounds using both GIMPLE and C semantics and return both. >>>>> >>>>> But you can't do this because GIMPLE did transforms that are not valid in >>>>> C, thus you can't interpret the GIMPLE IL as "C", you can only interpret >>>>> it as GIMPLE. What you'd do is return GIMPLE semantics length >>>>> and "foobar" semantics length which doesn't match the original source. >>>>> >>>> >>>> If I understood that suggestion right, it means, we >>>> live with some false positive or missing warnings due to those transformations. >>>> That means, get_range_strlen with the 2-parameter overload is used >>>> for warnings only. And it returns most of the time a correct range info, >>>> that is good enough for warnings. >>> Correct. 99.9% of the time using the ranges implied by the array types >>> is better for the warning code. So the idea is to return two ranges. >>> One which uses GIMPLE semantics for code generation and optimization >>> purposes, the other which uses ranges implied by the array types for >>> warning purposes. >>> >>> Martin suggested that we always compute and return both rather than >>> having a boolean argument to select between the behavior. >>> >>> >> >> Okay, but there is already the "strict" parameter: >> >> /* Determine the minimum and maximum value or string length that ARG >> refers to and store each in the first two elements of MINMAXLEN. >> For expressions that point to strings of unknown lengths that are >> character arrays, use the upper bound of the array as the maximum >> length. For example, given an expression like 'x ? 
array : "xyz"'
>> and array declared as 'char array[8]', MINMAXLEN[0] will be set
>> to 0 and MINMAXLEN[1] to 7, the longest string that could be
>> stored in array.
>> Return true if the range of the string lengths has been obtained
>> from the upper bound of an array at the end of a struct.  Such
>> an array may hold a string that's longer than its upper bound
>> due to it being used as a poor-man's flexible array member.
>>
>> STRICT is true if it will handle PHIs and COND_EXPRs conservatively
>> and false if PHIs and COND_EXPRs are to be handled optimistically,
>> if we can determine string length minimum and maximum; it will use
>> the minimum from the ones where it can be determined.
>> STRICT false should be only used for warning code.
>>
>> ELTSIZE is 1 for normal single byte character strings, and 2 or
>> 4 for wide character strings.  ELTSIZE is by default 1.  */
>>
>> bool
>> get_range_strlen (tree arg, tree minmaxlen[2], unsigned eltsize, bool strict)
>>
>>
>> I found out that there is only one call with strict = true: gimple_fold_builtin_strlen
>> and that is the only one that needs GIMPLE semantics,
>> because it does:
>>
>>   if (minlen == maxlen)
>>     {
>>       lenrange[0] = force_gimple_operand_gsi (gsi, lenrange[0], true, NULL,
>>                                               true, GSI_SAME_STMT);
>>       replace_call_with_value (gsi, lenrange[0]);
>>       return true;
>>     }
>>
>>   if (tree lhs = gimple_call_lhs (stmt))
>>     if (TREE_CODE (lhs) == SSA_NAME
>>         && INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
>>       set_range_info (lhs, VR_RANGE, minlen, maxlen);
>>
>>
>>
>> All other places are dealing with warnings, and have no use for GIMPLE semantics.
>>
>> You do not suggest to have set_range_info to handle two sets of range info, right?
>
> The idea is to extend get_range_strlen() to take an array of three
> elements as MINMAXLEN:
>
>    bool
>    get_range_strlen (tree arg, tree minmaxlen[3], unsigned eltsize,
>                      bool strict)
>
> The first two elements would correspond to the strict setting used
> for optimization and the last one to the opposite for warnings.

min/max strict + min/max warning = 4, isn't it?

> That way only one call would be needed by clients like sprintf

I must say that I don't like the return value optimization
on the sprintf pass, because it uses knowledge of the glibc implementation
of sprintf to do its job (mind the 4K buffer limit).

And I'd say compared to the actual sprintf library call the savings
should be negligible when we ignore the result register.
Or have you done a benchmark to measure the savings?

Bernd.

> that need both.  That should also eliminate the need for the STRICT
> argument.
>
> Martin

^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-20 21:31 ` Bernd Edlinger @ 2018-08-21 2:43 ` Martin Sebor 2018-08-21 5:38 ` Bernd Edlinger 0 siblings, 1 reply; 121+ messages in thread From: Martin Sebor @ 2018-08-21 2:43 UTC (permalink / raw) To: Bernd Edlinger, Jeff Law, Richard Biener Cc: GCC Patches, Richard Guenther, Jakub Jelinek On 08/20/2018 03:31 PM, Bernd Edlinger wrote: > On 08/20/18 22:42, Martin Sebor wrote: >> On 08/20/2018 09:15 AM, Bernd Edlinger wrote: >>> On 08/20/18 16:26, Jeff Law wrote: >>>> On 08/20/2018 04:23 AM, Bernd Edlinger wrote: >>>>> On 08/20/18 12:12, Richard Biener wrote: >>>>>> On Wed, Aug 15, 2018 at 6:39 AM Jeff Law <law@redhat.com> wrote: >>>>>>> >>>>>>> On 08/10/2018 10:56 AM, Martin Sebor wrote: >>>>>>>> On 08/08/2018 11:36 PM, Jeff Law wrote: >>>>>>>>> On 08/02/2018 09:42 AM, Martin Sebor wrote: >>>>>>>>> >>>>>>>>>> The warning bits are definitely not okay by me. The purpose >>>>>>>>>> of the warnings (-W{format,sprintf}-{overflow,truncation} is >>>>>>>>>> to detect buffer overflows. When a warning doesn't have access >>>>>>>>>> to string length information for dynamically created strings >>>>>>>>>> (like the strlen pass does) it uses array sizes as a proxy. >>>>>>>>>> This is useful both to detect possible buffer overflows and >>>>>>>>>> to prevent false positives for overflows that cannot happen >>>>>>>>>> in correctly written programs. >>>>>>>>> So how much of this falling-back to array sizes as a proxy would become >>>>>>>>> unnecessary if sprintf had access to the strlen pass as an analysis >>>>>>>>> module? >>>>>>>>> >>>>>>>>> As you know we've been kicking that around and from my investigations >>>>>>>>> that doesn't really look hard to do. Encapsulate the data structures in >>>>>>>>> a class, break up the statement handling into analysis and optimization >>>>>>>>> and we should be good to go. I'm hoping to start prototyping this week. 
>>>>>>>>> >>>>>>>>> If we think that has a reasonable chance to eliminate the array-size >>>>>>>>> fallback, then that seems like the most promising path forward. >>>>>>>> >>>>>>>> We discussed this idea this morning so let me respond here and >>>>>>>> reiterate the answer. Using the strlen data will help detect >>>>>>>> buffer overflow where the array size isn't available but it >>>>>>>> cannot replace the array size heuristic. Here's a simple >>>>>>>> example: >>>>>>>> >>>>>>>> struct S { char a[8]; }; >>>>>>>> >>>>>>>> char d[8]; >>>>>>>> void f (struct S *s, int i) >>>>>>>> { >>>>>>>> sprintf (d, "%s-%i", s[i].a, i); >>>>>>>> } >>>>>>>> >>>>>>>> We don't know the length of s->a but we do know that it can >>>>>>>> be up to 7 bytes long (assuming it's nul-terminated of course) >>>>>>>> so we know the sprintf call can overflow. Conversely, if >>>>>>>> the size of the destination is increased to 20 the sprintf >>>>>>>> call cannot overflow so the diagnostic can be avoided. >>>>>>>> >>>>>>>> Removing the array size heuristic would force us to either give >>>>>>>> up on diagnosing the first case or issue false positives for >>>>>>>> the second case. I think the second alternative would make >>>>>>>> the warning too noisy to be useful. >>>>>>>> >>>>>>>> The strlen pass will help detect buffer overflows in cases >>>>>>>> where the array size isn't known (e.g., with dynamically >>>>>>>> allocated buffers) but where the length of the string store >>>>>>>> in the array is known. It will also help avoid false positives >>>>>>>> in cases where the string stored in an array of known size is >>>>>>>> known to be too short to cause an overflow. For instance here: >>>>>>>> >>>>>>>> struct S { char a[8]; }; >>>>>>>> >>>>>>>> char d[8]; >>>>>>>> void f (struct S *s, int i) >>>>>>>> { >>>>>>>> if (strlen (s->a) < 4 && i >= 0 && i < 100) >>>>>>>> sprintf (d, "%s-%i", s->a, i); >>>>>>>> } >>>>>>> ACK. Thanks for explaining things here too. 
I can't speak for others, >>>>>>> but seeing examples along with the explanation is easier for me to absorb. >>>>>>> >>>>>>> For Bernd and others -- after kicking things around a bit with Martin, >>>>>>> what we're recommending is that compute_string_length we compute the >>>>>>> bounds using both GIMPLE and C semantics and return both. >>>>>> >>>>>> But you can't do this because GIMPLE did transforms that are not valid in >>>>>> C, thus you can't interpret the GIMPLE IL as "C", you can only interpret >>>>>> it as GIMPLE. What you'd do is return GIMPLE semantics length >>>>>> and "foobar" semantics length which doesn't match the original source. >>>>>> >>>>> >>>>> If I understood that suggestion right, it means, we >>>>> live with some false positive or missing warnings due to those transformations. >>>>> That means, get_range_strlen with the 2-parameter overload is used >>>>> for warnings only. And it returns most of the time a correct range info, >>>>> that is good enough for warnings. >>>> Correct. 99.9% of the time using the ranges implied by the array types >>>> is better for the warning code. So the idea is to return two ranges. >>>> One which uses GIMPLE semantics for code generation and optimization >>>> purposes, the other which uses ranges implied by the array types for >>>> warning purposes. >>>> >>>> Martin suggested that we always compute and return both rather than >>>> having a boolean argument to select between the behavior. >>>> >>>> >>> >>> Okay, but there is already the "strict" parameter: >>> >>> /* Determine the minimum and maximum value or string length that ARG >>> refers to and store each in the first two elements of MINMAXLEN. >>> For expressions that point to strings of unknown lengths that are >>> character arrays, use the upper bound of the array as the maximum >>> length. For example, given an expression like 'x ? 
array : "xyz"'
>>> and array declared as 'char array[8]', MINMAXLEN[0] will be set
>>> to 0 and MINMAXLEN[1] to 7, the longest string that could be
>>> stored in array.
>>> Return true if the range of the string lengths has been obtained
>>> from the upper bound of an array at the end of a struct.  Such
>>> an array may hold a string that's longer than its upper bound
>>> due to it being used as a poor-man's flexible array member.
>>>
>>> STRICT is true if it will handle PHIs and COND_EXPRs conservatively
>>> and false if PHIs and COND_EXPRs are to be handled optimistically,
>>> if we can determine string length minimum and maximum; it will use
>>> the minimum from the ones where it can be determined.
>>> STRICT false should be only used for warning code.
>>>
>>> ELTSIZE is 1 for normal single byte character strings, and 2 or
>>> 4 for wide character strings.  ELTSIZE is by default 1.  */
>>>
>>> bool
>>> get_range_strlen (tree arg, tree minmaxlen[2], unsigned eltsize, bool strict)
>>>
>>>
>>> I found out that there is only one call with strict = true: gimple_fold_builtin_strlen
>>> and that is the only one that needs GIMPLE semantics,
>>> because it does:
>>>
>>>   if (minlen == maxlen)
>>>     {
>>>       lenrange[0] = force_gimple_operand_gsi (gsi, lenrange[0], true, NULL,
>>>                                               true, GSI_SAME_STMT);
>>>       replace_call_with_value (gsi, lenrange[0]);
>>>       return true;
>>>     }
>>>
>>>   if (tree lhs = gimple_call_lhs (stmt))
>>>     if (TREE_CODE (lhs) == SSA_NAME
>>>         && INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
>>>       set_range_info (lhs, VR_RANGE, minlen, maxlen);
>>>
>>>
>>>
>>> All other places are dealing with warnings, and have no use for GIMPLE semantics.
>>>
>>> You do not suggest to have set_range_info to handle two sets of range info, right?
>>
>> The idea is to extend get_range_strlen() to take an array of three
>> elements as MINMAXLEN:
>>
>>    bool
>>    get_range_strlen (tree arg, tree minmaxlen[3], unsigned eltsize,
>>                      bool strict)
>>
>> The first two elements would correspond to the strict setting used
>> for optimization and the last one to the opposite for warnings.
>
> min/max strict + min/max warning = 4, isn't it?

No, it's min/max strict + max warning.

>> That way only one call would be needed by clients like sprintf
>
> I must say that I don't like the return value optimization
> on the sprintf pass, because it uses knowledge of the glibc implementation
> of sprintf to do its job (mind the 4K buffer limit).

Yet again, you don't know what you're talking about.  The 4K
limit is a C standard thing.

Martin

^ permalink raw reply [flat|nested] 121+ messages in thread
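The "min/max strict + max warning" layout discussed above can be modeled outside of GCC. The following is only a sketch under stated assumptions: struct strlen_range, can_fold and may_overflow are hypothetical names, and plain size_t values stand in for the tree operands that get_range_strlen actually works with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the proposed three-element result.  */
struct strlen_range
{
  size_t min_strict;  /* lower bound that is safe for optimization */
  size_t max_strict;  /* upper bound that is safe for optimization */
  size_t max_warn;    /* upper bound implied by the array type,
                         meant for warnings only */
};

/* An optimizer may replace strlen with a constant only when the
   strict bounds agree.  */
static bool
can_fold (const struct strlen_range *r)
{
  return r->min_strict == r->max_strict;
}

/* A warning client compares the type-implied bound, plus EXTRA other
   output characters and the terminating nul, with the destination size.  */
static bool
may_overflow (const struct strlen_range *r, size_t extra, size_t dstsize)
{
  return r->max_warn + extra + 1 > dstsize;
}
```

For the s->a member of type char[8] from the quoted example, a client might fill in { 0, SIZE_MAX, 7 }: nothing can be folded, but the warning side still sees the bound of 7 and can flag a destination of 8 bytes while accepting one of 20.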
* Re: [PATCH] Make strlen range computations more conservative 2018-08-21 2:43 ` Martin Sebor @ 2018-08-21 5:38 ` Bernd Edlinger 0 siblings, 0 replies; 121+ messages in thread From: Bernd Edlinger @ 2018-08-21 5:38 UTC (permalink / raw) To: Martin Sebor, Jeff Law, Richard Biener Cc: GCC Patches, Richard Guenther, Jakub Jelinek

On 08/21/18 04:43, Martin Sebor wrote:
>> I must say that I don't like the return value optimization
>> on the sprintf pass, because it uses knowledge of the glibc implementation
>> of sprintf to do its job (mind the 4K buffer limit).
>
> Yet again, you don't know what you're talking about.  The 4K
> limit is a C standard thing.
>

BTW: I say I don't like this or that and why.
I did not say anything about you personally.
You say something personal and generalizing about me in reply.
That is superfluous, please stop that.

Bernd.

^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-20 15:16 ` Bernd Edlinger 2018-08-20 20:42 ` Martin Sebor @ 2018-08-21 21:58 ` Jeff Law 1 sibling, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-08-21 21:58 UTC (permalink / raw) To: Bernd Edlinger, Richard Biener Cc: Martin Sebor, GCC Patches, Richard Guenther, Jakub Jelinek On 08/20/2018 09:15 AM, Bernd Edlinger wrote: > On 08/20/18 16:26, Jeff Law wrote: >> On 08/20/2018 04:23 AM, Bernd Edlinger wrote: >>> On 08/20/18 12:12, Richard Biener wrote: >>>> On Wed, Aug 15, 2018 at 6:39 AM Jeff Law <law@redhat.com> wrote: >>>>> >>>>> On 08/10/2018 10:56 AM, Martin Sebor wrote: >>>>>> On 08/08/2018 11:36 PM, Jeff Law wrote: >>>>>>> On 08/02/2018 09:42 AM, Martin Sebor wrote: >>>>>>> >>>>>>>> The warning bits are definitely not okay by me. The purpose >>>>>>>> of the warnings (-W{format,sprintf}-{overflow,truncation} is >>>>>>>> to detect buffer overflows. When a warning doesn't have access >>>>>>>> to string length information for dynamically created strings >>>>>>>> (like the strlen pass does) it uses array sizes as a proxy. >>>>>>>> This is useful both to detect possible buffer overflows and >>>>>>>> to prevent false positives for overflows that cannot happen >>>>>>>> in correctly written programs. >>>>>>> So how much of this falling-back to array sizes as a proxy would become >>>>>>> unnecessary if sprintf had access to the strlen pass as an analysis >>>>>>> module? >>>>>>> >>>>>>> As you know we've been kicking that around and from my investigations >>>>>>> that doesn't really look hard to do. Encapsulate the data structures in >>>>>>> a class, break up the statement handling into analysis and optimization >>>>>>> and we should be good to go. I'm hoping to start prototyping this week. >>>>>>> >>>>>>> If we think that has a reasonable chance to eliminate the array-size >>>>>>> fallback, then that seems like the most promising path forward. 
>>>>>> >>>>>> We discussed this idea this morning so let me respond here and >>>>>> reiterate the answer. Using the strlen data will help detect >>>>>> buffer overflow where the array size isn't available but it >>>>>> cannot replace the array size heuristic. Here's a simple >>>>>> example: >>>>>> >>>>>> struct S { char a[8]; }; >>>>>> >>>>>> char d[8]; >>>>>> void f (struct S *s, int i) >>>>>> { >>>>>> sprintf (d, "%s-%i", s[i].a, i); >>>>>> } >>>>>> >>>>>> We don't know the length of s->a but we do know that it can >>>>>> be up to 7 bytes long (assuming it's nul-terminated of course) >>>>>> so we know the sprintf call can overflow. Conversely, if >>>>>> the size of the destination is increased to 20 the sprintf >>>>>> call cannot overflow so the diagnostic can be avoided. >>>>>> >>>>>> Removing the array size heuristic would force us to either give >>>>>> up on diagnosing the first case or issue false positives for >>>>>> the second case. I think the second alternative would make >>>>>> the warning too noisy to be useful. >>>>>> >>>>>> The strlen pass will help detect buffer overflows in cases >>>>>> where the array size isn't known (e.g., with dynamically >>>>>> allocated buffers) but where the length of the string store >>>>>> in the array is known. It will also help avoid false positives >>>>>> in cases where the string stored in an array of known size is >>>>>> known to be too short to cause an overflow. For instance here: >>>>>> >>>>>> struct S { char a[8]; }; >>>>>> >>>>>> char d[8]; >>>>>> void f (struct S *s, int i) >>>>>> { >>>>>> if (strlen (s->a) < 4 && i >= 0 && i < 100) >>>>>> sprintf (d, "%s-%i", s->a, i); >>>>>> } >>>>> ACK. Thanks for explaining things here too. I can't speak for others, >>>>> but seeing examples along with the explanation is easier for me to absorb. 
>>>>> >>>>> For Bernd and others -- after kicking things around a bit with Martin, >>>>> what we're recommending is that compute_string_length we compute the >>>>> bounds using both GIMPLE and C semantics and return both. >>>> >>>> But you can't do this because GIMPLE did transforms that are not valid in >>>> C, thus you can't interpret the GIMPLE IL as "C", you can only interpret >>>> it as GIMPLE. What you'd do is return GIMPLE semantics length >>>> and "foobar" semantics length which doesn't match the original source. >>>> >>> >>> If I understood that suggestion right, it means, we >>> live with some false positive or missing warnings due to those transformations. >>> That means, get_range_strlen with the 2-parameter overload is used >>> for warnings only. And it returns most of the time a correct range info, >>> that is good enough for warnings. >> Correct. 99.9% of the time using the ranges implied by the array types >> is better for the warning code. So the idea is to return two ranges. >> One which uses GIMPLE semantics for code generation and optimization >> purposes, the other which uses ranges implied by the array types for >> warning purposes. >> >> Martin suggested that we always compute and return both rather than >> having a boolean argument to select between the behavior. >> >> > > Okay, but there is already the "strict" parameter: [ ... ] Sorry, I misspoke. There's 3 ranges to return. I believe Martin clarified this in a subsequent message. Sorry for any confusion. jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-02 3:13 ` Martin Sebor 2018-08-02 10:22 ` Bernd Edlinger @ 2018-08-03 7:47 ` Jeff Law 1 sibling, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-08-03 7:47 UTC (permalink / raw) To: Martin Sebor, Richard Biener; +Cc: Jakub Jelinek, Bernd Edlinger, GCC Patches On 08/01/2018 09:13 PM, Martin Sebor wrote: > On 08/01/2018 01:19 AM, Richard Biener wrote: >> On Tue, 31 Jul 2018, Martin Sebor wrote: >> >>> On 07/31/2018 09:48 AM, Jakub Jelinek wrote: >>>> On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote: >>>>> On 07/31/2018 12:38 AM, Jakub Jelinek wrote: >>>>>> On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote: >>>>>>> Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past >>>>>>> the end of subobjects by string functions. With _FORTIFY_SOURCE=2 >>>>>>> it calls abort. This is the default on popular distributions, >>>>>> >>>>>> Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the >>>>>> standard >>>>>> requires, imposes extra requirements. So from what this mode >>>>>> accepts or >>>>>> rejects we shouldn't determine what is or isn't considered valid. >>>>> >>>>> I'm not sure what the additional requirements are but the ones >>>>> I am referring to are the enforcing of struct member boundaries. >>>>> This is in line with the standard requirements of not accessing >>>>> [sub]objects via pointers derived from other [sub]objects. >>>> >>>> In the middle-end the distinction between what was originally a >>>> reference >>>> to subobjects and what was a reference to objects is quickly lost >>>> (whether through SCCVN or other optimizations). >>>> We've run into this many times with the __builtin_object_size already. >>>> So, if e.g. >>>> struct S { char a[3]; char b[5]; } s = { "abc", "defg" }; >>>> ... >>>> strlen ((char *) &s) is well defined but >>>> strlen (s.a) is not in C, for the middle-end you might not figure >>>> out which >>>> one is which. 
>>> >>> Yes, I'm aware of the middle-end transformation to MEM_REF >>> -- it's one of the reasons why detecting invalid accesses >>> by the middle end warnings, including -Warray-bounds, >>> -Wformat-overflow, -Wsprintf-overflow, and even -Wrestrict, >>> is less than perfect. >>> >>> But is strlen(s.a) also meant to be well-defined in the middle >>> end (with the semantics of computing the length or "abcdefg"?) >> >> Yes. >> >>> And if so, what makes it well defined? >> >> The fact that strlen takes a char * argument and thus inline-expansion >> of a trivial implementation like >> >>  int len = 0; >>  for (; *p; ++p) >>   ++len; >> >> will have >> >>  p = &s.a; >> >> and the middle-end doesn't reconstruct s.a[..] from the pointer >> access. >> >>> >>> Certainly not every "strlen" has these semantics. For example, >>> this open-coded one doesn't: >>> >>>  int len = 0; >>>  for (int i = 0; s.a[i]; ++i) >>>    ++len; >>> >>> It computes 2 (with no warning for the out-of-bounds access). >> >> Yes. > > If that's not a problem then why is it one when strlen() does > the same thing? Presumably the answer is: "because here > the access is via array indexing and in strlen via pointer > dereferences." (But in C there is no difference between > the two. Also see below.) But the semantics within GCC aren't necessarily C, they are GIMPLE. While they are usually very similar, they can differ. That also means that if array indexing gets turned into pointer dereferences we lose semantic information. > I have seen and I think shown in this discussion examples > where this is not so. For instance: > >  struct S { char a[1], b[1]; }; > >  void f (struct S *s, int i) >  { >    char *p = &s->a[i]; >    char *q = &s->b[0]; > >    char x = *p; >    *q = 11; > >    if (x != *p)           // folded to false >      __builtin_abort ();  // eliminated >  } > > Is this a bug? (I hope not.) What I keep coming back to is what is the type of the object we pass to strlen in GIMPLE. 
If it is a char array, then we can use the array bounds to construct
bounds for the result. If it is a char *, then we cannot.

This example is really looking at the aliasing model (which is another
place where the semantics may not match precisely). But I don't think
you can equate what happens in the aliasing model of GIMPLE with the
strlen case. They're apples and oranges.

>
>>> If we can't then the only language we have in common with users
>>> is the standard. (This, by the way, is what the C memory model
>>> group is trying to address -- the language or feature that's
>>> missing from the standard that says when, if ever, these things
>>> might be valid.)
>>
>> Well, you simply have to not compare apples and oranges,
>> a strlen implementation that isn't a strlen implementation
>> and strlen.
>
> As I'm sure you know, the C standard doesn't differentiate
> between the semantics of array subscript expressions and
> pointer dereferencing. They both mean the same thing.
> (Nothing prevents an implementation from defining strlen
> as a macro that expands into a loop using array indices
> for array arguments.)
But this is an area where the C standard and GIMPLE differ. It's easy
to miss (as I did in the initial review). But that's the way it is.

Jeff
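A minimal sketch of the distinction discussed above, using the struct S that Jakub quotes: scanning for a NUL through a pointer to the whole object versus through the member array's own type. The helper names (make_example, scan_whole_object, scan_member_a and the two example_* wrappers) are hypothetical and come from neither GCC nor the thread. Both loops carry explicit bounds, so the sketch stays well defined regardless of how a compiler models the access.

```c
#include <stddef.h>

struct S { char a[3]; char b[5]; };

/* Hypothetical fixture: a holds 'a','b','c' with no terminator,
   b holds "defg" followed by a NUL. */
static struct S make_example(void)
{
    struct S s = { { 'a', 'b', 'c' }, "defg" };
    return s;
}

/* Scan the bytes of the whole object through a char pointer.  The bound
   is the size of the complete object, so a and b are both in range. */
static size_t scan_whole_object(const struct S *s)
{
    const char *p = (const char *)s;
    size_t n = 0;
    while (n < sizeof *s && p[n] != '\0')
        ++n;
    return n;
}

/* Scan only the member a, using the array's own type as the bound.
   Returns sizeof s->a when the member holds no terminator. */
static size_t scan_member_a(const struct S *s)
{
    size_t n = 0;
    while (n < sizeof s->a && s->a[n] != '\0')
        ++n;
    return n;
}

/* Convenience wrappers that run each scan on the fixture. */
static size_t example_whole_length(void)
{
    struct S s = make_example();
    return scan_whole_object(&s);
}

static size_t example_member_length(void)
{
    struct S s = make_example();
    return scan_member_a(&s);
}
```

Assuming no padding between a and b (true on common ABIs), the whole-object scan sees seven non-NUL bytes while the member scan stops at the member bound of three. Which of these a plain strlen (s.a) is entitled to assume is exactly what the messages above disagree on.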
* Re: [PATCH] Make strlen range computations more conservative 2018-08-01 7:19 ` Richard Biener 2018-08-01 8:40 ` Jakub Jelinek 2018-08-02 3:13 ` Martin Sebor @ 2018-08-03 7:38 ` Jeff Law 2 siblings, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-08-03 7:38 UTC (permalink / raw) To: Richard Biener, Martin Sebor; +Cc: Jakub Jelinek, Bernd Edlinger, GCC Patches On 08/01/2018 01:19 AM, Richard Biener wrote: > On Tue, 31 Jul 2018, Martin Sebor wrote: > >> On 07/31/2018 09:48 AM, Jakub Jelinek wrote: >>> On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote: >>>> On 07/31/2018 12:38 AM, Jakub Jelinek wrote: >>>>> On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote: >>>>>> Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past >>>>>> the end of subobjects by string functions. With _FORTIFY_SOURCE=2 >>>>>> it calls abort. This is the default on popular distributions, >>>>> >>>>> Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the >>>>> standard >>>>> requires, imposes extra requirements. So from what this mode accepts or >>>>> rejects we shouldn't determine what is or isn't considered valid. >>>> >>>> I'm not sure what the additional requirements are but the ones >>>> I am referring to are the enforcing of struct member boundaries. >>>> This is in line with the standard requirements of not accessing >>>> [sub]objects via pointers derived from other [sub]objects. >>> >>> In the middle-end the distinction between what was originally a reference >>> to subobjects and what was a reference to objects is quickly lost >>> (whether through SCCVN or other optimizations). >>> We've run into this many times with the __builtin_object_size already. >>> So, if e.g. >>> struct S { char a[3]; char b[5]; } s = { "abc", "defg" }; >>> ... >>> strlen ((char *) &s) is well defined but >>> strlen (s.a) is not in C, for the middle-end you might not figure out which >>> one is which. 
>>
>> Yes, I'm aware of the middle-end transformation to MEM_REF
>> -- it's one of the reasons why detecting invalid accesses
>> by the middle end warnings, including -Warray-bounds,
>> -Wformat-overflow, -Wsprintf-overflow, and even -Wrestrict,
>> is less than perfect.
>>
>> But is strlen(s.a) also meant to be well-defined in the middle
>> end (with the semantics of computing the length of "abcdefg"?)
>
> Yes.
>
>> And if so, what makes it well defined?
>
> The fact that strlen takes a char * argument and thus inline-expansion
> of a trivial implementation like
[ ... ]

And ISTM again the key here is the type of the object that actually gets
passed to strlen at the GIMPLE level. If it's a char *, then the type
does not constrain the return value in any way, shape, or form.

Jeff
* Re: [PATCH] Make strlen range computations more conservative
  2018-07-27 6:49 ` Bernd Edlinger
  2018-07-31 3:45 ` Martin Sebor
@ 2018-08-06 15:34 ` Jeff Law
  1 sibling, 0 replies; 121+ messages in thread
From: Jeff Law @ 2018-08-06 15:34 UTC (permalink / raw)
  To: Bernd Edlinger, Martin Sebor, GCC Patches; +Cc: Richard Biener, Jakub Jelinek

On 07/27/2018 12:48 AM, Bernd Edlinger wrote:
> I have one more example similar to PR86259, that resembles IMHO real world code:
>
> Consider the following:
>
>
> int fun (char *p)
> {
>   char buf[16];
>
>   assert(strlen(p) < 4); //here: security relevant check
>
>   sprintf(buf, "echo %s - %s", p, p); //here: security relevant code
>   return system(buf);
> }
>
>
> What is wrong with the assertion?
>
> Nothing, except it is removed, when this function is called from untrusted code:
>
> untrused_fun ()
> {
>   char b[2] = "ab";
>   fun(b);
> }
>
> !!!! don't try to execute that: after "ab" there can be "; rm -rF / ;" on your stack!!!!
But this code is fundamentally broken and catering to this kind of crap
is, well, dumb. At the point where we call strlen we've invoked
undefined behavior. These aren't security checks in my mind, they're
band-aids for idiot code and are not suitable justification for making
any changes to how we generate code in GCC.

You could use them as an argument for improving warnings though.

Jeff
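For illustration only, here is one way the caller-side check in the quoted example could be written so that it never reads past the buffer: scan with an explicit capacity instead of calling strlen on something that may lack a terminator. Nobody in the thread proposes this. bounded_length, fits_limit and the two example_* wrappers are hypothetical names, and memchr is the only library call used.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the string stored in buf, or cap when
   no NUL terminator is found within the first cap bytes. */
static size_t bounded_length(const char *buf, size_t cap)
{
    const char *nul = memchr(buf, '\0', cap);
    return nul ? (size_t)(nul - buf) : cap;
}

/* Nonzero only if buf holds a terminated string shorter than limit. */
static int fits_limit(const char *buf, size_t cap, size_t limit)
{
    size_t n = bounded_length(buf, cap);
    return n < cap && n < limit;
}

/* The two-byte buffer from the quoted example: no room for a NUL. */
static int example_unterminated(void)
{
    const char b[2] = { 'a', 'b' };
    return fits_limit(b, sizeof b, 4);
}

/* A properly terminated three-character string. */
static int example_terminated(void)
{
    const char ok[4] = "abc";
    return fits_limit(ok, sizeof ok, 4);
}
```

The check needs the buffer capacity at the call site, which the original fun (char *p) signature does not carry. That is a design assumption of this sketch, not a claim about what the quoted code could do unchanged.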
* Re: [PATCH] Make strlen range computations more conservative 2018-07-24 23:18 ` Bernd Edlinger ` (2 preceding siblings ...) 2018-07-25 17:31 ` Martin Sebor @ 2018-08-03 7:00 ` Jeff Law 2018-08-04 21:56 ` Martin Sebor 2018-08-09 5:26 ` Jeff Law 4 siblings, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-08-03 7:00 UTC (permalink / raw) To: Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Martin Sebor On 07/24/2018 05:18 PM, Bernd Edlinger wrote: > On 07/24/18 23:46, Jeff Law wrote: >> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: >>> Hi! >>> >>> This patch makes strlen range computations more conservative. >>> >>> Firstly if there is a visible type cast from type A to B before passing >>> then value to strlen, don't expect the type layout of B to restrict the >>> possible return value range of strlen. >> Why do you think this is the right thing to do? ie, is there language >> in the standards that makes you think the code as it stands today is >> incorrect from a conformance standpoint? Is there a significant body of >> code that is affected in an adverse way by the current code? If so, >> what code? >> >> > > I think if you have an object, of an effective type A say char[100], then > you can cast the address of A to B, say typedef char (*B)[2] for instance > and then to const char *, say for use in strlen. I may be wrong, but I think > that we should at least try not to pick up char[2] from B, but instead > use A for strlen ranges, or leave this range open. Currently the range > info for strlen is [0..1] in this case, even if we see the type cast > in the generic tree. ISTM that you're essentially saying that the cast to const char * destroys any type information we can exploit here. But if that's the case, then I don't think we can even derive a range of [0,99]. What's to say that "A" didn't result from a similar cast of some object that was char[200] that happened out of the scope of what we could see during the strlen range computation? 
If that is what you're arguing, then I think there's a re-evaluation
that needs to happen WRT strlen range computation.

And just to be clear, I do see this as a significant correctness question.

Martin, thoughts?

Jeff
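To make the two candidate ranges in this exchange concrete, the sketch below computes the largest strlen result each type would allow: one less than the array size, leaving room for the terminator. It only inspects types with sizeof and never dereferences the cast pointer. The typedef B follows Bernd's description; A, object, max_len_from_view and max_len_from_object are hypothetical names.

```c
#include <stddef.h>

typedef char A[100];      /* effective type of the object */
typedef char (*B)[2];     /* pointer type the address is cast to */

static A object;

/* Upper bound on a string length when the bound is taken from the
   pointed-to type of B, that is char[2].  The operand of sizeof is not
   evaluated, so nothing is read through view. */
static size_t max_len_from_view(void)
{
    B view = (B)object;
    return sizeof *view - 1;
}

/* Upper bound when it is taken from the object itself, char[100]. */
static size_t max_len_from_object(void)
{
    return sizeof object - 1;
}
```

The two results, 1 and 99, correspond to the [0..1] range Bernd reports and the [0,99] range Jeff questions.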
* Re: [PATCH] Make strlen range computations more conservative 2018-08-03 7:00 ` Jeff Law @ 2018-08-04 21:56 ` Martin Sebor 2018-08-05 6:08 ` Bernd Edlinger 2018-08-06 15:12 ` Jeff Law 0 siblings, 2 replies; 121+ messages in thread From: Martin Sebor @ 2018-08-04 21:56 UTC (permalink / raw) To: Jeff Law, Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek On 08/03/2018 01:00 AM, Jeff Law wrote: > On 07/24/2018 05:18 PM, Bernd Edlinger wrote: >> On 07/24/18 23:46, Jeff Law wrote: >>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: >>>> Hi! >>>> >>>> This patch makes strlen range computations more conservative. >>>> >>>> Firstly if there is a visible type cast from type A to B before passing >>>> then value to strlen, don't expect the type layout of B to restrict the >>>> possible return value range of strlen. >>> Why do you think this is the right thing to do? ie, is there language >>> in the standards that makes you think the code as it stands today is >>> incorrect from a conformance standpoint? Is there a significant body of >>> code that is affected in an adverse way by the current code? If so, >>> what code? >>> >>> >> >> I think if you have an object, of an effective type A say char[100], then >> you can cast the address of A to B, say typedef char (*B)[2] for instance >> and then to const char *, say for use in strlen. I may be wrong, but I think >> that we should at least try not to pick up char[2] from B, but instead >> use A for strlen ranges, or leave this range open. Currently the range >> info for strlen is [0..1] in this case, even if we see the type cast >> in the generic tree. > ISTM that you're essentially saying that the cast to const char * > destroys any type information we can exploit here. But if that's the > case, then I don't think we can even derive a range of [0,99]. 
What's > to say that "A" didn't result from a similar cast of some object that > was char[200] that happened out of the scope of what we could see during > the strlen range computation? > > If that is what you're arguing, then I think there's a re-evaluation > that needs to happen WRT strlen range computation/ > > And just to be clear, I do see this as a significant correctness question. > > Martin, thoughts? The argument is that given: struct S { char a[4], b; }; char a[8] = "1234567"; this is valid and should pass: __attribute__ ((noipa)) void f (struct S *p) { assert (7 == strlen (p->a)); } int main (void) { f ((struct S*)a); } (This is the basic premise behind pr86259.) This argument is wrong and the code is invalid. For the access to p->a to be valid p must point to an object of struct S (it doesn't) and the p->a array must hold a nul-terminated string (it also doesn't). This should not be surprising because the following equivalent code behaves the same way: __attribute__ ((noipa)) void f (struct S *p) { int n = 0; for (int i = 0; p->a[i]; ++i) ++n; if (3 != n) __builtin_abort (); } and also because for write accesses, GCC (helpfully) enforces the restriction with _FORTIFY_SOURCE=2: __attribute__ ((noipa)) void f (struct S *p) { strcpy (p->a, "1234"); // aborts } I care less about the optimization than I do about the basic premise that it's essential to respect subobject boundaries(*). It would make little sense to undo the strlen optimization without also undoing the optimization for the direct array access case. Undoing either would raise the question about the validity of the _FORRTIFY_SOURCE=2 behavior. That would be a huge step backwards in terms of code security. If we did some of these but not others, it would make the behavior inconsistent and surprising, all to accommodate one instance of invalid code. 
If we had a valid test case where the strlen optimization
leads to invalid code, or even if there were a significant
number of bug reports showing that it breaks an invalid
but common idiom, I would certainly feel compelled to
make it right somehow. But there has been just one bug
report with clearly invalid code that should be easily
corrected.

Martin

[*] I also care deeply about all the warnings that depend
on it to avoid false positives: that's pretty much all those
I have implemented in the middle-end: -Wformat-{overflow,
truncation}, -Wstringop-{overflow,truncation}, and likely
even -Wrestrict.
* Re: [PATCH] Make strlen range computations more conservative 2018-08-04 21:56 ` Martin Sebor @ 2018-08-05 6:08 ` Bernd Edlinger 2018-08-05 15:58 ` Jeff Law 2018-08-06 15:12 ` Jeff Law 1 sibling, 1 reply; 121+ messages in thread From: Bernd Edlinger @ 2018-08-05 6:08 UTC (permalink / raw) To: Martin Sebor, Jeff Law, GCC Patches; +Cc: Richard Biener, Jakub Jelinek On 08/04/18 23:56, Martin Sebor wrote: > On 08/03/2018 01:00 AM, Jeff Law wrote: >> On 07/24/2018 05:18 PM, Bernd Edlinger wrote: >>> On 07/24/18 23:46, Jeff Law wrote: >>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: >>>>> Hi! >>>>> >>>>> This patch makes strlen range computations more conservative. >>>>> >>>>> Firstly if there is a visible type cast from type A to B before passing >>>>> then value to strlen, don't expect the type layout of B to restrict the >>>>> possible return value range of strlen. >>>> Why do you think this is the right thing to do? ie, is there language >>>> in the standards that makes you think the code as it stands today is >>>> incorrect from a conformance standpoint? Is there a significant body of >>>> code that is affected in an adverse way by the current code? If so, >>>> what code? >>>> >>>> >>> >>> I think if you have an object, of an effective type A say char[100], then >>> you can cast the address of A to B, say typedef char (*B)[2] for instance >>> and then to const char *, say for use in strlen. I may be wrong, but I think >>> that we should at least try not to pick up char[2] from B, but instead >>> use A for strlen ranges, or leave this range open. Currently the range >>> info for strlen is [0..1] in this case, even if we see the type cast >>> in the generic tree. >> ISTM that you're essentially saying that the cast to const char * >> destroys any type information we can exploit here. But if that's the >> case, then I don't think we can even derive a range of [0,99]. 
What's >> to say that "A" didn't result from a similar cast of some object that >> was char[200] that happened out of the scope of what we could see during >> the strlen range computation? >> >> If that is what you're arguing, then I think there's a re-evaluation >> that needs to happen WRT strlen range computation/ >> >> And just to be clear, I do see this as a significant correctness question. >> >> Martin, thoughts? > > The argument is that given: > > struct S { char a[4], b; }; > > char a[8] = "1234567"; > > this is valid and should pass: > > __attribute__ ((noipa)) > void f (struct S *p) > { > assert (7 == strlen (p->a)); > } > > int main (void) > { > f ((struct S*)a); > } > > (This is the basic premise behind pr86259.) > > This argument is wrong and the code is invalid. For the access > to p->a to be valid p must point to an object of struct S (it > doesn't) and the p->a array must hold a nul-terminated string > (it also doesn't). > > This should not be surprising because the following equivalent > code behaves the same way: > > __attribute__ ((noipa)) > void f (struct S *p) > { > int n = 0; > for (int i = 0; p->a[i]; ++i) > ++n; > if (3 != n) > __builtin_abort (); > } > > and also because for write accesses, GCC (helpfully) enforces > the restriction with _FORTIFY_SOURCE=2: > > __attribute__ ((noipa)) > void f (struct S *p) > { > strcpy (p->a, "1234"); // aborts > } > > I care less about the optimization than I do about the basic > premise that it's essential to respect subobject boundaries(*). > It would make little sense to undo the strlen optimization > without also undoing the optimization for the direct array > access case. Undoing either would raise the question about > the validity of the _FORRTIFY_SOURCE=2 behavior. That would > be a huge step backwards in terms of code security. If we > did some of these but not others, it would make the behavior > inconsistent and surprising, all to accommodate one instance > of invalid code. 
> > If we had a valid test case where the strlen optimization > leads to invalid code, or even if there were a significant > number of bug reports showing that it breaks an invalid > but common idiom, I would certainly feel compelled to > make it right somehow. But there has been just one bug > report with clearly invalid code that should be easily > corrected. > I see this from a software engineering POV. If we have code like this: void test (const char *x) { assert (strlen (x) < 10); } One would usually expect the program to abort (or at least abort with a near 100% likelihood) if x is not a valid string with length less than 10. But if lto and other optimizations show that this code is invoked with an invalid, non-zero terminated string the assertion is suddenly gone. Martin, why do you insist that GCC has to do this, and that it must be a good idea to do so, just based on the language definition? Why do we need assertions at all, when all programs have to be completely correct before we may compile them? > Martin > > [*] I also care deeply about all the warnings that depend > on it to avoid false positives: that's pretty much all those > I have implemented in the middle-end: -Wformat-{overflow, > truncation}, -Wstringop-{overflow,truncation}, and likely > even -Wrestrict. I do as well, but a false positive will not cause real damage. As I said before with optimistic range infos for warnings, we can reach both goals at the same time. Bernd. ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-05 6:08 ` Bernd Edlinger @ 2018-08-05 15:58 ` Jeff Law 2018-08-06 11:57 ` Bernd Edlinger 0 siblings, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-08-05 15:58 UTC (permalink / raw) To: Bernd Edlinger, Martin Sebor, GCC Patches; +Cc: Richard Biener, Jakub Jelinek On 08/05/2018 12:08 AM, Bernd Edlinger wrote: > On 08/04/18 23:56, Martin Sebor wrote: >> On 08/03/2018 01:00 AM, Jeff Law wrote: >>> On 07/24/2018 05:18 PM, Bernd Edlinger wrote: >>>> On 07/24/18 23:46, Jeff Law wrote: >>>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: >>>>>> Hi! >>>>>> >>>>>> This patch makes strlen range computations more conservative. >>>>>> >>>>>> Firstly if there is a visible type cast from type A to B before passing >>>>>> then value to strlen, don't expect the type layout of B to restrict the >>>>>> possible return value range of strlen. >>>>> Why do you think this is the right thing to do? ie, is there language >>>>> in the standards that makes you think the code as it stands today is >>>>> incorrect from a conformance standpoint? Is there a significant body of >>>>> code that is affected in an adverse way by the current code? If so, >>>>> what code? >>>>> >>>>> >>>> >>>> I think if you have an object, of an effective type A say char[100], then >>>> you can cast the address of A to B, say typedef char (*B)[2] for instance >>>> and then to const char *, say for use in strlen. I may be wrong, but I think >>>> that we should at least try not to pick up char[2] from B, but instead >>>> use A for strlen ranges, or leave this range open. Currently the range >>>> info for strlen is [0..1] in this case, even if we see the type cast >>>> in the generic tree. >>> ISTM that you're essentially saying that the cast to const char * >>> destroys any type information we can exploit here. But if that's the >>> case, then I don't think we can even derive a range of [0,99]. 
What's >>> to say that "A" didn't result from a similar cast of some object that >>> was char[200] that happened out of the scope of what we could see during >>> the strlen range computation? >>> >>> If that is what you're arguing, then I think there's a re-evaluation >>> that needs to happen WRT strlen range computation/ >>> >>> And just to be clear, I do see this as a significant correctness question. >>> >>> Martin, thoughts? >> >> The argument is that given: >> >>  struct S { char a[4], b; }; >> >>  char a[8] = "1234567"; >> >> this is valid and should pass: >> >>  __attribute__ ((noipa)) >>  void f (struct S *p) >>  { >>    assert (7 == strlen (p->a)); >>  } >> >>  int main (void) >>  { >>    f ((struct S*)a); >>  } >> >> (This is the basic premise behind pr86259.) >> >> This argument is wrong and the code is invalid. For the access >> to p->a to be valid p must point to an object of struct S (it >> doesn't) and the p->a array must hold a nul-terminated string >> (it also doesn't). >> >> This should not be surprising because the following equivalent >> code behaves the same way: >> >>  __attribute__ ((noipa)) >>  void f (struct S *p) >>  { >>    int n = 0; >>    for (int i = 0; p->a[i]; ++i) >>      ++n; >>    if (3 != n) >>      __builtin_abort (); >>  } >> >> and also because for write accesses, GCC (helpfully) enforces >> the restriction with _FORTIFY_SOURCE=2: >> >>  __attribute__ ((noipa)) >>  void f (struct S *p) >>  { >>    strcpy (p->a, "1234");  // aborts >>  } >> >> I care less about the optimization than I do about the basic >> premise that it's essential to respect subobject boundaries(*). >> It would make little sense to undo the strlen optimization >> without also undoing the optimization for the direct array >> access case. Undoing either would raise the question about >> the validity of the _FORRTIFY_SOURCE=2 behavior. That would >> be a huge step backwards in terms of code security. 
If we >> did some of these but not others, it would make the behavior >> inconsistent and surprising, all to accommodate one instance >> of invalid code. >> >> If we had a valid test case where the strlen optimization >> leads to invalid code, or even if there were a significant >> number of bug reports showing that it breaks an invalid >> but common idiom, I would certainly feel compelled to >> make it right somehow. But there has been just one bug >> report with clearly invalid code that should be easily >> corrected. >> > > > I see this from a software engineering POV. > > If we have code like this: > > void test (const char *x) > { > assert (strlen (x) < 10); > } > > One would usually expect the program to abort (or at least abort with > a near 100% likelihood) if x is not a valid string with length less > than 10. I would not expect the program to abort if this function were called with an invalid (ie unterminated) string. In that scenario I know enough to not expect anything because my program is broken, plain and simple. > > But if lto and other optimizations show that this code is invoked with > an invalid, non-zero terminated string the assertion is suddenly gone. And the program is invalid as it exhibits undefined behavior. One undefined behavior is exhibited I have no expectations. I *like* when we do something like trap as soon as undefined behavior is discovered, but I do not expect it. > > Martin, why do you insist that GCC has to do this, and that it must > be a good idea to do so, just based on the language definition? We do this all the time in all kinds of situations. The language definition provides the contract that the programmer and compiler must adhere to. When the contract is broken we can not be responsible for the resulting code as the input source is simply broken. > > Why do we need assertions at all, when all programs have to be completely > correct before we may compile them? That's what compilers do. 
When they see code that is pointless it gets removed. As engineers we
sometimes say "while this is undefined behavior we're not going to
exploit the undefined-ness". That is based on our experience as
developers. And sometimes we may not even agree on where the line falls
between "optimize this" and "do not, because it's going to surprise a
developer".

Jeff
* Re: [PATCH] Make strlen range computations more conservative 2018-08-05 15:58 ` Jeff Law @ 2018-08-06 11:57 ` Bernd Edlinger 0 siblings, 0 replies; 121+ messages in thread From: Bernd Edlinger @ 2018-08-06 11:57 UTC (permalink / raw) To: Jeff Law, Martin Sebor, GCC Patches; +Cc: Richard Biener, Jakub Jelinek On 08/05/18 17:58, Jeff Law wrote: > On 08/05/2018 12:08 AM, Bernd Edlinger wrote: >> I see this from a software engineering POV. >> >> If we have code like this: >> >> void test (const char *x) >> { >> assert (strlen (x) < 10); >> } >> >> One would usually expect the program to abort (or at least abort with >> a near 100% likelihood) if x is not a valid string with length less >> than 10. > I would not expect the program to abort if this function were called > with an invalid (ie unterminated) string. In that scenario I know > enough to not expect anything because my program is broken, plain and > simple. > >> >> But if lto and other optimizations show that this code is invoked with >> an invalid, non-zero terminated string the assertion is suddenly gone. > And the program is invalid as it exhibits undefined behavior. One > undefined behavior is exhibited I have no expectations. I *like* when > we do something like trap as soon as undefined behavior is discovered, > but I do not expect it. > >> >> Martin, why do you insist that GCC has to do this, and that it must >> be a good idea to do so, just based on the language definition? > We do this all the time in all kinds of situations. > > The language definition provides the contract that the programmer and > compiler must adhere to. When the contract is broken we can not be > responsible for the resulting code as the input source is simply broken. > >> >> Why do we need assertions at all, when all programs have to be completely >> correct before we may compile them? > That's what compilers do. When they see code that is pointless it gets > removed. 
As engineers we sometimes say "while this is undefined
> behavior we're not going to exploit the undefined-ness". That is based
> on our experience as developers. And sometimes we may not even agree
> where the line between optimize this vs do not because it's going to
> surprise a developer.
>

Sorry to reiterate on this topic, but maybe this can give some insight.

I'd say my concern would be resolved if strlen range information were
no more aggressive than the well-known loop-niter optimization.

What I mean: if strlen were defined as an inline macro, the assertion
would not get optimized away, even if the function is later inlined in
a totally different context:

#define strlen(x) ({ int _l; for (_l=0; (x)[_l]; _l++); _l;})

void test (const char *x)
{
  assert (strlen (x) < 10);
}

But it would get optimized when x is declared locally.

The rationale might be that in the first case the type of the object x
points to is unknown at the point of the assertion, whereas in the
second case the type of the object is statically known there.

What is the reason for this, and what would be necessary to get the
exact same behavior with the string range info?


Thanks
Bernd.
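The macro in the message above relies on GNU statement expressions. A rough function-form equivalent in standard C, which may be handy for experimenting with the behavior Bernd describes, could look like the sketch below. open_coded_strlen is a hypothetical name, and nothing here asserts how GCC's range analysis treats it relative to the builtin.

```c
#include <stddef.h>

/* Open-coded length loop: walks x until the first NUL byte.
   The caller must pass a NUL-terminated string. */
static inline size_t open_coded_strlen(const char *x)
{
    size_t len = 0;
    while (x[len] != '\0')
        ++len;
    return len;
}
```

Unlike the macro, the function form takes a const char * parameter, so an array argument decays to a pointer at the call. Whether that changes what the optimizer can infer after inlining is part of the question being asked.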
* Re: [PATCH] Make strlen range computations more conservative 2018-08-04 21:56 ` Martin Sebor 2018-08-05 6:08 ` Bernd Edlinger @ 2018-08-06 15:12 ` Jeff Law 2018-08-06 16:32 ` Martin Sebor 1 sibling, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-08-06 15:12 UTC (permalink / raw) To: Martin Sebor, Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek On 08/04/2018 03:56 PM, Martin Sebor wrote: > On 08/03/2018 01:00 AM, Jeff Law wrote: >> On 07/24/2018 05:18 PM, Bernd Edlinger wrote: >>> On 07/24/18 23:46, Jeff Law wrote: >>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: >>>>> Hi! >>>>> >>>>> This patch makes strlen range computations more conservative. >>>>> >>>>> Firstly if there is a visible type cast from type A to B before >>>>> passing >>>>> then value to strlen, don't expect the type layout of B to restrict >>>>> the >>>>> possible return value range of strlen. >>>> Why do you think this is the right thing to do? ie, is there language >>>> in the standards that makes you think the code as it stands today is >>>> incorrect from a conformance standpoint? Is there a significant >>>> body of >>>> code that is affected in an adverse way by the current code? If so, >>>> what code? >>>> >>>> >>> >>> I think if you have an object, of an effective type A say char[100], >>> then >>> you can cast the address of A to B, say typedef char (*B)[2] for >>> instance >>> and then to const char *, say for use in strlen. I may be wrong, but >>> I think >>> that we should at least try not to pick up char[2] from B, but instead >>> use A for strlen ranges, or leave this range open. Currently the range >>> info for strlen is [0..1] in this case, even if we see the type cast >>> in the generic tree. >> ISTM that you're essentially saying that the cast to const char * >> destroys any type information we can exploit here. But if that's the >> case, then I don't think we can even derive a range of [0,99]. 
What's >> to say that "A" didn't result from a similar cast of some object that >> was char[200] that happened out of the scope of what we could see during >> the strlen range computation? >> >> If that is what you're arguing, then I think there's a re-evaluation >> that needs to happen WRT strlen range computation/ >> >> And just to be clear, I do see this as a significant correctness >> question. >> >> Martin, thoughts? > > The argument is that given: > >  struct S { char a[4], b; }; > >  char a[8] = "1234567"; > > this is valid and should pass: > >  __attribute__ ((noipa)) >  void f (struct S *p) >  { >    assert (7 == strlen (p->a)); >  } > >  int main (void) >  { >    f ((struct S*)a); >  } > > (This is the basic premise behind pr86259.) > > This argument is wrong and the code is invalid. For the access > to p->a to be valid p must point to an object of struct S (it > doesn't) and the p->a array must hold a nul-terminated string > (it also doesn't). I agree with you for C/C++, but I think it's been shown elsewhere in this thread that GIMPLE semantics to not respect the subobject boundaries. That's a sad reality. [ ... ] > > I care less about the optimization than I do about the basic > premise that it's essential to respect subobject boundaries(*). I understand, but the semantics of GIMPLE do not respect them. We can argue about whether or not those should change and what it would take to fix that. But right now the existing semantics do not respect those boundaries. > It would make little sense to undo the strlen optimization > without also undoing the optimization for the direct array > access case. Undoing either would raise the question about > the validity of the _FORRTIFY_SOURCE=2 behavior. That would > be a huge step backwards in terms of code security. If we > did some of these but not others, it would make the behavior > inconsistent and surprising, all to accommodate one instance > of invalid code. 
In the direct array access case I think (and I'm sure Jakub, Richi and
others will correct me if I'm wrong), we can use the object's type
because the dereferences are actually using the array's type.

>
> If we had a valid test case where the strlen optimization
> leads to invalid code, or even if there were a significant
> number of bug reports showing that it breaks an invalid
> but common idiom, I would certainly feel compelled to
> make it right somehow. But there has been just one bug
> report with clearly invalid code that should be easily
> corrected.
Again, I think you're too narrowly focused on C/C++ semantics here. What
matters are the semantics in GIMPLE.

>
> Martin
>
> [*] I also care deeply about all the warnings that depend
> on it to avoid false positives: that's pretty much all those
> I have implemented in the middle-end: -Wformat-{overflow,
> truncation}, -Wstringop-{overflow,truncation}, and likely
> even -Wrestrict.
I know. And I care deeply about your work to improve the preciseness of
the warnings and ultimately improve the overall quality of code compiled
with GCC. I just think we can't currently rely on the semantics you
want to exploit to improve the precision of string lengths.

Jeff
* Re: [PATCH] Make strlen range computations more conservative 2018-08-06 15:12 ` Jeff Law @ 2018-08-06 16:32 ` Martin Sebor 2018-08-06 17:44 ` Richard Biener 2018-08-06 22:48 ` Jeff Law 0 siblings, 2 replies; 121+ messages in thread From: Martin Sebor @ 2018-08-06 16:32 UTC (permalink / raw) To: Jeff Law, Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek On 08/06/2018 09:12 AM, Jeff Law wrote: > On 08/04/2018 03:56 PM, Martin Sebor wrote: >> On 08/03/2018 01:00 AM, Jeff Law wrote: >>> On 07/24/2018 05:18 PM, Bernd Edlinger wrote: >>>> On 07/24/18 23:46, Jeff Law wrote: >>>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: >>>>>> Hi! >>>>>> >>>>>> This patch makes strlen range computations more conservative. >>>>>> >>>>>> Firstly if there is a visible type cast from type A to B before >>>>>> passing >>>>>> then value to strlen, don't expect the type layout of B to restrict >>>>>> the >>>>>> possible return value range of strlen. >>>>> Why do you think this is the right thing to do? ie, is there language >>>>> in the standards that makes you think the code as it stands today is >>>>> incorrect from a conformance standpoint? Is there a significant >>>>> body of >>>>> code that is affected in an adverse way by the current code? If so, >>>>> what code? >>>>> >>>>> >>>> >>>> I think if you have an object, of an effective type A say char[100], >>>> then >>>> you can cast the address of A to B, say typedef char (*B)[2] for >>>> instance >>>> and then to const char *, say for use in strlen. I may be wrong, but >>>> I think >>>> that we should at least try not to pick up char[2] from B, but instead >>>> use A for strlen ranges, or leave this range open. Currently the range >>>> info for strlen is [0..1] in this case, even if we see the type cast >>>> in the generic tree. >>> ISTM that you're essentially saying that the cast to const char * >>> destroys any type information we can exploit here. 
But if that's the >>> case, then I don't think we can even derive a range of [0,99]. What's >>> to say that "A" didn't result from a similar cast of some object that >>> was char[200] that happened out of the scope of what we could see during >>> the strlen range computation? >>> >>> If that is what you're arguing, then I think there's a re-evaluation >>> that needs to happen WRT strlen range computation/ >>> >>> And just to be clear, I do see this as a significant correctness >>> question. >>> >>> Martin, thoughts? >> >> The argument is that given: >> >> struct S { char a[4], b; }; >> >> char a[8] = "1234567"; >> >> this is valid and should pass: >> >> __attribute__ ((noipa)) >> void f (struct S *p) >> { >> assert (7 == strlen (p->a)); >> } >> >> int main (void) >> { >> f ((struct S*)a); >> } >> >> (This is the basic premise behind pr86259.) >> >> This argument is wrong and the code is invalid. For the access >> to p->a to be valid p must point to an object of struct S (it >> doesn't) and the p->a array must hold a nul-terminated string >> (it also doesn't). > I agree with you for C/C++, but I think it's been shown elsewhere in > this thread that GIMPLE semantics to not respect the subobject > boundaries. That's a sad reality. > > [ ... ] > >> >> I care less about the optimization than I do about the basic >> premise that it's essential to respect subobject boundaries(*). > I understand, but the semantics of GIMPLE do not respect them. We can > argue about whether or not those should change and what it would take to > fix that. But right now the existing semantics do not respect those > boundaries. They don't respect them in all cases (i.e., when MEM_REF loses information about the structure of an access) but in a good number of them GCC can still derive useful information from the access. It's relied on to great a effect by _FORTIFTY_SOURCE. 
I think it would be a welcome enhancement if besides out-of-bounds
writes _FORTIFY_SOURCE also prevented out-of-bounds
reads.

>> It would make little sense to undo the strlen optimization
>> without also undoing the optimization for the direct array
>> access case.  Undoing either would raise the question about
>> the validity of the _FORRTIFY_SOURCE=2 behavior.  That would
>> be a huge step backwards in terms of code security.  If we
>> did some of these but not others, it would make the behavior
>> inconsistent and surprising, all to accommodate one instance
>> of invalid code.
> In the direct array access case I think (and I'm sure Jakub, Richi and
> others will correct me if I'm wrong), we can use the object's type
> because the dereferences are actually using the array's type.

Subscripting and pointer access are identical both in C/C++
and in GCC's _FORTIFY_SOURCE.  The absence of a distinction
between the two is essential for preventing writes past
the end by string functions like strcpy (_FORTIFY_SOURCE).

>> If we had a valid test case where the strlen optimization
>> leads to invalid code, or even if there were a significant
>> number of bug reports showing that it breaks an invalid
>> but common idiom, I would certainly feel compelled to
>> make it right somehow.  But there has been just one bug
>> report with clearly invalid code that should be easily
>> corrected.
> Again, I think you're too narrowly focused on C/C++ semantics here.
> What matters are the semantics in GIMPLE.

I don't get that.  GCC is a C/C++ compiler (besides other
languages), but not a GIMPLE compiler.  The only reason this
came up at all is a bug report with an invalid C test case that
reads past the end.  The only reason in my mind to consider
relaxing an assumption/restriction would be a valid test case
(in any supported language) that the optimization invalidates.

But as I said, far more essential than the optimization is
the ability to detect these invalid accesses (both reads and
writes), such as in:

  struct S { char a[4], b[2], c; };

  void f (struct S *p)
  {
    strcpy (p->a, "1234");       // certain buffer overflow

    sprintf (p->b, "%s", p->a);  // potential buffer overflow

    // ...but, to avoid false positives:
    sprintf (p->a, "%s", p->b);  // no buffer overflow here
  }

You've recently made a comment elsewhere that you wish GCC would
be more aggressive in detecting and preventing undefined behavior
by inserting traps.  I agree but I don't see how we can aim
for both looser and stricter UB detection at the same time.

In any event, it seems clear to me that I've lost the argument.
If you undo the optimization then please retain the functionality
for the warnings.  Otherwise you might as well remove those too.

Martin

^ permalink raw reply [flat|nested] 121+ messages in thread
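As an illustration of the subobject bounds discussed in the message above, the following is a rough sketch of a copy that stays within the `a` member. The function name `set_a` is made up for this example; it relies only on standard `snprintf`, which truncates at the given size and reports how many characters the full output would have needed.

```c
#include <stddef.h>
#include <stdio.h>

struct S { char a[4], b[2], c; };

/* Hypothetical helper: copy SRC into p->a, truncating if necessary.
   Returns 1 when SRC fit completely (including the nul), 0 when it
   was truncated or an output error occurred.  */
int set_a (struct S *p, const char *src)
{
  int n = snprintf (p->a, sizeof p->a, "%s", src);
  return n >= 0 && (size_t) n < sizeof p->a;
}
```

With this shape, the `"1234"` case from the example is truncated to `"123"` instead of writing a fifth byte into `b`.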
* Re: [PATCH] Make strlen range computations more conservative 2018-08-06 16:32 ` Martin Sebor @ 2018-08-06 17:44 ` Richard Biener 2018-08-06 23:59 ` Martin Sebor 2018-08-06 22:48 ` Jeff Law 1 sibling, 1 reply; 121+ messages in thread From: Richard Biener @ 2018-08-06 17:44 UTC (permalink / raw) To: Martin Sebor, Jeff Law, Bernd Edlinger, GCC Patches; +Cc: Jakub Jelinek On August 6, 2018 6:32:41 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote: >On 08/06/2018 09:12 AM, Jeff Law wrote: >> On 08/04/2018 03:56 PM, Martin Sebor wrote: >>> On 08/03/2018 01:00 AM, Jeff Law wrote: >>>> On 07/24/2018 05:18 PM, Bernd Edlinger wrote: >>>>> On 07/24/18 23:46, Jeff Law wrote: >>>>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: >>>>>>> Hi! >>>>>>> >>>>>>> This patch makes strlen range computations more conservative. >>>>>>> >>>>>>> Firstly if there is a visible type cast from type A to B before >>>>>>> passing >>>>>>> then value to strlen, don't expect the type layout of B to >restrict >>>>>>> the >>>>>>> possible return value range of strlen. >>>>>> Why do you think this is the right thing to do? ie, is there >language >>>>>> in the standards that makes you think the code as it stands today >is >>>>>> incorrect from a conformance standpoint? Is there a significant >>>>>> body of >>>>>> code that is affected in an adverse way by the current code? If >so, >>>>>> what code? >>>>>> >>>>>> >>>>> >>>>> I think if you have an object, of an effective type A say >char[100], >>>>> then >>>>> you can cast the address of A to B, say typedef char (*B)[2] for >>>>> instance >>>>> and then to const char *, say for use in strlen. I may be wrong, >but >>>>> I think >>>>> that we should at least try not to pick up char[2] from B, but >instead >>>>> use A for strlen ranges, or leave this range open. Currently the >range >>>>> info for strlen is [0..1] in this case, even if we see the type >cast >>>>> in the generic tree. 
>>>> ISTM that you're essentially saying that the cast to const char * >>>> destroys any type information we can exploit here. But if that's >the >>>> case, then I don't think we can even derive a range of [0,99]. >What's >>>> to say that "A" didn't result from a similar cast of some object >that >>>> was char[200] that happened out of the scope of what we could see >during >>>> the strlen range computation? >>>> >>>> If that is what you're arguing, then I think there's a >re-evaluation >>>> that needs to happen WRT strlen range computation/ >>>> >>>> And just to be clear, I do see this as a significant correctness >>>> question. >>>> >>>> Martin, thoughts? >>> >>> The argument is that given: >>> >>> struct S { char a[4], b; }; >>> >>> char a[8] = "1234567"; >>> >>> this is valid and should pass: >>> >>> __attribute__ ((noipa)) >>> void f (struct S *p) >>> { >>> assert (7 == strlen (p->a)); >>> } >>> >>> int main (void) >>> { >>> f ((struct S*)a); >>> } >>> >>> (This is the basic premise behind pr86259.) >>> >>> This argument is wrong and the code is invalid. For the access >>> to p->a to be valid p must point to an object of struct S (it >>> doesn't) and the p->a array must hold a nul-terminated string >>> (it also doesn't). >> I agree with you for C/C++, but I think it's been shown elsewhere in >> this thread that GIMPLE semantics to not respect the subobject >> boundaries. That's a sad reality. >> >> [ ... ] >> >>> >>> I care less about the optimization than I do about the basic >>> premise that it's essential to respect subobject boundaries(*). >> I understand, but the semantics of GIMPLE do not respect them. We >can >> argue about whether or not those should change and what it would take >to >> fix that. But right now the existing semantics do not respect those >> boundaries. 
> >They don't respect them in all cases (i.e., when MEM_REF loses >information about the structure of an access) but in a good >number of them GCC can still derive useful information from >the access. It's relied on to great a effect by _FORTIFTY_SOURCE. >I think it would be a welcome enhancement if besides out-of- >bounds writes _FORTIFTY_SOURCE also prevented out-of-bounds >reads. > >>> It would make little sense to undo the strlen optimization >>> without also undoing the optimization for the direct array >>> access case. Undoing either would raise the question about >>> the validity of the _FORRTIFY_SOURCE=2 behavior. That would >>> be a huge step backwards in terms of code security. If we >>> did some of these but not others, it would make the behavior >>> inconsistent and surprising, all to accommodate one instance >>> of invalid code. >> In the direct array access case I think (and I'm sure Jakub, Richi >and >> others will correct me if I'm wrong), we can use the object's type >> because the dereferences are actually using the array's type. > >Subscripting and pointer access are identical both in C/C++ >and in GCC's _FORTIFY_SOURCE. The absence of a distinction >between the two is essential for preventing writes past >the end by string functions like strcpy (_FORTIFY_SOURCE). > >>> If we had a valid test case where the strlen optimization >>> leads to invalid code, or even if there were a significant >>> number of bug reports showing that it breaks an invalid >>> but common idiom, I would certainly feel compelled to >>> make it right somehow. But there has been just one bug >>> report with clearly invalid code that should be easily >>> corrected. >> Again, I think you're too narrowly focused on C/C++ semantics here. >> What matters are the semantics in GIMPLE. > >I don't get that. GCC is a C/C++ compiler (besides other >languages), but not a GIMPLE compiler. 
The only reason this >came up at all is a bug report with an invalid C test case that >reads past the end. The only reason in my mind to consider >relaxing an assumption/restriction would be a valid test case >(in any supported language) that the optimization invalidates. > >But as I said, far more essential than the optimization is >the ability to detect these invalid access (both reads and >writes), such as in: The essential thing is to not introduce latent wrong code issues because you exploit C language constraints that are not preserved by GIMPLE transforms because they are not constraints in the GIMPLE IL _WHICH_ _IS_ _NOT_ _C_. Richard. > > struct S { char a[4], b[2], c; }; > > void f (struct S *p) > { > strcpy (p->a, "1234"); // certain buffer overflow > > sprintf (p->b, "%s", p->a); // potential buffer overflow > > // ...but, to avoid false positives: > sprintf (p->a, "%s", p->b); // no buffer overflow here > } > >You've recently made comment elsewhere that you wish GCC would >be more aggressive in detecting preventing undefined behavior >by inserting traps. I agree but I don't see how we can aim >for both looser and stricter UB detection at the same time. > >In any event, it seems clear to me that I've lost the argument. >If you undo the optimization then please retain the functionality >for the warnings. Otherwise you might as well remove those too. > >Martin ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative
  2018-08-06 17:44 ` Richard Biener
@ 2018-08-06 23:59 ` Martin Sebor
  2018-08-07 15:54 ` Jeff Law
  0 siblings, 1 reply; 121+ messages in thread
From: Martin Sebor @ 2018-08-06 23:59 UTC (permalink / raw)
  To: Richard Biener, Jeff Law, Bernd Edlinger, GCC Patches; +Cc: Jakub Jelinek

>> But as I said, far more essential than the optimization is
>> the ability to detect these invalid access (both reads and
>> writes), such as in:
>
> The essential thing is to not introduce latent wrong code issues because you exploit C language constraints that are not preserved by GIMPLE transforms because they are not constraints in the GIMPLE IL _WHICH_ _IS_ _NOT_ _C_.

You misunderstood my point: I'm saying "if you must, disable
the strlen optimization but please don't compromise the bug
detection."

Martin

^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative
  2018-08-06 23:59 ` Martin Sebor
@ 2018-08-07 15:54 ` Jeff Law
  0 siblings, 0 replies; 121+ messages in thread
From: Jeff Law @ 2018-08-07 15:54 UTC (permalink / raw)
  To: Martin Sebor, Richard Biener, Bernd Edlinger, GCC Patches; +Cc: Jakub Jelinek

On 08/06/2018 05:59 PM, Martin Sebor wrote:
>>> But as I said, far more essential than the optimization is
>>> the ability to detect these invalid access (both reads and
>>> writes), such as in:
>>
>> The essential thing is to not introduce latent wrong code issues
>> because you exploit C language constraints that are not preserved by
>> GIMPLE transforms because they are not constraints in the GIMPLE IL
>> _WHICH_ _IS_ _NOT_ _C_.
>
> You misunderstood my point: I'm saying "if you must, disable
> the strlen optimization but please don't compromise the bug
> detection."
And to be clear, you should be involved in that process.  Note if fixing
the codegen issues loses warnings and restoring the warnings is a major
effort, then the warnings may have to regress.

jeff

^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-08-06 16:32 ` Martin Sebor 2018-08-06 17:44 ` Richard Biener @ 2018-08-06 22:48 ` Jeff Law 1 sibling, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-08-06 22:48 UTC (permalink / raw) To: Martin Sebor, Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek On 08/06/2018 10:32 AM, Martin Sebor wrote: > On 08/06/2018 09:12 AM, Jeff Law wrote: >> On 08/04/2018 03:56 PM, Martin Sebor wrote: >>> On 08/03/2018 01:00 AM, Jeff Law wrote: >>>> On 07/24/2018 05:18 PM, Bernd Edlinger wrote: >>>>> On 07/24/18 23:46, Jeff Law wrote: >>>>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: >>>>>>> Hi! >>>>>>> >>>>>>> This patch makes strlen range computations more conservative. >>>>>>> >>>>>>> Firstly if there is a visible type cast from type A to B before >>>>>>> passing >>>>>>> then value to strlen, don't expect the type layout of B to restrict >>>>>>> the >>>>>>> possible return value range of strlen. >>>>>> Why do you think this is the right thing to do? ie, is there >>>>>> language >>>>>> in the standards that makes you think the code as it stands today is >>>>>> incorrect from a conformance standpoint? Is there a significant >>>>>> body of >>>>>> code that is affected in an adverse way by the current code? If so, >>>>>> what code? >>>>>> >>>>>> >>>>> >>>>> I think if you have an object, of an effective type A say char[100], >>>>> then >>>>> you can cast the address of A to B, say typedef char (*B)[2] for >>>>> instance >>>>> and then to const char *, say for use in strlen. I may be wrong, but >>>>> I think >>>>> that we should at least try not to pick up char[2] from B, but instead >>>>> use A for strlen ranges, or leave this range open. Currently the >>>>> range >>>>> info for strlen is [0..1] in this case, even if we see the type cast >>>>> in the generic tree. >>>> ISTM that you're essentially saying that the cast to const char * >>>> destroys any type information we can exploit here. 
But if that's the >>>> case, then I don't think we can even derive a range of [0,99]. What's >>>> to say that "A" didn't result from a similar cast of some object that >>>> was char[200] that happened out of the scope of what we could see >>>> during >>>> the strlen range computation? >>>> >>>> If that is what you're arguing, then I think there's a re-evaluation >>>> that needs to happen WRT strlen range computation/ >>>> >>>> And just to be clear, I do see this as a significant correctness >>>> question. >>>> >>>> Martin, thoughts? >>> >>> The argument is that given: >>> >>>  struct S { char a[4], b; }; >>> >>>  char a[8] = "1234567"; >>> >>> this is valid and should pass: >>> >>>  __attribute__ ((noipa)) >>>  void f (struct S *p) >>>  { >>>    assert (7 == strlen (p->a)); >>>  } >>> >>>  int main (void) >>>  { >>>    f ((struct S*)a); >>>  } >>> >>> (This is the basic premise behind pr86259.) >>> >>> This argument is wrong and the code is invalid. For the access >>> to p->a to be valid p must point to an object of struct S (it >>> doesn't) and the p->a array must hold a nul-terminated string >>> (it also doesn't). >> I agree with you for C/C++, but I think it's been shown elsewhere in >> this thread that GIMPLE semantics to not respect the subobject >> boundaries. That's a sad reality. >> >> [ ... ] >> >>> >>> I care less about the optimization than I do about the basic >>> premise that it's essential to respect subobject boundaries(*). >> I understand, but the semantics of GIMPLE do not respect them. We can >> argue about whether or not those should change and what it would take to >> fix that. But right now the existing semantics do not respect those >> boundaries. > > They don't respect them in all cases (i.e., when MEM_REF loses > information about the structure of an access) but in a good > number of them GCC can still derive useful information from > the access. It's relied on to great a effect by _FORTIFTY_SOURCE. 
> I think it would be a welcome enhancement if besides out-of- > bounds writes _FORTIFTY_SOURCE also prevented out-of-bounds > reads. > >>> It would make little sense to undo the strlen optimization >>> without also undoing the optimization for the direct array >>> access case. Undoing either would raise the question about >>> the validity of the _FORRTIFY_SOURCE=2 behavior. That would >>> be a huge step backwards in terms of code security. If we >>> did some of these but not others, it would make the behavior >>> inconsistent and surprising, all to accommodate one instance >>> of invalid code. >> In the direct array access case I think (and I'm sure Jakub, Richi and >> others will correct me if I'm wrong), we can use the object's type >> because the dereferences are actually using the array's type. > > Subscripting and pointer access are identical both in C/C++ > and in GCC's _FORTIFY_SOURCE. The absence of a distinction > between the two is essential for preventing writes past > the end by string functions like strcpy (_FORTIFY_SOURCE). > >>> If we had a valid test case where the strlen optimization >>> leads to invalid code, or even if there were a significant >>> number of bug reports showing that it breaks an invalid >>> but common idiom, I would certainly feel compelled to >>> make it right somehow. But there has been just one bug >>> report with clearly invalid code that should be easily >>> corrected. >> Again, I think you're too narrowly focused on C/C++ semantics here. >> What matters are the semantics in GIMPLE. > > I don't get that. GCC is a C/C++ compiler (besides other > languages), but not a GIMPLE compiler.  The only reason this > came up at all is a bug report with an invalid C test case that > reads past the end. The only reason in my mind to consider > relaxing an assumption/restriction would be a valid test case > (in any supported language) that the optimization invalidates. 
>
> But as I said, far more essential than the optimization is
> the ability to detect these invalid access (both reads and
> writes), such as in:
>
>   struct S { char a[4], b[2], c; };
>
>   void f (struct S *p)
>   {
>     strcpy (p->a, "1234");       // certain buffer overflow
>
>     sprintf (p->b, "%s", p->a);  // potential buffer overflow
>
>     // ...but, to avoid false positives:
>     sprintf (p->a, "%s", p->b);  // no buffer overflow here
>   }
>
> You've recently made comment elsewhere that you wish GCC would
> be more aggressive in detecting preventing undefined behavior
> by inserting traps.  I agree but I don't see how we can aim
> for both looser and stricter UB detection at the same time.
Certainly we can't insert a trap unless we are absolutely certain the
code in question is undefined behavior.  In fact, if a language comes
along that says division by zero is well defined and should (for
example) saturate to INT_MAX, then trapping after the division by zero
is clearly going to be wrong and we'd have to conditionalize the trap
insertion code.

In cases where we are certain an array dereference is out of bounds and
undefined, then I'd fully support using __builtin_trap to halt the
program immediately after the array dereference.

But in the cases we're arguing about we don't have that level of
certainty because we don't have good type information on the strlen
call.  It's lost and not really recoverable and thus we have to be
conservative in those cases.

Jeff

^ permalink raw reply [flat|nested] 121+ messages in thread
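To make the trap idea in the message above concrete, here is a hand-written analogue, offered only as a sketch: it is not what GCC inserts, and the name `checked_get` is invented for illustration. It assumes a compiler that provides the `__builtin_trap` built-in, as GCC does, and places the check before the access rather than after it.

```c
#include <stddef.h>

/* Hypothetical helper: return BUF[I], halting the program when I is
   known to be outside an object of SIZE bytes.  The caller supplies
   SIZE, so the out-of-bounds condition is certain, not guessed.  */
char checked_get (const char *buf, size_t size, size_t i)
{
  if (i >= size)
    __builtin_trap ();
  return buf[i];
}
```

The point mirrored here is the certainty requirement: the trap fires only because the size is actually known at the call, which is exactly the information the thread says is missing for the strlen cases.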
* Re: [PATCH] Make strlen range computations more conservative 2018-07-24 23:18 ` Bernd Edlinger ` (3 preceding siblings ...) 2018-08-03 7:00 ` Jeff Law @ 2018-08-09 5:26 ` Jeff Law 2018-08-09 6:27 ` Richard Biener 4 siblings, 1 reply; 121+ messages in thread From: Jeff Law @ 2018-08-09 5:26 UTC (permalink / raw) To: Bernd Edlinger, GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Martin Sebor On 07/24/2018 05:18 PM, Bernd Edlinger wrote: > On 07/24/18 23:46, Jeff Law wrote: >> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: >>> Hi! >>> >>> This patch makes strlen range computations more conservative. >>> >>> Firstly if there is a visible type cast from type A to B before passing >>> then value to strlen, don't expect the type layout of B to restrict the >>> possible return value range of strlen. >> Why do you think this is the right thing to do? ie, is there language >> in the standards that makes you think the code as it stands today is >> incorrect from a conformance standpoint? Is there a significant body of >> code that is affected in an adverse way by the current code? If so, >> what code? >> >> > > I think if you have an object, of an effective type A say char[100], then > you can cast the address of A to B, say typedef char (*B)[2] for instance > and then to const char *, say for use in strlen. I may be wrong, but I think > that we should at least try not to pick up char[2] from B, but instead > use A for strlen ranges, or leave this range open. Currently the range > info for strlen is [0..1] in this case, even if we see the type cast > in the generic tree. Coming back to this... I'd like to hope we can depend on the type of the strlen argument. Obviously if it's a char *, we know nothing. But if it's an ARRAY_TYPE it'd be advantageous if we could use the array bounds to bound the length. *But* we have code which will turn pointer access into array indexing. tree-ssa-forwprop.c can do that, there may be others. 
So if we originally had a char * argument to strlen, but forwprop changed it into a char[N] type, then have we just broken things? I'd totally forgotten about this behavior from forwprop. PR 46393 contains sample code where this happens. I also thought we had code to recover array indexing from pointer arithmetic in the C/C++ front-end, but I can't seem to find it tonight. But it would raise similar concerns. > > One other example I have found in one of the test cases: > > char c; > > if (strlen(&c) != 0) abort(); > > this is now completely elided, but why? Is there a code base where > that is used? I doubt, but why do we care to eliminate something > stupid like that? If we would emit a warning for that I'm fine with it, > But if we silently remove code like that I don't think that it > will improve anything. So I ask, where is the code base which > gets an improvement from that optimization? I think it falls out naturally from trying to get accurate computations. I don't think either Martin or I really care about optimizing strlen in this case. In fact it's so clearly erroneous that it ought to generate a diagnostic on its own. Knowing Martin it was probably included in the tests for completeness. However, there is a fair amount of code that passes addresses of characters to functions that want char * string arguments and those functions promptly walk off the end of the single character, unterminated string. We actually just saw one of these in glibc that was detected by Martin's recent work. So it's definitely useful to track how these kinds of values get used. > >> Ultimately we want highly accurate string lengths to help improve the >> quality of the warnings we generate for potentially dangerous code. >> These changes seem to take us in the opposite direction. 
>>
>
> No, I don't think so, we have full control on the direction, when
> I do what Richi requested on his response, we will have one function
> where the string length estimation is based upon, instead of several
> open coded tree walks.
I don't think anyone objects to consolidating length computation.  What
I think we're hashing through is how does the object model in GIMPLE
affect the assumptions that can be correctly made about lengths of
objects.  When I ACK'd Martin's patches I'd totally forgotten about
these issues in GIMPLE and the impact they'd have if they were used in
areas that affect code generation.  That is absolutely and totally my
mistake.

I suspect that we're ultimately going to have to refine the design a bit
so that the lengths that impact code generation/optimization are
distinct from those that are used for warnings.  I'm not keen on this
concept, but I believe it's better than just reverting all the work on
issuing diagnostics for fishy code.

We're going to be kicking this around immediately -- there's a concern
that some of this may have gotten into the gcc-7 and/or gcc-8 codebase.
We need to get a sense of scale of the damage as well as a sense of
scale for how to go about fixing the codegen issues while still keeping
the benefits in the warning code.

If we go quiet, it's not from a lack of caring about this issue.  Quite
the opposite, we want to make sure we address these issues correctly
without just churning on the trunk.

Jeff

^ permalink raw reply [flat|nested] 121+ messages in thread
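The message above mentions code that passes the address of a single character to functions expecting a string. A small sketch of the well-defined alternative, with the hypothetical name `char_as_string_length`, is to copy the character into a two-element buffer that is guaranteed to be nul-terminated:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: treat the single character C as a string.
   Unlike strlen (&c), the buffer always carries a terminating nul,
   so the call never walks past the end of the object.  */
size_t char_as_string_length (char c)
{
  char buf[2] = { c, '\0' };
  return strlen (buf);
}
```

The result can only be 0 or 1, which is the same bound the range computation would derive for a one-character object.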
* Re: [PATCH] Make strlen range computations more conservative 2018-08-09 5:26 ` Jeff Law @ 2018-08-09 6:27 ` Richard Biener 2018-08-17 5:09 ` Jeff Law 0 siblings, 1 reply; 121+ messages in thread From: Richard Biener @ 2018-08-09 6:27 UTC (permalink / raw) To: Jeff Law, Bernd Edlinger, GCC Patches; +Cc: Jakub Jelinek, Martin Sebor On August 9, 2018 7:26:19 AM GMT+02:00, Jeff Law <law@redhat.com> wrote: >On 07/24/2018 05:18 PM, Bernd Edlinger wrote: >> On 07/24/18 23:46, Jeff Law wrote: >>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: >>>> Hi! >>>> >>>> This patch makes strlen range computations more conservative. >>>> >>>> Firstly if there is a visible type cast from type A to B before >passing >>>> then value to strlen, don't expect the type layout of B to restrict >the >>>> possible return value range of strlen. >>> Why do you think this is the right thing to do? ie, is there >language >>> in the standards that makes you think the code as it stands today is >>> incorrect from a conformance standpoint? Is there a significant >body of >>> code that is affected in an adverse way by the current code? If so, >>> what code? >>> >>> >> >> I think if you have an object, of an effective type A say char[100], >then >> you can cast the address of A to B, say typedef char (*B)[2] for >instance >> and then to const char *, say for use in strlen. I may be wrong, but >I think >> that we should at least try not to pick up char[2] from B, but >instead >> use A for strlen ranges, or leave this range open. Currently the >range >> info for strlen is [0..1] in this case, even if we see the type cast >> in the generic tree. >Coming back to this... I'd like to hope we can depend on the type of >the >strlen argument. Obviously if it's a char *, we know nothing. But if >it's an ARRAY_TYPE it'd be advantageous if we could use the array >bounds >to bound the length. But the FE made it char * and only because pointer type conversions are useless we may see sth else. 
So we cannot use the type of the argument. > >*But* we have code which will turn pointer access into array indexing. >tree-ssa-forwprop.c can do that, there may be others. So if we >originally had a char * argument to strlen, but forwprop changed it >into >a char[N] type, then have we just broken things? I'd totally forgotten >about this behavior from forwprop. PR 46393 contains sample code where >this happens. If we really still have this it must be very constrained. Because we used to have sorts of wrong code issues with this and data dependence analysis. >I also thought we had code to recover array indexing from pointer >arithmetic in the C/C++ front-end, but I can't seem to find it tonight. >But it would raise similar concerns. I've removed that. What I did at some point is avoid decaying too much array to pointer conversions in the FEs to preserve array refs instead of trying to reconstruct them late. > > > >> >> One other example I have found in one of the test cases: >> >> char c; >> >> if (strlen(&c) != 0) abort(); >> >> this is now completely elided, but why? Is there a code base where >> that is used? I doubt, but why do we care to eliminate something >> stupid like that? If we would emit a warning for that I'm fine with >it, >> But if we silently remove code like that I don't think that it >> will improve anything. So I ask, where is the code base which >> gets an improvement from that optimization? >I think it falls out naturally from trying to get accurate >computations. >I don't think either Martin or I really care about optimizing strlen in >this case. In fact it's so clearly erroneous that it ought to generate >a diagnostic on its own. Knowing Martin it was probably included in >the >tests for completeness. > >However, there is a fair amount of code that passes addresses of >characters to functions that want char * string arguments and those >functions promptly walk off the end of the single character, >unterminated string. 
We actually just saw one of these in glibc that >was detected by Martin's recent work. So it's definitely useful to >track how these kinds of values get used. > >> >>> Ultimately we want highly accurate string lengths to help improve >the >>> quality of the warnings we generate for potentially dangerous code. >>> These changes seem to take us in the opposite direction. >>> >> >> No, I don't think so, we have full control on the direction, when >> I do what Richi requested on his response, we will have one function >> where the string length estimation is based upon, instead of several >> open coded tree walks. >I don't think anyone objects to consolidating length computation. What >I think we're hashing through is how does the object model in GIMPLE >affect the assumptions that can be correctly made about lengths of >objects. When I ACK'd Martin's patches I'd totally forgotten about >these issues in GIMPLE and the impact they'd have if they were used in >areas that affect code generation. That is absolutely and totally my >mistake. > >I suspect that we're ultimately going to have to refine the design a >bit >so that the lengths that impact code generation/optimization are >distinct from those that are used for warnings. I'm not keen on this >concept, but I believe it's better than just reverting all the work on >issuing diagnostics for fishy code. > > >We're going to be kicking this around immediately -- there's a concern >that some of this may have gotten into the gcc-7 and/or gcc-8 codebase. >We need to get a sense of scale of the damage as well as a sense of >scale for how to go about fixing the codegen issues while still keeping >the benefits in the warning code. > >If we go quiet, it's not from a lack of caring about this issue. Quite >the opposite, we want to make sure we address these issues correctly >without just churning on the trunk. > >Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
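For readers following the cast case quoted above (an object of type char[100] whose address is cast to a pointer to char[2] and then to const char *), here is a minimal sketch in C. The names `two_char_ptr`, `storage`, `length_through_cast` and `check_length_through_cast` are made up for illustration and appear nowhere in the patch. The sketch only shows what the source program computes; it makes no claim about what any particular GCC revision does with it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "B" from the message: pointer to an array of two chars. */
typedef char (*two_char_ptr)[2];

/* "A" from the message: the object really is a char[100]. */
static char storage[100] = "hello, world";

/* Hypothetical helper: take the address of A, cast it to B, then to
   const char *, and measure the string.  The bytes read by strlen
   belong to the char[100] object, so the result is not limited by
   the char[2] type named in the cast. */
size_t length_through_cast(void)
{
    two_char_ptr narrowed = (two_char_ptr)&storage;  /* visible cast A -> B */
    const char *s = (const char *)narrowed;          /* then B -> const char * */
    return strlen(s);
}

/* Usage: the string has 12 characters, well beyond a [0..1] range. */
void check_length_through_cast(void)
{
    assert(length_through_cast() == 12);
}
```

A length range derived from char[2] would cap the result at 1, which is the [0..1] range the quoted message reports; the program itself returns 12.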
* Re: [PATCH] Make strlen range computations more conservative 2018-08-09 6:27 ` Richard Biener @ 2018-08-17 5:09 ` Jeff Law 0 siblings, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-08-17 5:09 UTC (permalink / raw) To: Richard Biener, Bernd Edlinger, GCC Patches; +Cc: Jakub Jelinek, Martin Sebor On 08/09/2018 12:27 AM, Richard Biener wrote: > On August 9, 2018 7:26:19 AM GMT+02:00, Jeff Law <law@redhat.com> wrote: >> On 07/24/2018 05:18 PM, Bernd Edlinger wrote: >>> On 07/24/18 23:46, Jeff Law wrote: >>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: >>>>> Hi! >>>>> >>>>> This patch makes strlen range computations more conservative. >>>>> >>>>> Firstly if there is a visible type cast from type A to B before >> passing >>>>> then value to strlen, don't expect the type layout of B to restrict >> the >>>>> possible return value range of strlen. >>>> Why do you think this is the right thing to do? ie, is there >> language >>>> in the standards that makes you think the code as it stands today is >>>> incorrect from a conformance standpoint? Is there a significant >> body of >>>> code that is affected in an adverse way by the current code? If so, >>>> what code? >>>> >>>> >>> >>> I think if you have an object, of an effective type A say char[100], >> then >>> you can cast the address of A to B, say typedef char (*B)[2] for >> instance >>> and then to const char *, say for use in strlen. I may be wrong, but >> I think >>> that we should at least try not to pick up char[2] from B, but >> instead >>> use A for strlen ranges, or leave this range open. Currently the >> range >>> info for strlen is [0..1] in this case, even if we see the type cast >>> in the generic tree. >> Coming back to this... I'd like to hope we can depend on the type of >> the >> strlen argument. Obviously if it's a char *, we know nothing. But if >> it's an ARRAY_TYPE it'd be advantageous if we could use the array >> bounds >> to bound the length. 
> > But the FE made it char * and only because pointer type conversions are useless we may see sth else. So we cannot use the type of the argument. I must have missed something along the line -- I thought I saw array types passed directly without converting to a char *. But as I noted later, we still have things like forwprop that may convert pointer arithmetic/access to arrays. So I think we have to give up on using the types for anything that affects code generation. I think that's consistent with your and Jakub's position. I think we do want to use types to drive warnings though. They help us eliminate meaningful numbers of false positives. There may be oddball missed warnings or false positives because of the actions of forwprop, but we should see far fewer issues if we rely on types for warnings. > >> >> *But* we have code which will turn pointer access into array indexing. >> tree-ssa-forwprop.c can do that, there may be others. So if we >> originally had a char * argument to strlen, but forwprop changed it >> into >> a char[N] type, then have we just broken things? I'd totally forgotten >> about this behavior from forwprop. PR 46393 contains sample code where >> this happens. > > If we really still have this it must be very constrained. Because we used to have sorts of wrong code issues with this and data dependence analysis. Yea, it's still in there. I verified it during the gcc-8 cycle since the BZ is a regression. The general form of the transformation is discussed in a comment at the top of the file. Given all that we have discussed in this thread this code could well be a ticking time bomb for wrong code issues. What's scary to me is git blame says I wrote that code. I certainly remember writing bits of tree-ssa-forwprop, but don't recall writing this specific transformation. > >> I also thought we had code to recover array indexing from pointer >> arithmetic in the C/C++ front-end, but I can't seem to find it tonight.
>> But it would raise similar concerns. > > I've removed that. What I did at some point is avoid decaying too much array to pointer conversions in the FEs to preserve array refs instead of trying to reconstruct them late. That explains why I couldn't find it :-) Jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
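The concern about pointer accesses being rewritten as array indexing can be illustrated with a small sketch in C. It assumes a two-dimensional array and a string written through a plain char pointer to the whole object; `rows` and `length_across_rows` are hypothetical names, and the sketch does not reproduce the code from PR 46393 or the forwprop transformation itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Eight bytes of storage, declared as two rows of four chars. */
static char rows[2][4];

/* Hypothetical helper: write and measure a string through a char
   pointer to the whole object.  The source never names rows[0], so
   the size of a single row says nothing about the string length. */
size_t length_across_rows(void)
{
    char *flat = (char *)&rows;   /* pointer to the first byte of the object */
    memcpy(flat, "abcdef", 7);    /* six characters plus the terminating NUL */
    return strlen(flat);
}

/* Usage: the length exceeds what one row could hold (3 plus NUL). */
void check_length_across_rows(void)
{
    assert(sizeof rows[0] == 4);
    assert(length_across_rows() == 6);
}
```

If a later pass re-expressed the access in terms of the inner array type and a length range were then read off that type, the bound would be 3, while the program as written yields 6. That gap is why the thread treats types as unsafe input for anything that affects code generation.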
* Re: [PATCH] Make strlen range computations more conservative 2018-07-24 21:46 ` Jeff Law 2018-07-24 23:18 ` Bernd Edlinger @ 2018-07-25 7:08 ` Richard Biener 2018-08-02 16:45 ` Jeff Law 1 sibling, 1 reply; 121+ messages in thread From: Richard Biener @ 2018-07-25 7:08 UTC (permalink / raw) To: Jeff Law; +Cc: Bernd Edlinger, GCC Patches, Jakub Jelinek On Tue, 24 Jul 2018, Jeff Law wrote: > On 07/24/2018 01:59 AM, Bernd Edlinger wrote: > > Hi! > > > > This patch makes strlen range computations more conservative. > > > > Firstly if there is a visible type cast from type A to B before passing > > then value to strlen, don't expect the type layout of B to restrict the > > possible return value range of strlen. > Why do you think this is the right thing to do? ie, is there language > in the standards that makes you think the code as it stands today is > incorrect from a conformance standpoint? Is there a significant body of > code that is affected in an adverse way by the current code? If so, > what code? > > > > > > > Furthermore use the outermost enclosing array instead of the > > innermost one, because too aggressive optimization will likely > > convert harmless errors into security-relevant errors, because > > as the existing test cases demonstrate, this optimization is actively > > attacking string length checks in user code, while and not giving > > any warnings. > Same questions here. > > I'll also note that Martin is *very* aware of the desire to avoid > introducing security relevent errors. In fact his main focus is to help > identify coding errors that have a security impact. So please don't > characterize his work as "actively attacking string length checks in > user code". > > Ultimately we want highly accurate string lengths to help improve the > quality of the warnings we generate for potentially dangerous code. > These changes seem to take us in the opposite direction. 
> > So ISTM that you really need a stronger justification using the > standards compliance and/or real world code that is made less safe by > keeping string lengths as accurate as possible. Note you cannot solely look at what the C standard says. Instead you have to see where the middle-end lessens those constraints since these functions are not only called from C FE context. So arguing from a C language lawyer point here is pointless. You have to argue from a GENERIC language lawyer point which is going to be impossible since apart from the implementation (which includes how all existing GCC FEs _and_ the middle-end uses it) there is no formal specification available. Yes, in most cases we say we match C language behavior and constraints but in some cases we are clearly and definitely less strict and that has followup consequences in other areas. While exploiting all fine details of the C language might get you more constraints on things like string lengths you have to apply some common sense and middle-end knowledge - which means from my side trusting hunch feelings whether something is possibly not safe. As for patches in this area I would really love to see _smaller_ changes. And I'd like to see changes that make it clear what cases were supposed to be handled and _not_ including other cases "by accident". See especially my comments on this patch from Bernd. > > Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > > Is it OK for trunk? > I'd like to ask we hold on this until I return from PTO (Aug 1) so that > we can discuss the best thing to do here for each class of change. > > I think you, Martin, Richi and myself should hash through the technical > issues raised by the patch. Obviously others can chime in, but I think > the 4 of us probably need to drive the discussion.
I think with respect to patches to fix issues in previous patches at this point a better option might be to revert the patches causing the issues and start from scratch in a more defined manner. Given recent (temporary) regressions in the testsuite it feels like Martin is going too fast. Richard. -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) ^ permalink raw reply [flat|nested] 121+ messages in thread
* Re: [PATCH] Make strlen range computations more conservative 2018-07-25 7:08 ` Richard Biener @ 2018-08-02 16:45 ` Jeff Law 0 siblings, 0 replies; 121+ messages in thread From: Jeff Law @ 2018-08-02 16:45 UTC (permalink / raw) To: Richard Biener; +Cc: Bernd Edlinger, GCC Patches, Jakub Jelinek On 07/25/2018 01:08 AM, Richard Biener wrote: >> >> So ISTM that you really need a stronger justification using the >> standards compliance and/or real world code that is made less safe by >> keeping string lengths as accurate as possible. > > Note you cannot solely look at what the C standard says. Instead > you have to see where the middle-end lessens that constraints since > these functions are not only called from C FE context. Agreed. I guess my point WRT standards was to start with C/C++ and see if they rule in/out either case. If (for example) we could make a determination that either standard didn't allow us to narrow the length of a string after a cast in the way the code currently does, then that would make the current code simply wrong and we should either revert those bits or use Bernd's patches in that space. If the language standards don't give us guidance of that nature, then we look at common uses as well as our own GIMPLE/GENERIC (unwritten) semantics to guide us one way or the other. > > So arguing from a C language lawyer point here is pointless. You > have to argue from a GENERIC language lawyer point which is going > to be impossible since apart from the implementation (which > includes how all existing GCC FEs _and_ the middle-end uses it) > there is no formal specification available. Yes, in most cases > we say we match C language behavior and constraints but in some > cases we are clearly and definitely less strict and that has > followup consequences in other areas. 
> > While exploiting all fine details of the C language might get > you more constraints on things like string lengths you have to > apply some common sense and middle-end knowledge - which means > from my side trusting hunch feelings whether something is > possibly not safe. I'm absolutely not suggesting we want to exploit all the finer language details here. The language standards are just one piece of the puzzle IMHO. > > As for patches in this area I would really love to see _smaller_ > changes. And I'd like to see changes that make it clear > what cases where supposed to be handled and _not_ including > other cases "by accident". See especially my comments on this patch > from Bernd. Agreed. > >>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >>> Is it OK for trunk? >> I'd like to ask we hold on this until I return from PTO (Aug 1) so that >> we can discuss the best thing to do here for each class of change. >> >> I think you, Martin, Richi and myself should hash through the technical >> issues raised by the patch. Obviously others can chime in, but I think >> the 4 of us probably need to drive the discussion. > > I think with respect to patches to fix issues in previous patches > at this point a better option might be to revert the patches causing > the issues and start from scratch in a more defined manner. I wouldn't object to that. I think there's enough concerns here that we ought to slow down and re-visit the reasoning behind each change and make sure it's sensible. If the best way to do that is revert, break the patches down and resubmit, then let's do that. Mostly what I wanted to do was give us time to hash through the changes and decide what the best option was without having two developers changing, then reverting bits of each other's work. To that end, again, I'm open to reverting Martin's work and having it resubmitted as we hash through each issue.
My general preference is to lean towards more accurate length information (mostly to drive accuracy in warnings elsewhere), but that obviously has to be within the constraints of the language standards, common practice, as well as our own internal semantics. If we need to pull back in some spaces, that's fine, but we should clearly document when/how we're doing that and, if at all possible, make it consistent throughout the compiler. I'm also open to the concept of splitting this stuff up a bit in the sense of what we'll use for optimization vs what we'll use to increase accuracy of our other warnings (such as sprintf). > > Giving recent (temporary) regressions in the testsuite it feels like > Martin is going too fast. ACK. jeff ^ permalink raw reply [flat|nested] 121+ messages in thread
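The idea of keeping one length bound for optimization and another for warnings could look roughly like the sketch below. This is purely illustrative C, not a GCC data structure or interface: `struct strlen_bounds`, `bounds_for_member`, `should_warn` and `may_fold_length_check` are invented names. It also assumes that the tight bound comes from the innermost array type while the conservative bound comes from the bytes remaining in the enclosing object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of upper bounds on a string length. */
struct strlen_bounds {
    size_t max_for_codegen;   /* conservative: safe to optimize on */
    size_t max_for_warning;   /* tight: only used to flag suspicious code */
};

/* inner_size: size of the innermost array the type names.
   bytes_to_object_end: bytes from the start of that array to the end
   of the enclosing object.  Both bounds leave room for the NUL. */
struct strlen_bounds bounds_for_member(size_t inner_size,
                                       size_t bytes_to_object_end)
{
    struct strlen_bounds b;
    b.max_for_warning = inner_size ? inner_size - 1 : 0;
    b.max_for_codegen = bytes_to_object_end ? bytes_to_object_end - 1 : 0;
    return b;
}

/* A diagnostic may fire as soon as the tight bound is exceeded. */
int should_warn(struct strlen_bounds b, size_t len)
{
    return len > b.max_for_warning;
}

/* A user check "strlen (s) > limit" may be folded away only when even
   the conservative bound cannot exceed the limit. */
int may_fold_length_check(struct strlen_bounds b, size_t limit)
{
    return limit >= b.max_for_codegen;
}

/* Usage: a char[4] member with 8 bytes left in the enclosing object. */
void example_usage(void)
{
    struct strlen_bounds b = bounds_for_member(4, 8);
    assert(should_warn(b, 5));             /* length 5 looks suspicious */
    assert(!may_fold_length_check(b, 5));  /* but the user's check must stay */
}
```

With this split, a length check in user code survives unless the whole object rules the condition out, while the tighter type-based bound is still available to drive diagnostics.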