public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/95803] New: Failure to optimize strlen in certain situations properly, instead leading to weird code
@ 2020-06-21 16:55 gabravier at gmail dot com
  2020-06-22 23:52 ` [Bug tree-optimization/95803] " msebor at gcc dot gnu.org
  0 siblings, 1 reply; 2+ messages in thread
From: gabravier at gmail dot com @ 2020-06-21 16:55 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95803

            Bug ID: 95803
           Summary: Failure to optimize strlen in certain situations
                    properly, instead leading to weird code
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

bool f(int i)
{
    if (i < 4)
        i = 4;

    const char *s = &"abc"[i];

    return strlen(s) > 3;
}

Since `i` is clamped to at least 4 and "abc" occupies only 4 bytes (including
the terminating NUL), the call to strlen always reads past the end of the
literal, which is undefined behavior. The function can therefore be optimized
to a simple `return true` or `return false`. LLVM does this transformation,
but GCC does not. This code is admittedly contrived, but the optimization
would probably be effective for other code, and currently GCC generates rather
odd code for it:

f(int):
  mov ecx, 4
  cmp edi, 4
  mov eax, 3
  mov edx, ecx
  cmovge edx, edi
  movsx rdx, edx
  sub rax, rdx
  cmp rdx, 3
  mov edx, 0
  cmova rax, rdx
  cmp rax, 3
  seta al
  ret

accompanied by very odd code in the final tree-optimized dump:

;; Function f (_Z1fi, funcdef_no=287, decl_uid=9588, cgraph_uid=216, symbol_order=215)

f (int i)
{
  const char * s;
  long unsigned int _1;
  int _3;
  bool _6;
  sizetype _7;

  <bb 2> [local count: 1073741824]:
  # DEBUG BEGIN_STMT
  _3 = MAX_EXPR <i_2(D), 4>;
  # DEBUG i => _3
  # DEBUG BEGIN_STMT
  _7 = (sizetype) _3;
  s_4 = "abc" + _7;
  # DEBUG s => s_4
  # DEBUG BEGIN_STMT
  _1 = __builtin_strlen (s_4);
  _6 = _1 > 3;
  return _6;
}

Whereas this code:

bool f(int i)
{
    if (i < 4)
        i = 4;

    return strlen(&"abc"[i]) > 3;
}

optimizes to this:

;; Function f (_Z1fi, funcdef_no=287, decl_uid=9588, cgraph_uid=216, symbol_order=215)

f (int i)
{
  <bb 2> [local count: 1073741824]:
  # DEBUG BEGIN_STMT
  # DEBUG i => NULL
  # DEBUG BEGIN_STMT
  return 0;

}
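
The reasoning behind the expected result can be checked without performing
the out-of-bounds read. The following is only a sketch: `clamped_index`,
`modeled_length` and `modeled_f` are hypothetical names introduced here, and
`modeled_length` is an assumption that mirrors the `sub` / `cmova` pair in the
assembly above (3 - idx when idx <= 3, otherwise 0). It is not GCC's
implementation.

```cpp
#include <cassert>
#include <cstddef>

// Size of the array backing the literal "abc": 'a', 'b', 'c', '\0'.
constexpr std::size_t kLiteralSize = sizeof("abc");
static_assert(kLiteralSize == 4, "\"abc\" occupies 4 bytes including the NUL");

// Hypothetical helper: the index f() uses after `if (i < 4) i = 4;`.
constexpr int clamped_index(int i) { return i < 4 ? 4 : i; }

// Hypothetical model of the length computed by the assembly in the report:
// 3 - idx for in-bounds offsets, 0 for anything past the last character.
constexpr std::size_t modeled_length(std::size_t idx) {
    return idx > 3 ? 0 : 3 - idx;
}

// Value f() would produce under that model. Nothing is dereferenced here,
// so this function itself has no undefined behavior.
constexpr bool modeled_f(int i) {
    return modeled_length(static_cast<std::size_t>(clamped_index(i))) > 3;
}

// The clamped index is never below the size of the literal, so the pointer
// is at best one past the end and strlen() has no valid byte to read.
static_assert(clamped_index(-7) == 4, "small values are raised to 4");
static_assert(static_cast<std::size_t>(clamped_index(0)) >= kLiteralSize,
              "offset is never inside the literal");
static_assert(!modeled_f(0), "the comparison is false under this model");
```

Under this model the comparison can never be true, which matches the
`return 0` that GCC already produces for the second form.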


* [Bug tree-optimization/95803] Failure to optimize strlen in certain situations properly, instead leading to weird code
  2020-06-21 16:55 [Bug tree-optimization/95803] New: Failure to optimize strlen in certain situations properly, instead leading to weird code gabravier at gmail dot com
@ 2020-06-22 23:52 ` msebor at gcc dot gnu.org
  0 siblings, 0 replies; 2+ messages in thread
From: msebor at gcc dot gnu.org @ 2020-06-22 23:52 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95803

Martin Sebor <msebor at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Blocks|                            |83819
   Last reconfirmed|                            |2020-06-22
                 CC|                            |msebor at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Martin Sebor <msebor at gcc dot gnu.org> ---
The first case is due to failing to handle POINTER_PLUS expressions in
maybe_set_strlen_range() in tree-ssa-strlen.c.  The second case is folded into
(i <= 3 ? i : 0) very early on by fold_builtin_strlen().

There is no consistency in how GCC responds to instances of undefined behavior.
Some are folded away and replaced by constants, others are intentionally made
to expand into library calls.  The referenced meta-bug tracks a number of such
cases.

We've been talking for years now about implementing a policy where the decision
of how to respond is left up to the user.  The three common responses that were
discussed at GNU Cauldron in Manchester are to:

1) fold/eliminate/replace it with a safer alternative
2) replace with __builtin_trap
3) replace with __builtin_unreachable

All three with a warning.  I've been working on improvements in this area and
with my WIP patches to the detection of past-the-end reads, GCC prints:

pr95803.c: In function ‘f’:
pr95803.c:8:12: warning: ‘__builtin_strlen’ reading 1 or more bytes from a
region of size 0 [-Wstringop-overread]
    8 |     return __builtin_strlen(s) > 3;
      |            ^~~~~~~~~~~~~~~~~~~

I expect to submit the patch for review soon (maybe even this week).  I support
folding this and similar code to a no-op provided it still triggers a warning.
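
As a rough source-level illustration of the three responses listed above,
here is what the body of f() might be reduced to under each policy.  This is
only a sketch: the function names are hypothetical, the choice of `false` as
the folded constant is an assumption, and the code does not show how GCC
would implement any of this internally.  `__builtin_trap` and
`__builtin_unreachable` are existing GCC built-ins.

```cpp
#include <cassert>

// Response 1 (hypothetical name): fold the invalid strlen call away and
// replace the result with a constant.
bool f_fold(int) { return false; }

// Response 2 (hypothetical name): replace the invalid call with a trap, so
// reaching it terminates the program abnormally instead of reading past the
// end of the literal.
bool f_trap(int) { __builtin_trap(); }

// Response 3 (hypothetical name): tell the optimizer the call can never be
// reached.  Actually calling this function is undefined behavior.
bool f_unreachable(int) { __builtin_unreachable(); }
```

Only `f_fold` is safe to call; `f_trap` aborts execution and `f_unreachable`
must never be reached at run time.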


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83819
[Bug 83819] [meta-bug] missing strlen optimizations
