[Bug tree-optimization/104106] New: Fail to remove some useless loop

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/104106] New: Fail to remove some useless loop
@ 2022-01-18 22:25 denis.campredon at gmail dot com
  2022-01-18 22:27 ` [Bug tree-optimization/104106] " pinskia at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: denis.campredon at gmail dot com @ 2022-01-18 22:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104106

            Bug ID: 104106
           Summary: Fail to remove some useless loop
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: denis.campredon at gmail dot com
  Target Milestone: ---

In the following snippet none of the loops are removed when compiled with -O2
or -O3.

In f and g the optimizers should detect that tmp_a is only written and never
read.

In h, only one index of tmp_a is read, so it should be the only one computed.

Ideally, if not too complex for gcc, the first two loops should be removed and
the computations, if any, done on the last loop.

-------------
int f(char *a, unsigned n) {
    char tmp_a[n];

    for (unsigned i = 1; i != n; i++) tmp_a[i] = a[i];
    return a[0];
}

int g(char *a, int n) {
    char tmp_a[n];

    for (int i = 1; i < n; i++) tmp_a[i] = a[i] - a[i - 1];
    return a[0];
}

int h(char *a, int n) {
    char tmp_a[n];

    for (int i = 0; i < n; i++) tmp_a[i] = a[i];
    return tmp_a[1];
}

int i(char *a, char *b, int n) {
    char tmp_a[n];
    char tmp_b[n];

    for (int i = 1; i < n; i++) tmp_a[i] = a[i] - a[i - 1];
    for (int i = 1; i < n; i++) tmp_b[i] = b[i] - b[i - 1];

    int result = 0;
    for (int i = 1; i < n; i++) result += tmp_a[i] + tmp_b[i];

    return result;
}
---------------------
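
As a rough sketch of the result hoped for in f, g and h (hand-written, not
actual GCC output), the functions would collapse to something like the
following. The *_expected names are made up for this illustration, and
h_expected assumes n > 1, since otherwise the original h reads an element
that was never written.

```c
/* Hypothetical hand-optimized equivalents; the *_expected names are
   not part of the report. */

/* f and g: tmp_a is written but never read, so the loop is dead. */
int f_expected(char *a, unsigned n) {
    (void)n;
    return a[0];
}

int g_expected(char *a, int n) {
    (void)n;
    return a[0];
}

/* h: only tmp_a[1] is read, and it holds a[1] whenever n > 1. */
int h_expected(char *a, int n) {
    (void)n;
    return a[1];
}
```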


* [Bug tree-optimization/104106] Fail to remove some useless loop
  2022-01-18 22:25 [Bug tree-optimization/104106] New: Fail to remove some useless loop denis.campredon at gmail dot com
@ 2022-01-18 22:27 ` pinskia at gcc dot gnu.org
  2022-01-18 22:35 ` [Bug tree-optimization/104106] Fail to remove stores to VLA inside loops pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-01-18 22:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104106

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization


* [Bug tree-optimization/104106] Fail to remove stores to VLA inside loops
  2022-01-18 22:25 [Bug tree-optimization/104106] New: Fail to remove some useless loop denis.campredon at gmail dot com
  2022-01-18 22:27 ` [Bug tree-optimization/104106] " pinskia at gcc dot gnu.org
@ 2022-01-18 22:35 ` pinskia at gcc dot gnu.org
  2022-01-19  8:00 ` denis.campredon at gmail dot com
  2022-01-19 10:55 ` rguenth at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-01-18 22:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104106

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
            Summary|Fail to remove some useless |Fail to remove stores to
                   |loop                        |VLA inside loops
   Last reconfirmed|                            |2022-01-18

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
If we change the VLA to a normal array, GCC is able to optimize f and g.
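
For reference, a fixed-size variant of f might look like the sketch below.
The bound of 256 and the name f_fixed are assumptions made for this
illustration; n has to stay in the range 1..256.

```c
/* Sketch: f with a fixed-size array instead of a VLA.
   The size 256 is an arbitrary choice for illustration; n must be
   in the range 1..256. */
int f_fixed(char *a, unsigned n) {
    char tmp_a[256];

    /* tmp_a is only written, never read, so the whole loop is dead. */
    for (unsigned i = 1; i != n; i++) tmp_a[i] = a[i];
    return a[0];
}
```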

h is almost done:

  <bb 2> [local count: 118111600]:
  if (n_8(D) > 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 3> [local count: 105119324]:
  _18 = (unsigned int) n_8(D);
  _6 = (sizetype) _18;
  __builtin_memcpy (&tmp_a, a_11(D), _6);

  <bb 4> [local count: 118111600]:
  _4 = tmp_a[0];

GCC could do PRE on the load of tmp_a[0], replacing it with a_11[0] on the
path through bb 3 and with an unspecified value on the other path.
This is true even with the VLA.

As for i, that requires loop fusion, which GCC does not implement yet (there
is even another bug about that).
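
A hand-fused version of i, roughly what fusing the three loops and then
dropping the temporaries could produce, is sketched below. The name i_fused
is made up here; the casts to char reproduce the narrowing that the stores
into tmp_a and tmp_b perform in the original.

```c
/* Hypothetical hand-fused form of i(); not GCC output. */
int i_fused(char *a, char *b, int n) {
    int result = 0;

    for (int i = 1; i < n; i++) {
        /* Each difference is narrowed to char, as it would be when
           stored into tmp_a[i] / tmp_b[i] in the original. */
        char da = (char)(a[i] - a[i - 1]);
        char db = (char)(b[i] - b[i - 1]);
        result += da + db;
    }
    return result;
}
```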


* [Bug tree-optimization/104106] Fail to remove stores to VLA inside loops
  2022-01-18 22:25 [Bug tree-optimization/104106] New: Fail to remove some useless loop denis.campredon at gmail dot com
  2022-01-18 22:27 ` [Bug tree-optimization/104106] " pinskia at gcc dot gnu.org
  2022-01-18 22:35 ` [Bug tree-optimization/104106] Fail to remove stores to VLA inside loops pinskia at gcc dot gnu.org
@ 2022-01-19  8:00 ` denis.campredon at gmail dot com
  2022-01-19 10:55 ` rguenth at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: denis.campredon at gmail dot com @ 2022-01-19  8:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104106

--- Comment #2 from denis.campredon at gmail dot com ---
The missed optimisations are also present if the arrays are allocated with
malloc or new.


* [Bug tree-optimization/104106] Fail to remove stores to VLA inside loops
  2022-01-18 22:25 [Bug tree-optimization/104106] New: Fail to remove some useless loop denis.campredon at gmail dot com
                   ` (2 preceding siblings ...)
  2022-01-19  8:00 ` denis.campredon at gmail dot com
@ 2022-01-19 10:55 ` rguenth at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-19 10:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104106

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think DSE doesn't quite understand free() as must-def, likewise for
__builtin_stack_restore though that is _much_ more difficult to tie to
a specific alloca allocation.

It would be nice if the gimplifier would emit CLOBBERs for the VLAs
that go out of scope at the __builtin_stack_restore point, that would
fix the VLA case I think.  For

int f(char *a, unsigned n) {
    char *tmp_a = __builtin_malloc (n);

    for (unsigned i = 1; i != n; i++) tmp_a[i] = a[i];
    __builtin_free (tmp_a);
    return a[0];
}

it should already work but DSE is not set up to consider variable indexed
stores in loops that end up as pointer accesses like

*_5 = _6;

or rather stmt_kills_ref_p is too simple-minded when looking at free():

          case BUILT_IN_FREE:
            {
              tree ptr = gimple_call_arg (stmt, 0);
              tree base = ao_ref_base (ref);
              if (base && TREE_CODE (base) == MEM_REF
                  && TREE_OPERAND (base, 0) == ptr)
                {
                  ++alias_stats.stmt_kills_ref_p_yes;
                  return true;
                }

it might be able to use points-to analysis or track down the base of
_5 via DR analysis, but in the end I think it's DSE's job to do better
here.
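
To illustrate the limitation, as far as the quoted check can be read (not
verified against a dump): a store at a constant offset from the malloc'ed
pointer has a MEM_REF base whose operand 0 is the freed pointer itself,
whereas a variably indexed store in a loop goes through a separately
computed address. The function names below are made up for this sketch,
and the NULL checks are an addition.

```c
/* Store at a constant offset: the base of the reference is a MEM_REF
   on tmp_a, so operand 0 is the pointer passed to free() and the
   quoted check can match.  Requires n >= 2. */
int store_const(char *a, unsigned n) {
    char *tmp_a = __builtin_malloc(n);
    if (!tmp_a) return a[0];

    tmp_a[1] = a[1];
    __builtin_free(tmp_a);
    return a[0];
}

/* Variably indexed store: the address is computed first
   (p = tmp_a + i) and the store becomes *p = ..., so the MEM_REF
   operand is p rather than tmp_a and the check does not match.
   Requires n >= 1. */
int store_loop(char *a, unsigned n) {
    char *tmp_a = __builtin_malloc(n);
    if (!tmp_a) return a[0];

    for (unsigned i = 1; i != n; i++) {
        char *p = tmp_a + i;
        *p = a[i];
    }
    __builtin_free(tmp_a);
    return a[0];
}
```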

With

int f(char *a, unsigned n) {
    char (*tmp_a)[n] = __builtin_malloc (n);

    for (unsigned i = 1; i != n; i++) (*tmp_a)[i] = a[i];
    __builtin_free (tmp_a);
    return a[0];
}

this issue is avoided but we then run into

          /* If we visit this PHI by following a backedge then we have to
             make sure ref->ref only refers to SSA names that are invariant
             with respect to the loop represented by this PHI node.  */
          if (dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
                              gimple_bb (temp))
              && !for_each_index (ref->ref ? &ref->ref : &ref->base,
                                  check_name, gimple_bb (temp)))
            return DSE_STORE_LIVE;

which explicitly disables DSE of variably indexed accesses with the index
varying in the loop.  That's because cross-iteration dependences would
otherwise be mishandled.
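
A small example of why the index has to be loop invariant, as far as the
check can be understood: walking from the store around the backedge reaches
the same statement again, which textually overwrites tmp[i], but by then i
has advanced, so it writes a different element and the earlier store is
still live. The function name and the bound of 64 are made up for this
sketch.

```c
/* Every iteration stores to a different element of tmp, and all of
   them are read by the second loop.  Treating the store in the next
   iteration as a kill of the current one would wrongly delete it.
   Assumes 0 <= n <= 64 (the bound is arbitrary). */
int sum_copy(const char *a, int n) {
    char tmp[64];
    int sum = 0;

    for (int i = 0; i < n; i++) tmp[i] = a[i];   /* live stores */
    for (int i = 0; i < n; i++) sum += tmp[i];
    return sum;
}
```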

