From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-468616-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 16488 invoked by alias); 26 Nov 2014 10:44:16 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 16454 invoked by uid 48); 26 Nov 2014 10:44:12 -0000
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/64081] [5 Regression] r217827 prevents RTL loop unroll
Date: Wed, 26 Nov 2014 10:44:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 5.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-64081-4-nfgM9UzbWr@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-64081-4@http.gcc.gnu.org/bugzilla/>
References: <bug-64081-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-11/txt/msg03088.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64081
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well, the change allows DOM to CSE pos = 0 in

  <bb 2>:
  pos = 0;
  dir = 1;
  _45 = (long unsigned int) argc_9(D);
  if (_45 != 0)
    goto <bb 3>;
  else
    goto <bb 21>;

  <bb 3>:
  pretmp_75 = data;
  pretmp_77 = token;
  arr1.5_23 = arr1;
  arr2.7_27 = arr2;
  pos_lsm.17_22 = pos;

which in turn allows jump threading to do its work.
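
As a rough source-level illustration (a sketch only: the function names and
the loop body are made up, this is not the testcase from the PR), forwarding
the stored 0 to the following load is what makes the first branch outcome
visible to the threader:

```c
static int pos;   /* global, so the initial value goes through memory */

/* Shape before the CSE: p is loaded back right after the store. */
int count_loaded(int n)
{
    pos = 0;
    int p = pos;              /* load of the value just stored */
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (p == 0)           /* first outcome unknown unless p is known */
            count += 1;
        else
            count += 2;
        p = i & 1;
    }
    return count;
}

/* Shape after the CSE: the load is replaced by the constant, so the
   first test of p is known to be true on entry to the loop.  */
int count_forwarded(int n)
{
    pos = 0;
    int p = 0;                /* forwarded store value */
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (p == 0)
            count += 1;
        else
            count += 2;
        p = i & 1;
    }
    return count;
}
```

Both functions compute the same result; only the second form exposes the
constant to later passes.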

  Registering jump thread: (11, 12) incoming edge;  (12, 5) joiner;  (5, 23) normal; (23, 21) nocopy;
...
  Threaded jump 11 --> 12 to 25

This changes the loop to have two latches (so it becomes a loop nest) and
adds one exit (the loop now has three).  Further down the road the extra
loop is no longer there, but the three exits remain.

I don't see what is wrong with what DOM does here.

We do miss some interesting optimization opportunities, like
transforming

  if (prephitmp_87 == 1)
    goto <bb 9>;
  else
    goto <bb 10>;

  <bb 9>:
  _24 = arr1.5_23 + _62;
  pos.6_25 = *_24;
  goto <bb 11>;

  <bb 10>:
  _28 = arr2.7_27 + _62;
  pos.8_29 = *_28;

  <bb 11>:
  # prephitmp_89 = PHI <pos.6_25(9), pos.8_29(10)>

to

  if (prephitmp_87 == 1)
    goto <bb 9>;
  else
    goto <bb 11>;

  <bb 9>:
  goto <bb 11>;

  <bb 11>:
  # _24 = PHI <arr1.5_23, arr2.7_27>
  _28 = _24 + _62;
  prephitmp_89 = *_28;

sinking common computations through a PHI.
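
In C terms the idea looks roughly like the following sketch.  The function
and parameter names are hypothetical; dir stands in for prephitmp_87 and idx
for _62:

```c
#include <stddef.h>

/* Current shape: each arm does its own address computation and load,
   and the loaded values are merged after the branch.  */
int load_per_arm(int dir, const int *arr1, const int *arr2, size_t idx)
{
    int pos;
    if (dir == 1)
        pos = arr1[idx];
    else
        pos = arr2[idx];
    return pos;
}

/* Sunk shape: only the base pointer is selected by the condition; the
   address add and the load happen once, after the merge point.  */
int load_sunk(int dir, const int *arr1, const int *arr2, size_t idx)
{
    const int *base = (dir == 1) ? arr1 : arr2;
    return base[idx];
}
```

The two functions return the same value for any valid index; the second one
has a single load instead of one per arm.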

Together with a followup optimization in out-of-SSA that coalesces arr1.5_23
and arr2.7_27, this means we can drop the conditional entirely.

Heh.  Fun idea.

Anyway - for this PR it is RTL unroll IV analysis that needs to be improved.