From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-475015-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 2166 invoked by alias); 27 Jan 2015 07:56:45 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 2017 invoked by uid 48); 27 Jan 2015 07:56:30 -0000
From: "amker at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/62173] [5.0 regression] 64bit Arch can't ivopt while 32bit Arch can
Date: Tue, 27 Jan 2015 07:56:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: amker at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: jiwang at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 5.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-62173-4-Fh8SAAQ7F8@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-62173-4@http.gcc.gnu.org/bugzilla/>
References: <bug-62173-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-01/txt/msg03009.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173

--- Comment #27 from amker at gcc dot gnu.org ---
(In reply to Jiong Wang from comment #24)
> (In reply to amker from comment #23)
> 
> Partially agree.
> 
> At least for the single use case given by Seb, I think tree ivopt should do
> it. (I verified that clang does ivopt correctly for this case.)

LLVM generates correct code, but I am not sure that is because of ivopt. The
LLVM dump after ivopt looks like this:

; Function Attrs: nounwind
define void @bar(i32 %d) #0 {
entry:
  %A = alloca [10 x i8], align 1
  %cmp2 = icmp sgt i32 %d, 0
  br i1 %cmp2, label %while.body.lr.ph, label %while.end

while.body.lr.ph:                                 ; preds = %entry
  %0 = sext i32 %d to i64
  br label %while.body

while.body:                                       ; preds = %while.body.lr.ph, %while.body
  %indvars.iv = phi i64 [ %0, %while.body.lr.ph ], [ %indvars.iv.next, %while.body ]
  %indvars.iv.next = add nsw i64 %indvars.iv, -1
  %scevgep = getelementptr i8* %A4, i64 %indvars.iv
  %1 = load i8* %scevgep, align 1, !tbaa !1
  tail call void @foo(i8 %1) #2
  %2 = add i64 %indvars.iv.next, 1
  %cmp = icmp sgt i64 %2, 1
  br i1 %cmp, label %while.body, label %while.end.loopexit

The induction variable chosen is actually the original biv (d), just as in GCC.
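A minimal C sketch of the two IV choices for this loop may help. It assumes a
reduced loop of the shape the IR suggests (read A[d] while counting d down and
pass each byte to foo); the tracing body of foo and the names bar_biv and
bar_ptr are hypothetical, for illustration only. Both forms make the same
calls; they differ only in whether the loop body must add the index to the
base of A.

```c
#include <assert.h>

/* Hypothetical stand-in for the external foo(): it records every byte it is
   passed so the two loop forms below can be compared. */
static char trace[16];
static int ntrace;

static void foo(char c) { trace[ntrace++] = c; }

enum { N = 10 };  /* matches the [10 x i8] alloca in the IR */

/* Form 1 (the choice both GCC and LLVM make): the IV is the original biv d,
   and the address A + d is formed from the base of A on every iteration. */
static void bar_biv(const char A[N], int d) {
  while (d > 0) {
    foo(A[d]);
    d--;
  }
}

/* Form 2 (a hypothetical alternative candidate): a pointer IV p starts at
   &A[d] and is decremented by 1, so no base + index addition is needed
   inside the loop. */
static void bar_ptr(const char A[N], int d) {
  assert(d < N);          /* keeps A + d inside the array */
  if (d <= 0)
    return;               /* avoid forming A + d for negative d */
  for (const char *p = A + d; p > A; p--)
    foo(*p);
}
```

In the biv form, A + d has to be formed inside the loop; when A lives on the
stack, its base is itself frame + offset, which is where the base-address cost
comes from.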

So even if we fix the idx_find_step issue, GCC's ivopt may still generate code
like the following:

Loop-preheader
  ...
Loop-body:
  iv = phi<d, -1>
  tmp = (POINTER_TYPE)&A;
  foo(MEM[base:tmp, index:iv]);

Without proper RTL optimization, the problem in computing the base address of
A will very likely remain.

> 
> For the RTL reassociation, it's a little bit painful from my experiments.
> It's not always good to reassociate virtual_frame + offset; we only benefit
> when it is inside a loop, because the reassociation increases register
> pressure. There will be situations where more callee-saved registers are
> used, and we end up with unnecessary push/pop in the prologue/epilogue.
> I also haven't found a good place where we can safely reuse the existing
> RTL info to do the reassociation, and I am afraid rebuilding that info
> would cost compile time.
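To make the base-address cost and the reassociation trade-off concrete, here
is a small counting model in C, not real compiler output. frame and off_a
stand for virtual_frame and A's offset in the frame; the values used and the
helper names addrs_in_loop and addrs_hoisted are hypothetical. Both versions
produce identical addresses; the hoisted one saves one addition per iteration
in exchange for a value that must stay live across the whole loop.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_ITERS = 16 };  /* capacity of the out[] arrays passed in */

/* Before reassociation (modeled): each iteration rebuilds the base of A from
   the frame address and A's frame offset, then adds the index iv.
   Writes the d addresses to out[] and returns how many additions it did. */
static int addrs_in_loop(uintptr_t frame, uintptr_t off_a, int d,
                         uintptr_t out[]) {
  assert(d <= MAX_ITERS);
  int adds = 0, n = 0;
  for (int iv = d; iv > 0; iv--) {
    uintptr_t tmp = frame + off_a;   /* tmp = (POINTER_TYPE)&A */
    adds++;
    out[n++] = tmp + (uintptr_t)iv;  /* MEM[base:tmp, index:iv] */
    adds++;
  }
  return adds;
}

/* After reassociating to (frame + off_a) + iv and hoisting the invariant
   part into the preheader: one addition up front, one per iteration. In the
   real loop, t would stay live across every call to foo. */
static int addrs_hoisted(uintptr_t frame, uintptr_t off_a, int d,
                         uintptr_t out[]) {
  assert(d <= MAX_ITERS);
  int adds = 0, n = 0;
  uintptr_t t = frame + off_a;       /* computed once, in the preheader */
  adds++;
  for (int iv = d; iv > 0; iv--) {
    out[n++] = t + (uintptr_t)iv;
    adds++;
  }
  return adds;
}
```

For d = 5 the first model performs 10 additions and the second 6. Because the
real loop calls foo on every iteration, the hoisted t would have to occupy a
callee-saved register, which is the extra prologue/epilogue save/restore cost
described in the quoted reply.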