From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 14364 invoked by alias); 23 Jan 2015 17:33:38 -0000 Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org Received: (qmail 13881 invoked by uid 48); 23 Jan 2015 17:33:08 -0000 From: "jiwang at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug tree-optimization/62173] [5.0 regression] 64bit Arch can't ivopt while 32bit Arch can Date: Fri, 23 Jan 2015 17:33:00 -0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: tree-optimization X-Bugzilla-Version: 5.0 X-Bugzilla-Keywords: missed-optimization X-Bugzilla-Severity: normal X-Bugzilla-Who: jiwang at gcc dot gnu.org X-Bugzilla-Status: ASSIGNED X-Bugzilla-Priority: P1 X-Bugzilla-Assigned-To: jiwang at gcc dot gnu.org X-Bugzilla-Target-Milestone: 5.0 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-SW-Source: 2015-01/txt/msg02604.txt.bz2 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173 --- Comment #16 from Jiong Wang --- After some work on this issue, things have gone beyond my expectations. To summarize my understanding of this issue and what I have got: Reason of regression ====== the testcase contains a loop which is hotcode, and there is one critical loop invariant, actually base address of local array, if it can't be ivopted then there will be big performance regression on this testcase. because of one gcc tree level ivopt issue, this variable will not be ivopted on 64bit machine, mips, powerpc, aarch64 etc. mostly because there some type precision issues in gcc tree level code that gcc is doing too conservative decision on overflow checking and thus abort the ivopt in early stage. For details, please see above discussion. Then why there is regression on AArch64 since r213488 "[AArch64] Improve TARGET_LEGITIMIZE_ADDRESS_P hook" ? it's because except tree level ivopt pass, gcc also do extra ivopt on rtl level by the rtl pre pass. previously, we have inefficient implementation on aarch64 TARGET_LEGITIMIZE_ADDRESS_P while this inefficient implementation happen to generated some instruction pattern that rtl level ivopt could catch and that critical variable finally ivopted by rtl pass. r213488 fixed the inefficiency of TARGET_LEGITIMIZE_ADDRESS_P, and as a side-effect, we will not generate that specific instruction pattern again, thus no ivopt on rtl pre, and thus regression happen. before r213488, ivopt happen on AArch64 only, while after r213488, AArch64 act the same as all other 64bit targets, mips64, powerpc64, sparc64 etc. To Fix ====== * the idea way is fix this issue on tree-ssa-loop-ivopt pass. We need to correct type precision & overflow checking issue, then gcc could ivopt the critical variable as early as in tree level for all 64bit targets. * gcc rtl level ivopt could be improved also. we could re-shuffle rtl pattern, then rtl pre pass could catch more ivopt opportunities and act as a supplement to tree ivopt. What I have tried ================= I have tried to fix this issue on rtl level. We need two patches. * one patch to correct aarch64 TARGET_LEGITIMIZE_ADDRESS hook to generate more efficient rtl pattern for array indexed address. * one patch to re-shuffle rtl pattern to let rtl pre pass catch more ivopt opportunities. For targets except AArch64, patch 2 alone is enough to let ivopt happen in rtl pre pass. While for AArch64, we need to correct TARGET_LEGITIMIZE_ADDRESS first, to don't split valid "offset" into "magic-base + offset" which make the rtl sequences unnecessarily complexer and cause troubles for later rtl pass. but this fix on AArch64 TARGET_LEGITIMIZE_ADDRESS hook then exposed another issue on tree-ssa-loop-ivopt. When running benchmarks, I found some post-index addressing opportunities are missing after we correct TARGET_LEGITIMIZE_ADDRESS. The correct of TARGET_LEGITIMIZE_ADDRESS will actually correct address cost which tree-ssa-loop-ivopt will query, and looks like there are glitches in tree-ssa-loop-ivopt cost model that gcc ivopt some simple post-index addressable variable, and thus generated sub-optimal code. And the patch for re-shuffle rtl pattern alone, https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00355.html, has critical problem. It's unsafe to re-use those REG_NOTE info in those places. So the patch needs re-work, or we need to investigate a better solution to fix this on rtl level. Future work =========== I hope someone could have a looks at the tree level fix which is the ideal fix. For RTL level fix, now the issue dependent chain for my solution becomes: fix-on-rtl-level \ v re-shuffle insn for rtl pre \ v fix AArch64 TARGET_LEGITIMIZE_ADDRESS (B) \ v fix tree-ssa-loop-ivopt cost model to make better decision between ivopt and post-index. (A) I think task A and B are important as they are quite general improvements to AArch64 and gcc generic code besides this testcase issue. I'd prepare a testcase and create a seperate ticket for task A first. And, I don't think we can fix this issue in 5.0 release, so move the target to the next release? Thanks.