From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 30072 invoked by alias); 28 May 2015 01:33:26 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 30062 invoked by uid 89); 28 May 2015 01:33:25 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-oi0-f48.google.com Received: from mail-oi0-f48.google.com (HELO mail-oi0-f48.google.com) (209.85.218.48) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Thu, 28 May 2015 01:33:17 +0000 Received: by oifu123 with SMTP id u123so21464157oif.1 for ; Wed, 27 May 2015 18:33:15 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.202.80.10 with SMTP id e10mr232022oib.76.1432776795733; Wed, 27 May 2015 18:33:15 -0700 (PDT) Received: by 10.76.134.100 with HTTP; Wed, 27 May 2015 18:33:15 -0700 (PDT) In-Reply-To: <5565E654.6010702@arm.com> References: <000001d0897c$57b00d90$071028b0$@arm.com> <5565E654.6010702@arm.com> Date: Thu, 28 May 2015 07:15:00 -0000 Message-ID: Subject: Re: [PATCH PR65447]Improve IV handling by grouping address type uses with same base and step From: "Bin.Cheng" To: Kyrill Tkachov Cc: Bin Cheng , "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2015-05/txt/msg02590.txt.bz2 On Wed, May 27, 2015 at 11:44 PM, Kyrill Tkachov wrote: > Hi Bin, > > > On 08/05/15 11:47, Bin Cheng wrote: >> >> Hi, >> GCC's IVO currently handles every IV use independently, which is not right >> by learning from cases reported in PR65447. >> >> The rationale is: >> 1) Lots of address type IVs refer to the same memory object, share similar >> base and have same step. We should handle these IVs as a group in order >> to >> maximize CSE opportunities, prefer reg+offset addressing mode. >> 2) GCC's IVO algorithm is expensive and only is run when candidate set is >> small enough. By grouping same family uses, we can decrease the number of >> both uses and candidates. Before this patch, number of candidates for >> PR65447 is too big to run expensive IVO algorithm, resulting in bad >> assembly >> code on targets like AArch64 and Mips. >> 3) Even for cases the assembly code isn't improved, we can still get >> compilation time benefit with this patch. >> 4) This is a prerequisite for enabling auto-increment support in IVO on >> AArch64. >> >> For now, this is only done to address type IVs, in the future I may extend >> it to general IVs too. >> >> For AArch64: >> Benchmarks 470.lbm/spec2k6 and 173.applu/spec2k are improved obviously by >> this patch. A couple of cases from spec2k/fp appear regressed. I looked >> into generated assembly code and can confirm the regression is false alarm >> except one case (189.lucas). For that case, I think it's another issue >> exposed by this patch (GCC failed to CSE candidate setup code, resulting >> in >> bloated loop header). Anyway, I also fined tuned the patch to minimize >> the >> impact. >> >> For AArch32, this patch seems to be able to improve spec2kfp too, but I >> didn't look deep into it. I guess the reason is it can make life for >> auto-increment support in IVO better. >> >> One of defects of this patch is computation of max offset in >> compute_max_addr_offset is basically borrowed from get_address_cost. The >> comment says we should find a better way to compute all information. >> People >> also complained we need to refactor that part of code. I don't have good >> solution to that yet, though I did try best to keep >> compute_max_addr_offset >> simple. >> >> I believe this is a generally wanted change, bootstrap and test on x86_64 >> and AArch64, so is it ok? >> >> >> 2015-05-08 Bin Cheng >> >> PR tree-optimization/65447 >> * tree-ssa-loop-ivopts.c (struct iv_use): New fields. >> (dump_use, dump_uses): Support to dump sub use. >> (record_use): New parameters to support sub use. Remove call to >> dump_use. >> (record_sub_use, record_group_use): New functions. >> (compute_max_addr_offset, split_all_small_groups): New functions. >> (group_address_uses, rewrite_use_address): New functions. >> (strip_offset): New declaration. >> (find_interesting_uses_address): Call record_group_use. >> (add_candidate): New assertion. >> (infinite_cost_p): Move definition forward. >> (add_costs): Check INFTY cost and return immediately. >> (get_computation_cost_at): Clear setup cost and dependent bitmap >> for sub uses. >> (determine_use_iv_cost_address): Compute cost for sub uses. >> (rewrite_use_address_1): Rename from old rewrite_use_address. >> (free_loop_data): Free sub uses. >> (tree_ssa_iv_optimize_loop): Call group_address_uses. >> >> gcc/testsuite/ChangeLog >> 2015-05-08 Bin Cheng >> >> PR tree-optimization/65447 >> * gcc.dg/tree-ssa/pr65447.c: New test. > > > I see this test failing on arm-none-eabi with a compiler at r223737. > My configure options are: --enable-checking=yes --with-newlib > --with-fpu=neon-fp-armv8 --with-arch=armv8-a --without-isl Hi Kyrill, Thank you for reporting this. I will have a look. Thanks, bin > > Kyrill > >