From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 12081 invoked by alias); 18 May 2015 09:14:39 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 12070 invoked by uid 89); 18 May 2015 09:14:39 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-ob0-f175.google.com Received: from mail-ob0-f175.google.com (HELO mail-ob0-f175.google.com) (209.85.214.175) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Mon, 18 May 2015 09:14:37 +0000 Received: by obcus9 with SMTP id us9so117819641obc.2 for ; Mon, 18 May 2015 02:14:35 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.202.71.84 with SMTP id u81mr17872609oia.81.1431940475475; Mon, 18 May 2015 02:14:35 -0700 (PDT) Received: by 10.76.115.167 with HTTP; Mon, 18 May 2015 02:14:35 -0700 (PDT) In-Reply-To: References: <000001d0897c$57b00d90$071028b0$@arm.com> Date: Mon, 18 May 2015 09:17:00 -0000 Message-ID: Subject: Re: [PATCH PR65447]Improve IV handling by grouping address type uses with same base and step From: Richard Biener To: "Bin.Cheng" Cc: Bin Cheng , GCC Patches Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2015-05/txt/msg01547.txt.bz2 On Thu, May 14, 2015 at 1:11 PM, Bin.Cheng wrote: > On Wed, May 13, 2015 at 7:38 PM, Richard Biener > wrote: >> On Fri, May 8, 2015 at 12:47 PM, Bin Cheng wrote: >>> Hi, >>> GCC's IVO currently handles every IV use independently, which is not right >>> by learning from cases reported in PR65447. >>> >>> The rationale is: >>> 1) Lots of address type IVs refer to the same memory object, share similar >>> base and have same step. We should handle these IVs as a group in order to >>> maximize CSE opportunities, prefer reg+offset addressing mode. >>> 2) GCC's IVO algorithm is expensive and only is run when candidate set is >>> small enough. By grouping same family uses, we can decrease the number of >>> both uses and candidates. Before this patch, number of candidates for >>> PR65447 is too big to run expensive IVO algorithm, resulting in bad assembly >>> code on targets like AArch64 and Mips. >>> 3) Even for cases the assembly code isn't improved, we can still get >>> compilation time benefit with this patch. >>> 4) This is a prerequisite for enabling auto-increment support in IVO on >>> AArch64. >>> >>> For now, this is only done to address type IVs, in the future I may extend >>> it to general IVs too. >>> >>> For AArch64: >>> Benchmarks 470.lbm/spec2k6 and 173.applu/spec2k are improved obviously by >>> this patch. A couple of cases from spec2k/fp appear regressed. I looked >>> into generated assembly code and can confirm the regression is false alarm >>> except one case (189.lucas). For that case, I think it's another issue >>> exposed by this patch (GCC failed to CSE candidate setup code, resulting in >>> bloated loop header). Anyway, I also fined tuned the patch to minimize the >>> impact. >>> >>> For AArch32, this patch seems to be able to improve spec2kfp too, but I >>> didn't look deep into it. I guess the reason is it can make life for >>> auto-increment support in IVO better. >>> >>> One of defects of this patch is computation of max offset in >>> compute_max_addr_offset is basically borrowed from get_address_cost. The >>> comment says we should find a better way to compute all information. People >>> also complained we need to refactor that part of code. I don't have good >>> solution to that yet, though I did try best to keep compute_max_addr_offset >>> simple. >>> >>> I believe this is a generally wanted change, bootstrap and test on x86_64 >>> and AArch64, so is it ok? >> >> I'm a little bit worried about the linked list of sub-uses and the "sorting" >> (that's quadratic). A little. I don't have any good idea but to use a tree... >> We don't seem to limit the number of sub-uses (if we'd do that it would >> become O(1)). >> >> Similar is searching in the list of uses for a group with same base/step >> (but ISTR IVOPTs has multiple similar loops?) > > Hi Richard, > Thanks for reviewing. Instead of tree, I can also keep the linked > list, then quick sort it by using a vector as temporary storage. This > can avoid the complexity of BST operations without non-trivial > overload. Sounds good if constructing & querying are not done at the same time. > For the searching routine, a local hash table could help. >> >> Overall the patch looks like a good improvement to how we do IVO, so I think >> it is ok as-is. > Do you want me to do this change now, or I can pick it up later when > dealing with compilation time issue? Also it would be nice if any > compilation time issue will be reported after applying this version > patch. Because we will have a IVO compilation time benchmark then. You can pick it up later. Richard. > Thanks, > bin > >> >> Thanks, >> Richard. >> >> >>> >>> 2015-05-08 Bin Cheng >>> >>> PR tree-optimization/65447 >>> * tree-ssa-loop-ivopts.c (struct iv_use): New fields. >>> (dump_use, dump_uses): Support to dump sub use. >>> (record_use): New parameters to support sub use. Remove call to >>> dump_use. >>> (record_sub_use, record_group_use): New functions. >>> (compute_max_addr_offset, split_all_small_groups): New functions. >>> (group_address_uses, rewrite_use_address): New functions. >>> (strip_offset): New declaration. >>> (find_interesting_uses_address): Call record_group_use. >>> (add_candidate): New assertion. >>> (infinite_cost_p): Move definition forward. >>> (add_costs): Check INFTY cost and return immediately. >>> (get_computation_cost_at): Clear setup cost and dependent bitmap >>> for sub uses. >>> (determine_use_iv_cost_address): Compute cost for sub uses. >>> (rewrite_use_address_1): Rename from old rewrite_use_address. >>> (free_loop_data): Free sub uses. >>> (tree_ssa_iv_optimize_loop): Call group_address_uses. >>> >>> gcc/testsuite/ChangeLog >>> 2015-05-08 Bin Cheng >>> >>> PR tree-optimization/65447 >>> * gcc.dg/tree-ssa/pr65447.c: New test.