Message-ID: <1378685738.3730.16.camel@gnopaine>
Subject: Re: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
From: Bill Schmidt
To: Richard Biener
Cc: "bin.cheng", GCC Patches
Date: Mon, 09 Sep 2013 02:01:00 -0000
In-Reply-To:
References: <003f01cea7a9$8e984ae0$abc8e0a0$@arm.com>
Content-Type: text/plain; charset="UTF-8"
X-SW-Source: 2013-09/txt/msg00538.txt.bz2

On Mon, 2013-09-02 at 11:15 +0200, Richard Biener wrote:
> On Mon, Sep 2, 2013 at 8:56 AM, bin.cheng wrote:
> > Hi,
> >
> > The gimple-ssa-strength-reduction pass handles CAND_REFs in order to
> > find different MEM_REFs sharing a common part in their addressing
> > expressions.  If such MEM_REFs are found, the pass rewrites them,
> > producing more efficient addressing expressions during the RTL passes.
> > The pass analyzes the addressing expression in each MEM_REF to see if
> > it can be formalized as follows:
> >
> >     base:    MEM_REF (T1, C1)
> >     offset:  MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> >     bitpos:  C4 * BITS_PER_UNIT
> >
> > It then restructures the reference into the form:
> >
> >     MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> >              C1 + (C2 * C3) + C4)
> >
> > Finally, it rewrites the MEM_REFs if two or more share a common
> > (non-constant) part.
> >
> > The problem is that it doesn't back-trace T2.  If T2 is recorded as a
> > CAND_ADD in the form "T2' + C5", the MEM_REF should instead be
> > restructured into:
> >
> >     MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)),
> >              C1 + (C2 * C3) + C4 + (C5 * C3))
> >
> > The patch also includes a test case to illustrate the problem.
> >
> > Bootstrapped and tested on x86/x86_64/arm-a15.  Is it OK?
>
> This looks ok to me if Bill is ok with it.

This is a good generalization and I'm fine with it.  There are a few minor
nits that should be corrected, outlined below.

> Thanks,
> Richard.
>
> > Thanks.
> > bin
> >
> > 2013-09-02  Bin Cheng
> >
> >     * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New.
> >     (restructure_reference): Call backtrace_base_for_ref.
> >
> > gcc/testsuite/ChangeLog
> > 2013-09-02  Bin Cheng
> >
> >     * gcc.dg/tree-ssa/slsr-39.c: New test.
>>Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c
>>===================================================================
>>--- gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c	(revision 0)
>>+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c	(revision 0)
>>@@ -0,0 +1,26 @@
>>+/* Verify straight-line strength reduction for back-tracing
>>+   CADN_ADD for T2 in:
       ^
       CAND_ADD
>>+
>>+   *PBASE:    T1
>>+   *POFFSET:  MULT_EXPR (T2, C3)
>>+   *PINDEX:   C1 + (C2 * C3) + C4  */
>>+
>>+/* { dg-do compile } */
>>+/* { dg-options "-O2 -fdump-tree-slsr" } */
>>+
>>+typedef int arr_2[50][50];
>>+
>>+void foo (arr_2 a2, int v1)
>>+{
>>+  int i, j;
>>+
>>+  i = v1 + 5;
>>+  j = i;
>>+  a2 [i] [j++] = i;
>>+  a2 [i] [j++] = i;
>>+  a2 [i] [i-1] += 1;
>>+  return;
>>+}
>>+
>>+/* { dg-final { scan-tree-dump-times "MEM" 4 "slsr" } } */
>>+/* { dg-final { cleanup-tree-dump "slsr" } } */
>>Index: gcc/gimple-ssa-strength-reduction.c
>>===================================================================
>>--- gcc/gimple-ssa-strength-reduction.c	(revision 202067)
>>+++ gcc/gimple-ssa-strength-reduction.c	(working copy)
>>@@ -750,6 +750,57 @@ slsr_process_phi (gimple phi, bool speed)
>>   add_cand_for_stmt (phi, c);
>> }
>>
>>+/* Given PBASE which is a pointer to tree, loop up the defining
                                              ^
                                              look up
>>+   statement for it and check whether the candidate is in the
>>+   form of:
>>+
>>+     X = B + (1 * S), S is integer constant
>>+     X = B + (i * S), S is integer one
>>+
>>+   If so, set PBASE to the candiate's base_expr and return double
                            ^
                            candidate's
>>+   int (i * S).
>>+   Otherwise, just return double int zero.  */

This is sufficient, since you are properly checking the next_interp chain.
Another possible form would be X = (B + i) * 1, but if this is present,
then one of the forms you're checking for should also be present, so
there's no need to check the MULT_CANDs.
>>+
>>+static double_int
>>+backtrace_base_for_ref (tree *pbase)
>>+{
>>+  tree base_in = *pbase;
>>+  slsr_cand_t base_cand;
>>+
>>+  STRIP_NOPS (base_in);
>>+  if (TREE_CODE (base_in) != SSA_NAME)
>>+    return tree_to_double_int (integer_zero_node);
>>+
>>+  base_cand = base_cand_from_table (base_in);
>>+
>>+  while (base_cand && base_cand->kind != CAND_PHI)
>>+    {
>>+      if (base_cand->kind == CAND_ADD
>>+	  && base_cand->index.is_one ()
>>+	  && TREE_CODE (base_cand->stride) == INTEGER_CST)
>>+	{
>>+	  /* X = B + (1 * S), S is integer constant.  */
>>+	  *pbase = base_cand->base_expr;
>>+	  return tree_to_double_int (base_cand->stride);
>>+	}
>>+      else if (base_cand->kind == CAND_ADD
>>+	       && TREE_CODE (base_cand->stride) == INTEGER_CST
>>+	       && integer_onep (base_cand->stride))
>>+	{
>>+	  /* X = B + (i * S), S is integer one.  */
>>+	  *pbase = base_cand->base_expr;
>>+	  return base_cand->index;
>>+	}
>>+
>>+      if (base_cand->next_interp)
>>+	base_cand = lookup_cand (base_cand->next_interp);
>>+      else
>>+	base_cand = NULL;
>>+    }
>>+
>>+  return tree_to_double_int (integer_zero_node);
>>+}
>>+
>> /* Look for the following pattern:
>>
>>      *PBASE:    MEM_REF (T1, C1)
>>@@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed)
>>
>>      *PBASE:    T1
>>      *POFFSET:  MULT_EXPR (T2, C3)
>>-     *PINDEX:   C1 + (C2 * C3) + C4  */
>>+     *PINDEX:   C1 + (C2 * C3) + C4
>>
>>+   When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It
                            ^                                      ^
                            a                                      it
>>+   will be further restructured to:
>>+
>>+     *PBASE:    T1
>>+     *POFFSET:  MULT_EXPR (T2', C3)
>>+     *PINDEX:   C1 + (C2 * C3) + C4 + (C5 * C3)  */
>>+
>> static bool
>> restructure_reference (tree *pbase, tree *poffset, double_int *pindex,
>> 		       tree *ptype)
>>@@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset,
>>   double_int index = *pindex;
>>   double_int bpu = double_int::from_uhwi (BITS_PER_UNIT);
>>   tree mult_op0, mult_op1, t1, t2, type;
>>-  double_int c1, c2, c3, c4;
>>+  double_int c1, c2, c3, c4, c5;
>>
>>   if (!base
>>       || !offset
>>@@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree *poffset,
>>     }
>>
>>   c4 = index.udiv (bpu, FLOOR_DIV_EXPR);
>>+  c5 = backtrace_base_for_ref (&t2);
>>
>>   *pbase = t1;
>>-  *poffset = fold_build2 (MULT_EXPR, sizetype, t2,
>>-			  double_int_to_tree (sizetype, c3));
>>-  *pindex = c1 + c2 * c3 + c4;
>>+  *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2),
>>+			 double_int_to_tree (sizetype, c3));

I am not sure why you changed this call.  fold_build2 is a more efficient
call than size_binop.  size_binop makes several checks that will fail in
this case, and then calls fold_build2_loc, right?  Not a big deal, but it
seems like changing it back would be better.  Perhaps I'm missing something
(as usual ;).

Thanks,
Bill

>>+  *pindex = c1 + c2 * c3 + c4 + c5 * c3;
>>   *ptype = type;
>>
>>   return true;