From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 71646 invoked by alias); 12 Oct 2017 11:38:55 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 71175 invoked by uid 89); 12 Oct 2017 11:38:54 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-11.7 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM,SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mail-io0-f176.google.com Received: from mail-io0-f176.google.com (HELO mail-io0-f176.google.com) (209.85.223.176) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 12 Oct 2017 11:38:52 +0000 Received: by mail-io0-f176.google.com with SMTP id m16so5163226iod.1 for ; Thu, 12 Oct 2017 04:38:52 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=soHOWq8rf4clWgTOhZhOcaybE8zBNmPwp3hRMmHHVY8=; b=HwJd1GRjwvr4tBu1Ej0dXikk00ibmNw5OxlmugkzjjTPKSKgDAOwzZsVjRRQI6u0Es 2w7u07vG4sf+MpWGeW+lBQcy4nL4oda5Fjd2fBbUudDq7TqndYON8UM/B37arKWINkin 9MiMTLWq2JYsCy1anCaZSqJxoaD+pqF1m2ksPN8G+2KWzzK5DaYzAKACLrQlSRQuHKsM LZW2gnj0ueQ063KlLLLPz9sRRl//KU4twsDzvQE7YUfG5wbluQjTCszZsHy952LzWxXx Ic03iahD3otYwN1JRLWcLjuC7D32sLltvHKkKiKr/lURPXwvOhDbvXsG2x439QdYEbAR +ixg== X-Gm-Message-State: AMCzsaWrwuHPnJoe870MksU1FS+igiIsPBU0UHlMNLgABbO+JrZhmmS1 8tRdg6JVgTPy5zUmzrluRktedD6+jM/KX3VIJis= X-Google-Smtp-Source: AOwi7QCFqF/3Hws7DedL2jHFuEJSkNFUcPSf0+f8HvN/7Q+a6G+8G5W3AkLsvXIZUDge9L2E92syQkfulSCcCzcNcUE= X-Received: by 10.107.136.68 with SMTP id k65mr2455066iod.143.1507808330519; Thu, 12 Oct 2017 04:38:50 -0700 (PDT) MIME-Version: 1.0 Received: by 10.2.86.86 with HTTP; Thu, 12 Oct 2017 04:38:49 -0700 (PDT) In-Reply-To: References: From: "Bin.Cheng" Date: Thu, 12 Oct 2017 11:42:00 -0000 Message-ID: Subject: Re: [PATCH][GRAPHITE] Fix PR82451 (and PR82355 in a different way) To: Richard Biener Cc: gcc-patches List , Sebastian Pop Content-Type: text/plain; charset="UTF-8" X-IsSubscribed: yes X-SW-Source: 2017-10/txt/msg00744.txt.bz2 On Thu, Oct 12, 2017 at 12:13 PM, Richard Biener wrote: > On Thu, 12 Oct 2017, Bin.Cheng wrote: > >> On Wed, Oct 11, 2017 at 3:43 PM, Richard Biener wrote: >> > >> > For PR82355 I introduced a fake dimension to ISL to allow CHRECs >> > having an evolution in a loop that isn't fully part of the SESE >> > region we are processing. That was easier than fending off those >> > CHRECs (without simply giving up on SESE regions with those). >> > >> > But it didn't fully solve the issue as PR82451 shows where we run >> > into the issue that we eventually have to code-gen those >> > evolutions and thus in theory need a canonical IV of that containing loop. >> > >> > So I decided (after Micha pressuring me a bit...) to revisit the >> > original issue and make SCEV analysis "properly" handle SE regions. >> > It turns out that it is mostly instantiate_scev lacking proper support >> > plus the necessary interfacing change (really just cosmetic in some sense) >> > from a instantiate_before basic-block to a instantiate_before edge. >> > >> > data-ref interfaces have been similarly adjusted, here changing >> > the "loop nest" loop parameter to the entry edge for the SE region >> > and passing that down accordingly. >> > >> > I've for now tried to keep other high-level loop-based interfaces the >> > same by simply using the loop preheader edge as entry where appropriate >> > (needing loop_preheader_edge cope with the loop root tree for simplicity). >> > >> > In the process I ran into issues with us too overly aggressive >> > instantiating random trees and thus I cut those down. That part >> > doesn't successfully test separately (when I remove the strange >> > ARRAY_REF instantiation), so it's part of this patch. I've also >> > run into an SSA verification fail (the id-27.f90 testcase) which >> > shows we _do_ need to clear the SCEV cache after introducing >> > the versioned CFG (and added a comment before it). >> > >> > On the previously failing testcases I've verified we produce >> > sensible instantiations for those pesky refs residing in "no" loop >> > in the SCOP and that we get away with the result in terms of >> > optimizing. >> > >> > SPEC 2k6 testing shows >> > >> > loop nest optimized: 311 >> > loop nest not optimized, code generation error: 0 >> > loop nest not optimized, optimized schedule is identical to original >> > schedule: 173 >> > loop nest not optimized, optimization timed out: 59 >> > loop nest not optimized, ISL signalled an error: 9 >> > loop nest: 552 >> > >> > for SPEC 2k6 and -floop-nest-optimize while adding -fgraphite-identity >> > still reveals some codegen errors: >> > >> > loop nest optimized: 437 >> > loop nest not optimized, code generation error: 25 >> > loop nest not optimized, optimized schedule is identical to original >> > schedule: 169 >> > loop nest not optimized, optimization timed out: 60 >> > loop nest not optimized, ISL signalled an error: 9 >> > loop nest: 700 >> > >> > Bootstrap and regtest in progress on x86_64-unknown-linux-gnu >> > (with and without -fgraphite-identity -floop-nest-optimize). >> > >> > Ok? >> > >> > Thanks, >> > Richard. >> > >> >> > Index: gcc/tree-scalar-evolution.c >> > =================================================================== >> > --- gcc/tree-scalar-evolution.c (revision 253645) >> > +++ gcc/tree-scalar-evolution.c (working copy) >> > @@ -2344,7 +2348,7 @@ static tree instantiate_scev_r (basic_bl >> > instantiated, and to stop if it exceeds some limit. */ >> > >> > static tree >> > -instantiate_scev_name (basic_block instantiate_below, >> > +instantiate_scev_name (edge instantiate_below, >> > struct loop *evolution_loop, struct loop *inner_loop, >> > tree chrec, >> > bool *fold_conversions, >> > @@ -2358,7 +2362,7 @@ instantiate_scev_name (basic_block insta >> > evolutions in outer loops), nothing to do. */ >> > if (!def_bb >> > || loop_depth (def_bb->loop_father) == 0 >> > - || dominated_by_p (CDI_DOMINATORS, instantiate_below, def_bb)) >> > + || ! dominated_by_p (CDI_DOMINATORS, def_bb, instantiate_below->dest)) >> > return chrec; >> > >> > /* We cache the value of instantiated variable to avoid exponential >> > @@ -2380,6 +2384,51 @@ instantiate_scev_name (basic_block insta >> > >> > def_loop = find_common_loop (evolution_loop, def_bb->loop_father); >> > >> > + if (! dominated_by_p (CDI_DOMINATORS, >> > + def_loop->header, instantiate_below->dest)) >> > + { >> > + gimple *def = SSA_NAME_DEF_STMT (chrec); >> > + if (gassign *ass = dyn_cast (def)) >> > + { >> > + switch (gimple_assign_rhs_class (ass)) >> > + { >> > + case GIMPLE_UNARY_RHS: >> > + { >> > + tree op0 = instantiate_scev_r (instantiate_below, evolution_loop, >> > + inner_loop, gimple_assign_rhs1 (ass), >> > + fold_conversions, size_expr); >> > + if (op0 == chrec_dont_know) >> > + return chrec_dont_know; >> > + res = fold_build1 (gimple_assign_rhs_code (ass), >> > + TREE_TYPE (chrec), op0); >> > + break; >> > + } >> > + case GIMPLE_BINARY_RHS: >> > + { >> > + tree op0 = instantiate_scev_r (instantiate_below, evolution_loop, >> > + inner_loop, gimple_assign_rhs1 (ass), >> > + fold_conversions, size_expr); >> > + if (op0 == chrec_dont_know) >> > + return chrec_dont_know; >> > + tree op1 = instantiate_scev_r (instantiate_below, evolution_loop, >> > + inner_loop, gimple_assign_rhs2 (ass), >> > + fold_conversions, size_expr); >> > + if (op1 == chrec_dont_know) >> > + return chrec_dont_know; >> > + res = fold_build2 (gimple_assign_rhs_code (ass), >> > + TREE_TYPE (chrec), op0, op1); >> > + break; >> > + } >> > + default: >> > + res = chrec_dont_know; >> > + } >> > + } >> > + else >> > + res = chrec_dont_know; >> > + global_cache->set (si, res); >> > + return res; >> > + } >> > + >> > /* If the analysis yields a parametric chrec, instantiate the >> > result again. */ >> > res = analyze_scalar_evolution (def_loop, chrec); >> >> IIUC, after changing instantiate_scev_r from loop based to region >> based, there are >> two cases. In one case, def_loop is dominated by instantiate_edge, >> which we'd like >> to analyze/instantiate scev wrto the new def_loop; In the other case, >> def_loop is not >> fully part of sese region, which we'd like to expand ssa_name wrto the >> basic block >> instantiate_edge->dest. It's simply ssa expanding and no loop is involved. > > Note there can be still a loop involved if the SSA chain arrives at > a DEF that is defined within a loop inbetween the current def and > instantiate_edge->dest. In that case we need to process the > compute_overall_effect_of_inner_loop case. > >> So how about factor out above big if-statement into a function with name like >> expand_scev_name (Other better names?). The code like: >> >> /* Some comment explaining the two cases in region based instantiation. */ >> if (dominated_by_p (CDI_DOMINATORS, def_loop->header, >> instantiate_below->dest)) >> res = analyze_scalar_evolution (def_loop, chrec); >> else >> res = expand_scev_name (instantiate_below, chrec); >> >> could be easier to read? > > Note it would be > > else > { > res = expand_scev_name (instantiate_below, chrec); > global_cache->set (si, res); > return res; > } > > and expand_scev_name would still need to recurse via instantiate_scev. > It isn't merely gathering all expressions up to instantiate_edge->dest > from the stmts (see above). I was thinking only do expand here and have last piece of code in function instantiate_scev_name do the recursive call: else if (res != chrec_dont_know) { if (inner_loop && def_bb->loop_father != inner_loop && !flow_loop_nested_p (def_bb->loop_father, inner_loop)) /* ??? We could try to compute the overall effect of the loop here. */ res = chrec_dont_know; else res = instantiate_scev_r (instantiate_below, evolution_loop, inner_loop, res, fold_conversions, size_expr); } Not sure if it's feasible. We might need to stop expansion at either instantiate_edge or at in between loop. Thanks, bin > > But yes, factoring the above might be a good idea. The above is also > not 100% equivalent in capabilities as the rest of the instantiation > machinery -- it lacks condition PHI handling and it fails to capture > the loop-closed PHI handling (so the above mentioned case wouldn't > be handled right now). > > I sent the patch out for comments early before fleshing out all > those pesky details ;) It's at least conservative correct > right now. > > Richard.