From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x62c.google.com (mail-ej1-x62c.google.com [IPv6:2a00:1450:4864:20::62c]) by sourceware.org (Postfix) with ESMTPS id 4379B3857C53 for ; Mon, 10 Aug 2020 12:38:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 4379B3857C53 Received: by mail-ej1-x62c.google.com with SMTP id a26so9211918ejc.2 for ; Mon, 10 Aug 2020 05:38:43 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=oeiEOkNomwMUJYOF5X+YMRC7ETmjCJ7VxdouE9AZy3c=; b=KiWzuoRJ8FESB0uDnXD8Iix5GaIKD0wlIUyYLzXjIG99EHDXI2BjiILCRX8ZsdLfuH IFsKxfPD/wGQvkqOQpG8R/aFyIQRmI4L6rvGBqxcNZT2QEAGEYt1rcf3rUb4RIGw/Jqm wg2fSzBFE59F33M7hKpYXF0Zv1D/VqKk06UflQCY53Qa0urNc9B9Vvp6MW9a4ohwuorD N/Vata6/XNxvQa/ZwJmGfU3I2J75TUcFIpSqxQJwm8gqgXYPb6hGf/0Id1nEEOOX2k66 N6+Bd9l93H4dwXY2yGrvGRoTlWA3Cx6dvSKdl3gNc/lVMTd81RBV7VufTIO3Yb0G5vRw xUSg== X-Gm-Message-State: AOAM531CNv7TV1hqJMxdgXTjZw81IV9Qg4p0PtVWXtC7kO0GanCZPtwq RZXipW7nQIPzTOMD7onD7AG8Ui4KazCfv04u6WE= X-Google-Smtp-Source: ABdhPJxnhs7ls16qiYi0CHv2owLqnWWBplPaI6dC/yQh3ujtotWst0g5C6QSZTDblCWrznnEZnCc/1iGZbDh6HjrzkQ= X-Received: by 2002:a17:906:401b:: with SMTP id v27mr21168815ejj.300.1597063122082; Mon, 10 Aug 2020 05:38:42 -0700 (PDT) MIME-Version: 1.0 References: <16f8ae40-c0ae-dd57-346e-f9cacea55038@linux.ibm.com> In-Reply-To: From: "Bin.Cheng" Date: Mon, 10 Aug 2020 20:38:31 +0800 Message-ID: Subject: Re: [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling To: "Kewen.Lin" Cc: GCC Patches , "bin.cheng" , Richard Guenther , Bill Schmidt , Segher Boessenkool , Richard Sandiford Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-3.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Aug 2020 12:38:45 -0000 On Mon, Aug 10, 2020 at 12:27 PM Kewen.Lin wrote: > > Hi Bin, > > Thanks for the review!! > > on 2020/8/8 =E4=B8=8B=E5=8D=884:01, Bin.Cheng wrote: > > Hi Kewen, > > Sorry for the late reply. > > The patch's most important change is below cost computation: > > > >> @@ -5890,6 +5973,10 @@ determine_iv_cost (struct ivopts_data *data, st= ruct iv_cand *cand) > >> cost_step =3D add_cost (data->speed, TYPE_MODE (TREE_TYPE (base)))= ; > >> cost =3D cost_step + adjust_setup_cost (data, cost_base.cost); > >> > >> + /* Consider additional step updates during unrolling. */ > >> + if (data->consider_reg_offset_for_unroll_p && !cand->reg_offset_p) > >> + cost +=3D (data->current_loop->estimated_unroll - 1) * cost_step; > > This is a bit strange, to me the add instructions are additional > > computation caused by unrolling+addressing_mode, rather than a native > > part in candidate itself. Specifically, an additional cost is needed > > if candidates (without reg_offset_p) are chosen for the address type > > group/uses. > > Good point, ideally it should be one additional cost for each cand set, > when we select one cand for one group, we need to check this pair need > more (estimated_unroll - 1) step costs, we probably need to care about > this during remove/replace etc. IIUC the current IVOPTs cost framework > doesn't support this and it could increase the selection complexity and > time. I hesitated to do it and put it to cand step cost initially instea= d. > > I was thinking those candidates with reg_offset_p should be only used for > those reg_offset_p groups in most cases (very limited) meanwhile the othe= rs > are simply scaled up like before. But indeed this can cover some similar > cases like one cand is only used for the compare type group which is for > loop closing, then it doesn't need more step costs for unrolling. > > Do you prefer me to improve the current cost framework? No, I don't think it's relevant to the candidate selecting algorithm. I was thinking about adjusting cost somehow in determine_group_iv_cost_address. Given we don't expose selected addressing mode in this function, you may need to do it in get_address_cost, either way. > > >> + > >> /* Prefer the original ivs unless we may gain something by replacing= it. > >> The reason is to make debugging simpler; so this is not relevant = for > >> artificial ivs created by other optimization passes. */ > >> > > > >> @@ -3654,6 +3729,14 @@ set_group_iv_cost (struct ivopts_data *data, > >> return; > >> } > >> > >> + /* Since we priced more on non reg_offset IV cand step cost, we sho= uld scale > >> + up the appropriate IV group costs. Simply consider USE_COMPARE = at the > >> + loop exit, FIXME if multiple exits supported or no loop exit com= parisons > >> + matter. */ > >> + if (data->consider_reg_offset_for_unroll_p > >> + && group->vuses[0]->type !=3D USE_COMPARE) > >> + cost *=3D (HOST_WIDE_INT) data->current_loop->estimated_unroll; > > Not quite follow here, giving "pricing more on on-reg_offset IV cand" > > doesn't make much sense to me. Also why generic type uses are not > > skipped? We want to model the cost required for address computation, > > however, for generic type uses there is no way to save the computation > > in "address expression". Once unrolled, the computation is always > > there? > > > > The main intention is to scale up the group/cand cost for unrolling since > we have scaled up the step costs. The assumption is that the original If we adjust cost appropriately in function *group_iv_cost_address, this would become unnecessary, right? And naturally. > costing (without this patch) can be viewed as either for all unrolled > iterations or just one single iteration. Since IVOPTs doesn't support > fractional costing, I interpreted it as single iterations, to emulate > unrolling scenario based on the previous step cost scaling, we need to > scale up the cost for all computation. > > In most cases, the compare type use is for loop closing, there is still > only one computation even unrolling happens, so I excluded it here. > As "FIXME", if we find some cases are off, we can further restrict it to > those USE_COMPARE uses which is exactly for loop closing. > > > And what's the impact on targets supporting [base + index + offset] > > addressing mode? > > Good question, I didn't notice it since power doesn't support it. > I noticed the comments of function addr_offset_valid_p only mentioning > [base + offset], I guess it excludes [base + index + offset]? > But I guess the address-based IV can work for this mode? No, addr_offset_valid_p is only used to split address use groups. See get_address_cost and struct mem_address. I forgot to ask, what about target only supports [base + offset] addressing mode like RISC-V? I would expect it's not affected at all. > > > > > Given the patch is not likely to harm because rtl loop unrolling is > > (or was?) by default disabled, so I am OK once the above comments are > > addressed. > > > > Yes, it needs explicit unrolling options, excepting for some targets > wants to enable it for some cases with specific loop_unroll_adjust checks= . > > > I wonder if it's possible to get 10% of (all which should be unrolled) > > loops unrolled (conservatively) on gimple and enable it by default at > > O3, rather than teaching ivopts to model a future pass which not > > likely be used outside of benchmarks? > > > > Yeah, it would be nice if the unrolling happen before IVOPTs and won't > have any future unrollings to get it off. PR[1] seems to have some > discussion on gimple unrolling. Thanks for directing me to the discussion. I am on Wilco's side on this problem, IMHO, It might be useful getting small loops unrolled (at O3 by default) by simply saving induction variable stepping and exit condition check, which can be modeled on gimple. Especially for RISC-V, it doesn't support index addressing mode, which means there will be as many induction variables as distinct arrays. Also interleaving after unrolling is not that important, it's at high chance that small loops eligible for interleaving are handled by vectorizer already. Thanks, bin > > Richi suggested driven-by-need gimple unrolling in the previous discussio= n[2] > on the RFC of this. > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D88760 > [2] https://gcc.gnu.org/pipermail/gcc-patches/2020-January/537645.html > > BR, > Kewen