From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gate.crashing.org (gate.crashing.org [63.228.1.57]) by sourceware.org (Postfix) with ESMTP id 4B2D83858420 for ; Wed, 22 Sep 2021 22:38:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4B2D83858420 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kernel.crashing.org Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 18MMaxx2022677; Wed, 22 Sep 2021 17:36:59 -0500 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id 18MMavad022672; Wed, 22 Sep 2021 17:36:57 -0500 X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f Date: Wed, 22 Sep 2021 17:36:57 -0500 From: Segher Boessenkool To: "Kewen.Lin" Cc: GCC Patches , Bill Schmidt , David Edelsohn Subject: Re: [PATCH] rs6000: Modify the way for extra penalized cost Message-ID: <20210922223657.GE1583@gate.crashing.org> References: <20210917220158.GW1583@gate.crashing.org> <5a152e8d-908b-4971-46c6-721418405199@linux.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <5a152e8d-908b-4971-46c6-721418405199@linux.ibm.com> User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00, JMQ_SPF_NEUTRAL, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Sep 2021 22:38:01 -0000 Hi! On Tue, Sep 21, 2021 at 11:24:08AM +0800, Kewen.Lin wrote: > on 2021/9/18 上午6:01, Segher Boessenkool wrote: > > On Thu, Sep 16, 2021 at 09:14:15AM +0800, Kewen.Lin wrote: > >> The way with nunits * stmt_cost can get one much exaggerated > >> penalized cost, such as: for V16QI on P8, it's 16 * 20 = 320, > >> that's why we need one bound. To make it scale, this patch > >> doesn't use nunits * stmt_cost any more, but it still keeps > >> nunits since there are actually nunits scalar loads there. So > >> it uses one cost adjusted from stmt_cost, since the current > >> stmt_cost sort of considers nunits, we can stablize the cost > >> for big nunits and retain the cost for small nunits. After > >> some tries, this patch gets the adjusted cost as: > >> > >> stmt_cost / (log2(nunits) * log2(nunits)) > > > > So for V16QI it gives *16/(4*4) so *1 > > V8HI it gives *8/(3*3) so *8/9 > > V4SI it gives *4/(2*2) so *1 > > V2DI it gives *2/(1*1) so *2 > > and for V1TI it gives *1/(0*0) which is UB (no, does not crash for us, > > just gives wildly wrong answers; the div returns 0 on recent systems). > > I don't expected we will have V1TI for strided/elementwise load, > if it's one unit vector, it's the whole vector itself. > Besides, the below assertion should exclude it already. Yes. But ignoring the UB for unexpectedly large vector components, the 1 / 1.111 / 1 / 2 scoring does not make much sense. The formulas "look" smooth and even sort of reasonable, but as soon as you look at what it *means*, and realise the domain if the function is discrete (only four or five possible inputs), and then see how the function behaves on that... Hrm :-) > > This of course is assuming nunits will always be a power of 2, but I'm > > sure that we have many other places in the compiler assuming that > > already, so that is fine. And if one day this stops being true we will > > get a nice ICE, pretty much the best we could hope for. > > Yeah, exact_log2 returns -1 for non power of 2 input, for example: Exactly. > >> + unsigned int adjusted_cost = stmt_cost / nunits_sq; > > > > But this can divide by 0. Or are we somehow guaranteed that nunits > > will never be 1? Yes the log2 check above, sure, but that ICEs if this > > is violated; is there anything that actually guarantees it is true? > > As I mentioned above, I don't expect we can have nunits 1 strided/ew load, > and the ICE should check this and ensure dividing by zero never happens. :) Can you assert that *directly* then please? > > A magic crazy formula like this is no good. If you want to make the > > cost of everything but V2D* be the same, and that of V2D* be twice that, > > that is a weird heuristic, but we can live with that perhaps. But that > > beats completely unexplained (and unexplainable) magic! > > > > Sorry. > > That's all right, thanks for the comments! let's improve it. :) I like that spirit :-) > How about just assigning 2 for V2DI and 1 for the others for the > penalized_cost_per_load with some detailed commentary, it should have > the same effect with this "magic crazy formula", but I guess it can > be more clear. That is fine yes! (Well, V2DF the same I guess? Or you'll need very detailed commentary :-) ) It is fine to say "this is just a heuristic without much supporting theory" in places. That is what most of our --param= are as well, for example. If counting two-element vectors as twice as expensive as all other vectors helps performance, then so be it: if there is no better way to cost things (or we do not know one), then what else are we to do? Segher