From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <segher@kernel.crashing.org>
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
 by sourceware.org (Postfix) with ESMTP id A8E353858C2C
 for <gcc-patches@gcc.gnu.org>; Thu,  9 Sep 2021 18:25:46 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A8E353858C2C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=kernel.crashing.org
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=kernel.crashing.org
Received: from gate.crashing.org (localhost.localdomain [127.0.0.1])
 by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 189IOji0026777;
 Thu, 9 Sep 2021 13:24:45 -0500
Received: (from segher@localhost)
 by gate.crashing.org (8.14.1/8.14.1/Submit) id 189IOiwR026776;
 Thu, 9 Sep 2021 13:24:44 -0500
X-Authentication-Warning: gate.crashing.org: segher set sender to
 segher@kernel.crashing.org using -f
Date: Thu, 9 Sep 2021 13:24:44 -0500
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Bill Schmidt <wschmidt@linux.ibm.com>
Cc: "Kewen.Lin" <linkw@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>,
 will schmidt <will_schmidt@vnet.ibm.com>,
 GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH v4] rs6000: Add load density heuristic
Message-ID: <20210909182444.GT1583@gate.crashing.org>
References: <7b9f9bdf-1ed5-139b-de9c-511ee8454b85@linux.ibm.com>
 <3424a3d3-fa4e-16f9-89c6-0b07beec957d@linux.ibm.com>
 <df5512db-81bd-02e2-a45d-e6b276e07f1b@linux.ibm.com>
 <77fe5ac1-200f-db69-a92a-5d349642f394@linux.ibm.com>
 <4f7c5da8-75d3-2d98-b728-e1a319392097@linux.ibm.com>
 <20210909161152.GR1583@gate.crashing.org>
 <894f01c3-6481-0757-751f-b4239a4f0232@linux.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <894f01c3-6481-0757-751f-b4239a4f0232@linux.ibm.com>
User-Agent: Mutt/1.4.2.3i
X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00, JMQ_SPF_NEUTRAL,
 KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Sep 2021 18:25:48 -0000

Hi!

On Thu, Sep 09, 2021 at 12:19:28PM -0500, Bill Schmidt wrote:
> On 9/9/21 11:11 AM, Segher Boessenkool wrote:
> >On Wed, Sep 08, 2021 at 02:57:14PM +0800, Kewen.Lin wrote:
> >>+      /* If we have strided or elementwise loads into a vector, it's
> >"strided" is not a word: it properly is "stridden", which does not read
> >very well either.  "Have loads by stride, or by element, ..."?  Is that
> >good English, and easier to understand?
> 
> No, this is OK.  "Strided loads" is a term of art used by the 
> vectorizer; whether or not it was the Queen's English, it's what we 
> have...

"Strided" is not in any dictionary I could find.  I have no idea what
the island queen has to do with anything :-)

> (And I think you might only find "bestridden" in some 18th or 
> 19th century English poetry... :-)

"Bestride" is not used in the US apparently, but it has nothing to do
with this anyway.

In any case, "strided" is used a lot in the vectoriser already (and
nowhere else).  So, fine with me of course.

> >Does that text look good to you now Bill?  It is still kinda complex,
> >maybe you see a way to make it simpler.
> 
> I think it's OK now.  The complexity at least matches the code now 
> instead of exceeding it. :-P  j/k...

Ha :-)

Ideally the comments will clarify complex code.  But not making it worse
is at least something, okay :-)

> >>+  if (data->extra_ctor_cost > 0)
> >>+    {
> >>+      /* Threshold for load stmts percentage in all vectorized stmts.  */
> >>+      const int DENSITY_LOAD_PCT_THRESHOLD = 45;
> >Threshold for what?
> >
> >45% is awfully exact.  Can you make this a param?
> >
> >>+      /* Threshold for total number of load stmts.  */
> >>+      const int DENSITY_LOAD_NUM_THRESHOLD = 20;
> >Same.
> 
> We have similar magic constants in here already.

And that is problematic.  That is the problem.

> Parameterizing is 
> possible, but I'm more interested in making sure the numbers are 
> appropriate for each processor.

Of course.  So a) there shouldn't be any magic constants, the constants
should be meaningful.  That way, tunings will need less maintenance for
new hardware versions, but way more importantly: if anything else in the
compiler changes.  And b), having params makes it easier to do such
tuning, or experiments for it.

It is very easy to add params btw (just in rs6000.opt).

> Perhaps a follow-up patch to add params for the magic constants would be 
> reasonable, but I'd personally consider it pretty low priority.

It also is super easy to do.  The only hard part is making a name for
it, and if you cannot think of a good name, that says that it is not a
good abstraction in the first place -- magic isn't good.

Please do it.

> >>+	  /* As function rs6000_builtin_vectorization_cost shows, we have
> >>+	     priced much on V16QI/V8HI vector construction as their units,
> >>+	     if we penalize them with nunits * stmt_cost, it can result in
> >>+	     an unreliable body cost, eg: for V16QI on Power8, stmt_cost
> >>+	     is 20 and nunits is 16, the extra cost is 320 which looks
> >>+	     much exaggerated.  So let's use one maximum bound for the
> >>+	     extra penalized cost for vector construction here.  */
> >>+	  const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
> >>+	  if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
> >>+	    extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
> >That is a pretty gross hack.  Can you think of any saner way to not have
> >those out of scale costs in the first place?
> 
> In Kewen's defense, the whole business of "finish_cost" for these 
> vectorized loops is to tweak things that don't work quite right with the 
> hooks currently provided to the vectorizer to add costs on a per-stmt 
> basis without looking at the overall set of statements.  It gives the 
> back end a chance to massage things and exercise veto power over 
> otherwise bad decisions.  By nature, that's going to be very much a 
> heuristic exercise.  Personally I think the heuristics used here are 
> pretty reasonable, and importantly they are designed to only be employed 
> in pretty rare circumstances.  It doesn't look easy to me to avoid the 
> need for a cap here without making the rest of the heuristics harder to 
> understand.  But sure, he can try! :)

Yes.  But we need to avoid magic.  Magic does not scale; magic makes it
hard or impossible to do any future changes.

I did approve the patch already.  But weaknesses like this need
continuous attention, and that does not scale.


Segher