From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 9 Sep 2021 11:11:52 -0500
From: Segher Boessenkool <segher@kernel.crashing.org>
To: "Kewen.Lin" <linkw@linux.ibm.com>
Cc: wschmidt@linux.ibm.com, David Edelsohn <dje.gcc@gmail.com>,
 will schmidt <will_schmidt@vnet.ibm.com>,
 GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH v4] rs6000: Add load density heuristic
Message-ID: <20210909161152.GR1583@gate.crashing.org>
References: <7b9f9bdf-1ed5-139b-de9c-511ee8454b85@linux.ibm.com>
 <3424a3d3-fa4e-16f9-89c6-0b07beec957d@linux.ibm.com>
 <df5512db-81bd-02e2-a45d-e6b276e07f1b@linux.ibm.com>
 <77fe5ac1-200f-db69-a92a-5d349642f394@linux.ibm.com>
 <4f7c5da8-75d3-2d98-b728-e1a319392097@linux.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4f7c5da8-75d3-2d98-b728-e1a319392097@linux.ibm.com>
User-Agent: Mutt/1.4.2.3i
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>

Hi!

On Wed, Sep 08, 2021 at 02:57:14PM +0800, Kewen.Lin wrote:
> >>+      /* If we have strided or elementwise loads into a vector, it's
> >>+	 possible to be bounded by latency and execution resources for
> >>+	 many scalar loads.  Try to account for this by scaling the
> >>+	 construction cost by the number of elements involved, when
> >>+	 handling each matching statement we record the possible extra
> >>+	 penalized cost into target cost, in the end of costing for
> >>+	 the whole loop, we do the actual penalization once some load
> >>+	 density heuristics are satisfied.  */
> > 
> > The above comment is quite hard to read.  Can you please break up the last
> > sentence into at least two sentences?
> 
> How about the below:
> 
> +      /* If we have strided or elementwise loads into a vector, it's

"strided" is not a word: it properly is "stridden", which does not read
very well either.  "Have loads by stride, or by element, ..."?  Is that
good English, and easier to understand?

> +        possible to be bounded by latency and execution resources for
> +        many scalar loads.  Try to account for this by scaling the
> +        construction cost by the number of elements involved.  For
> +        each matching statement, we record the possible extra
> +        penalized cost into the relevant field in target cost.  When
> +        we want to finalize the whole loop costing, we will check if
> +        those related load density heuristics are satisfied, and add
> +        this accumulated penalized cost if yes.  */
> 
> > Otherwise this looks good to me, and I recommend maintainers approve with
> > that clarified.

Does that text look good to you now, Bill?  It is still kinda complex;
maybe you see a way to make it simpler.

> 	* config/rs6000/rs6000.c (struct rs6000_cost_data): New members
> 	nstmts, nloads and extra_ctor_cost.
> 	(rs6000_density_test): Add load density related heuristics and the
> 	checks, do extra costing on vector construction statements if need.

"and the checks"?  Oh, "and checks"?  It is probably fine to just leave
out this whole phrase :-)

Don't use commas like this in changelogs.  s/, do/.  Do/  Yes, the text
is a bit boring that way, but that is the purpose: it makes it simpler
to read (and to read quickly, or even merely scan).

> @@ -5262,6 +5262,12 @@ typedef struct _rs6000_cost_data

[ Btw, you can get rid of the typedef now: just have a struct with the
non-underscore name, since we have C++ now.  Such a mechanical change
(as a separate patch!) is pre-approved. ]
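
For illustration only, a minimal sketch of that cleanup.  The three
member names are the ones from the changelog above; their types, the
default initializers, and the helper total_stmts are assumptions made
for this sketch, and the real struct has more fields.

```cpp
// Old C-style spelling, for comparison:
//   typedef struct _rs6000_cost_data { ... } rs6000_cost_data;

// In C++ the struct tag is already a type name, so no typedef is needed.
// Member types are assumed for this sketch.
struct rs6000_cost_data
{
  unsigned int nstmts = 0;           // statements counted so far
  unsigned int nloads = 0;           // loads counted so far
  unsigned int extra_ctor_cost = 0;  // accumulated extra penalty
};

// Hypothetical helper: callers keep using the plain name, as before.
inline unsigned int
total_stmts (const rs6000_cost_data *data)
{
  return data->nstmts;
}
```

Since the type name stays the same, existing users of rs6000_cost_data
should not need to change.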

> +  /* Check if we need to penalize the body cost for latency and
> +     execution resources bound from strided or elementwise loads
> +     into a vector.  */

Bill, is that clear enough?  I'm sure something nicer would help here,
but it's hard for me to write anything :-)

> +  if (data->extra_ctor_cost > 0)
> +    {
> +      /* Threshold for load stmts percentage in all vectorized stmts.  */
> +      const int DENSITY_LOAD_PCT_THRESHOLD = 45;

Threshold for what?

45% is awfully exact.  Can you make this a param?

> +      /* Threshold for total number of load stmts.  */
> +      const int DENSITY_LOAD_NUM_THRESHOLD = 20;

Same.
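
To show the shape of what I am asking for, here is a standalone sketch
with the two thresholds passed in instead of hard-coded.  In GCC itself
these would be --param entries in the target's .opt file; the struct
density_params, its member names, and loads_are_dense are all invented
for this sketch.  The "both must be exceeded" condition is an
assumption based on the dump message in the patch.

```cpp
// Hypothetical stand-in for two tunable parameters.  The names are
// invented for this sketch; the defaults are the values from the patch.
struct density_params
{
  unsigned int load_pct_threshold = 45;  // percent of loads among stmts
  unsigned int load_num_threshold = 20;  // absolute number of loads
};

// Assumed condition: penalize only when both thresholds are exceeded.
inline bool
loads_are_dense (unsigned int nloads, unsigned int nstmts,
                 const density_params &params)
{
  if (nstmts == 0)
    return false;  // avoid dividing by zero
  unsigned int load_pct = nloads * 100 / nstmts;
  return nloads > params.load_num_threshold
         && load_pct > params.load_pct_threshold;
}
```

With the defaults, 30 loads out of 50 statements (60%) counts as dense,
while 30 out of 100 (30%) does not.  Raising load_pct_threshold to 70
turns the first case off as well, with no rebuild of the compiler.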

> +      unsigned int load_pct = (data->nloads * 100) / (data->nstmts);

No parens around the last operand (data->nstmts) please.  The other
pair of parens is unneeded as well, but perhaps it is easier to read
like that.
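
A small sketch of why the first pair is redundant: * and / have equal
precedence and group left to right, so both spellings multiply first.
The function names and the sample numbers are made up for the example.

```cpp
// Both spellings compute the same value: * and / have equal precedence
// and group left to right.
constexpr unsigned int
load_pct_with_parens (unsigned int nloads, unsigned int nstmts)
{
  return (nloads * 100) / nstmts;
}

constexpr unsigned int
load_pct_without_parens (unsigned int nloads, unsigned int nstmts)
{
  return nloads * 100 / nstmts;
}

// What a reader might fear without the parens: dividing first
// truncates early and gives a different answer.
constexpr unsigned int
load_pct_divide_first (unsigned int nloads, unsigned int nstmts)
{
  return nloads * (100 / nstmts);
}

static_assert (load_pct_with_parens (21, 40) == 52, "2100 / 40 truncates to 52");
static_assert (load_pct_without_parens (21, 40) == 52, "same result");
static_assert (load_pct_divide_first (21, 40) == 42, "100 / 40 truncates to 2");
```

So the parens do not change anything; they only reassure the reader
that the multiplication happens before the truncating division.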

> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_NOTE, vect_location,
> +			     "Found %u loads and load pct. %u%% exceed "
> +			     "the threshold, penalizing loop body "
> +			     "cost by extra cost %u for ctor.\n",
> +			     data->nloads, load_pct, data->extra_ctor_cost);

That line does not fit.  Make it more lines?

It is a pity that using these interfaces at all takes up 45 chars
of noise already.

> +/* Helper function for add_stmt_cost.  Check each statement cost
> +   entry, gather information and update the target_cost fields
> +   accordingly.  */
> +static void
> +rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
> +				    enum vect_cost_for_stmt kind,
> +				    struct _stmt_vec_info *stmt_info,
> +				    enum vect_cost_model_location where,
> +				    int stmt_cost, unsigned int orig_count)

Please put those last two on separate lines as well?

> +	  /* As function rs6000_builtin_vectorization_cost shows, we have
> +	     priced much on V16QI/V8HI vector construction as their units,
> +	     if we penalize them with nunits * stmt_cost, it can result in
> +	     an unreliable body cost, eg: for V16QI on Power8, stmt_cost
> +	     is 20 and nunits is 16, the extra cost is 320 which looks
> +	     much exaggerated.  So let's use one maximum bound for the
> +	     extra penalized cost for vector construction here.  */
> +	  const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
> +	  if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
> +	    extra_cost = MAX_PENALIZED_COST_FOR_CTOR;

That is a pretty gross hack.  Can you think of any saner way to avoid
those out-of-scale costs in the first place?
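
To make the scale problem concrete, a standalone sketch of the capping
as the quoted comment describes it.  The helper name capped_ctor_penalty
is hypothetical, and computing the uncapped value as nunits * stmt_cost
is taken from the comment only, not from the rest of the patch.

```cpp
#include <algorithm>

// Value taken from the patch.
constexpr unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;

// Uncapped penalty as the comment describes it: nunits * stmt_cost.
// The cap then clamps whatever comes out of that product.
inline unsigned int
capped_ctor_penalty (unsigned int nunits, unsigned int stmt_cost)
{
  unsigned int extra_cost = nunits * stmt_cost;
  return std::min (extra_cost, MAX_PENALIZED_COST_FOR_CTOR);
}
```

For the V16QI on Power8 case from the comment, 16 * 20 = 320 is clamped
to 12, so the cap throws away nearly all of the computed value.  Only
small products (say 2 * 2 = 4, made-up numbers) pass through unchanged.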

Okay for trunk with such tweaks.  Thanks!  (And please consult with Bill
for the wordsmithing :-) )


Segher