Date: Tue, 14 Jun 2022 17:34:21 -0500
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Jiufu Guo
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, amodra@gmail.com
Subject: Re: [PATCH V2]rs6000: Store complicated constant into pool
Message-ID: <20220614223421.GB25951@gate.crashing.org>
References: <20220614132355.117887-1-guojiufu@linux.ibm.com>
In-Reply-To: <20220614132355.117887-1-guojiufu@linux.ibm.com>

Hi!

On Tue, Jun 14, 2022 at 09:23:55PM +0800, Jiufu Guo wrote:
> This patch reduces the threshold on the number of instructions for
> storing a constant in the pool, and updates the costs for constant
> building and memory accesses.  If building the constant needs more
> than 2 instructions (or more than 1 instruction on P10), prefer to
> load it from the constant pool.

Have you tried different limits?  And p10 is a red herring: what you
actually test is whether prefixed insns are used.

> 	* config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem):
> 	Exclude rtx with code 'HIGH'.

> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -9706,8 +9706,9 @@ rs6000_init_stack_protect_guard (void)
>  static bool
>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
> -  if (GET_CODE (x) == HIGH
> -      && GET_CODE (XEXP (x, 0)) == UNSPEC)
> +  /* Exclude CONSTANT HIGH part.  e.g.
> +     (high:DI (symbol_ref:DI ("var") [flags 0xc0] )).  */
> +  if (GET_CODE (x) == HIGH)
>      return true;

So, why is this?  You didn't explain.

> @@ -11139,7 +11140,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
>  		   && FP_REGNO_P (REGNO (operands[0])))
>  	       || !CONST_INT_P (operands[1])
>  	       || (num_insns_constant (operands[1], mode)
> -		   > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2)))
> +		   > (TARGET_PREFIXED ? 1 : 2)))
>  	   && !toc_relative_expr_p (operands[1], false, NULL, NULL)
>  	   && (TARGET_CMODEL == CMODEL_SMALL
>  	       || can_create_pseudo_p ()

This is the more obvious part.
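To make the limit concrete (this is my own illustration, not something
taken from your patch or its testsuite): take an arbitrary 64-bit
constant with no exploitable structure, e.g.

  /* Hypothetical example, not from the patch's testsuite: an arbitrary
     64-bit constant with no exploitable structure.  */
  unsigned long long
  full64 (void)
  {
    return 0x1234567887654321ULL;
  }

Without prefixed insns this needs, as far as I can see, the full
five-instruction lis/ori/sldi/oris/ori sequence, so with your new limit
(more than 2 insns, or more than 1 insn when prefixed insns are
available) it presumably would be loaded from the constant pool with ld
or pld instead.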
> @@ -22101,6 +22102,14 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> 
>      case CONST_DOUBLE:
>      case CONST_WIDE_INT:
> +      /* It may needs a few insns for const to SET. -1 for outer SET code.  */
> +      if (outer_code == SET)
> +	{
> +	  *total = COSTS_N_INSNS (num_insns_constant (x, mode)) - 1;
> +	  return true;
> +	}
> +      /* FALLTHRU */
> +
>      case CONST:
>      case HIGH:
>      case SYMBOL_REF:

But this again isn't an obvious improvement at all, and it needs
performance testing -- separately from the other changes.

> @@ -22110,8 +22119,12 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
>      case MEM:
>        /* When optimizing for size, MEM should be slightly more expensive
>  	  than generating address, e.g., (plus (reg) (const)).
> -	  L1 cache latency is about two instructions.  */
> -      *total = !speed ? COSTS_N_INSNS (1) + 1 : COSTS_N_INSNS (2);
> +	  L1 cache latency is about two instructions.
> +	  For prefixed load (pld), we would set it slightly faster than
> +	  than two instructions.  */
> +      *total = !speed
> +		 ? COSTS_N_INSNS (1) + 1
> +		 : TARGET_PREFIXED ? COSTS_N_INSNS (2) - 1 : COSTS_N_INSNS (2);
>        if (rs6000_slow_unaligned_access (mode, MEM_ALIGN (x)))
>  	*total += COSTS_N_INSNS (100);
>        return true;

And this is completely independent of the rest as well.  Costs of 5 or
7 (COSTS_N_INSNS (n) is 4*n, so COSTS_N_INSNS (1) + 1 is 5 and
COSTS_N_INSNS (2) - 1 is 7) are completely random numbers; why did you
pick these?  Does it work better than 8, or 4, etc.?

> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-require-effective-target lp64 } */
>  /* { dg-options "-O" } */
> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */
> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */

Why?  This constant is still better generated in code, no?  It should
never be loaded from a constant pool: it is hex 4000_0000_0000_0000,
easy to construct with just one or two insns (li followed by sldi, for
example).

> --- a/gcc/testsuite/gcc.target/powerpc/pr93012.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c
> @@ -10,4 +10,4 @@ unsigned long long mskh1() { return 0xffff9234ffff9234ULL; }
>  unsigned long long mskl1() { return 0x2bcdffff2bcdffffULL; }
>  unsigned long long mskse() { return 0xffff1234ffff1234ULL; }
>  
> -/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */
> +/* { dg-final { scan-assembler-times {\mrldimi|ld|pld\M} 7 } } */

Please make this the exact number of times you want to see rldimi, plus
a separate count for the loads you expect (see the p.s. for a sketch).
Btw, you need to write \m(?:rldimi|ld|pld)\M, or it will mean "\mrldimi"
or "ld" or "pld\M" (and that bare "ld" will match anything that "pld\M"
will match, of course).

So no doubt this will improve things, but we need testing of each part
separately.  Also look at code size, or at differences in the generated
code in general: that is much more sensitive than performance, and it
is not itself sensitive to things like system load, so it a) is easier
to measure, and b) has more useful output, output that tells more of
the whole story.


Segher
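p.s. For the pr93012.c check, something along these lines is what I am
asking for -- the counts below are just placeholders, the real numbers
have to match whatever your patch actually generates for those
functions:

  /* Placeholder counts; replace them with the exact expected numbers.  */
  /* { dg-final { scan-assembler-times {\mrldimi\M} 5 } } */
  /* { dg-final { scan-assembler-times {\m(?:ld|pld)\M} 2 } } */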