From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=dkBB=NC=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 3C79B385ED4A
	for <gcc-patches@gcc.gnu.org>; Fri, 31 May 2024 09:53:54 +0000 (GMT)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2E6E91424;
	Fri, 31 May 2024 02:54:18 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B8F233F641;
	Fri, 31 May 2024 02:53:52 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Ajit Agarwal <aagarwa1@linux.ibm.com>
Mail-Followup-To: Ajit Agarwal <aagarwa1@linux.ibm.com>,Alex Coplan <alex.coplan@arm.com>,  "Kewen.Lin" <linkw@linux.ibm.com>,  Segher Boessenkool <segher@kernel.crashing.org>,  Michael Meissner <meissner@linux.ibm.com>,  Peter Bergner <bergner@linux.ibm.com>,  David Edelsohn <dje.gcc@gmail.com>,  gcc-patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: Alex Coplan <alex.coplan@arm.com>,  "Kewen.Lin" <linkw@linux.ibm.com>,  Segher Boessenkool <segher@kernel.crashing.org>,  Michael Meissner <meissner@linux.ibm.com>,  Peter Bergner <bergner@linux.ibm.com>,  David Edelsohn <dje.gcc@gmail.com>,  gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion
References: <53ba46de-6c01-4c68-bd98-1ba6950a793a@linux.ibm.com>
Date: Fri, 31 May 2024 10:53:51 +0100
In-Reply-To: <53ba46de-6c01-4c68-bd98-1ba6950a793a@linux.ibm.com> (Ajit
	Agarwal's message of "Fri, 31 May 2024 01:21:44 +0530")
Message-ID: <mpt34py71e8.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
List-Id: <gcc-patches.gcc.gnu.org>

Ajit Agarwal <aagarwa1@linux.ibm.com> writes:
> Hello All:
>
> Common infrastructure using generic code for pair mem fusion of different
> targets.
>
> rs6000 target specific specific code implements virtual functions defined
> by generic code.
>
> Code is implemented with pure virtual functions to interface with target
> code.
>
> Target specific code are added in rs6000-mem-fusion.cc and additional virtual
> function implementation required for rs6000 are added in aarch64-ldp-fusion.cc.
>
> Bootstrapped and regtested for aarch64-linux-gnu and powerpc64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
>
> aarch64, rs6000, middle-end: Add implementation for different targets for pair mem fusion
>
> Common infrastructure using generic code for pair mem fusion of different
> targets.
>
> rs6000 target specific specific code implements virtual functions defined
> by generic code.
>
> Code is implemented with pure virtual functions to interface with target
> code.
>
> Target specific code are added in rs6000-mem-fusion.cc and additional virtual
> function implementation required for rs6000 are added in aarch64-ldp-fusion.cc.
>
> 2024-05-31  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-ldp-fusion.cc: Add target specific
> 	implementation of additional virtual functions added in pair_fusion
> 	struct.
> 	* config/rs6000/rs6000-passes.def: New mem fusion pass
> 	before pass_early_remat.
> 	* config/rs6000/rs6000-mem-fusion.cc: Add new pass.
> 	Add target specific implementation using pure virtual
> 	functions.
> 	* config.gcc: Add new object file.
> 	* config/rs6000/rs6000-protos.h: Add new prototype for mem
> 	fusion pass.
> 	* config/rs6000/t-rs6000: Add new rule.
> 	* rtl-ssa/accesses.h: Moved set_is_live_out_use as public
> 	from private.
>
> gcc/testsuite/ChangeLog:
>
> 	* g++.target/powerpc/me-fusion.C: New test.
> 	* g++.target/powerpc/mem-fusion-1.C: New test.
> 	* gcc.target/powerpc/mma-builtin-1.c: Modify test.
> ---

This isn't a complete review, just some initial questions & comments
about selected parts.

> [...]
> +/* Check whether load can be fusable or not.
> +   Return true if dependent use is UNSPEC otherwise false.  */
> +bool
> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
> +{
> +  rtx_insn *insn = info->rtl ();
> +
> +  for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1))
> +    if (REG_NOTE_KIND (note) == REG_EQUAL
> +	|| REG_NOTE_KIND (note) == REG_EQUIV)
> +      return false;

It's unusual to punt on an optimisation because of a REG_EQUAL/EQUIV
note.  What's the reason for doing this?  Are you trying to avoid
fusing pairs before reload that are equivalent to a MEM (i.e. have
a natural spill slot)?  I think Alex hit a similar situation.

> +
> +  for (auto def : info->defs ())
> +    {
> +      auto set = dyn_cast<set_info *> (def);
> +      if (set && set->has_any_uses ())
> +	{
> +	  for (auto use : set->all_uses())

Nit: has_any_uses isn't necessary: the inner loop will simply do nothing
in that case.  Also, we can/should restrict the scan to non-debug uses.

This can then be:

  for (auto def : info->defs ())
    if (auto set = dyn_cast<set_info *> (def))
      for (auto use : set->nondebug_insn_uses ())
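
To show why the has_any_uses guard is redundant, here is a minimal
standalone sketch of the same loop shape.  It does not use rtl-ssa:
use_sketch, def_sketch, set_sketch, clobber_sketch, nondebug_uses and
no_artificial_uses_p are all mock names invented for illustration, and
dynamic_cast stands in for dyn_cast.

```cpp
#include <cassert>
#include <vector>

// Mock of a use: only the two properties the loop cares about.
struct use_sketch
{
  bool debug_p;
  bool artificial_p;
};

// Mock of a definition; polymorphic so that dynamic_cast works.
struct def_sketch
{
  virtual ~def_sketch () = default;
};

// Mock of a set: a definition that records its uses.
struct set_sketch : def_sketch
{
  std::vector<use_sketch> uses;

  // Mock accessor: return a copy of the uses with debug uses removed.
  std::vector<use_sketch> nondebug_uses () const
  {
    std::vector<use_sketch> result;
    for (const use_sketch &use : uses)
      if (!use.debug_p)
        result.push_back (use);
    return result;
  }
};

// Mock of a definition that is not a set.
struct clobber_sketch : def_sketch
{
};

// Return false if any non-debug use of a set in DEFS is artificial.
// A set with no uses needs no special case: the inner loop iterates
// over an empty range and does nothing.
static bool
no_artificial_uses_p (const std::vector<const def_sketch *> &defs)
{
  for (const def_sketch *def : defs)
    if (auto set = dynamic_cast<const set_sketch *> (def))
      for (const use_sketch &use : set->nondebug_uses ())
        if (use.artificial_p)
          return false;
  return true;
}
```

In this mock, a set with no uses, a set with only debug uses and a
non-set definition all fall through to the final "return true" without
an explicit guard.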

> +	    {
> +	      if (use->insn ()->is_artificial ())
> +		return false;
> +
> +	       insn_info *info = use->insn ();
> +
> +	       if (info
> +		   && info->rtl ()

This test shouldn't be necessary.

> +		   && info->is_real ())
> +		  {
> +		    rtx_insn *rtl_insn = info->rtl ();
> +		    rtx set = single_set (rtl_insn);
> +
> +		    if (set == NULL_RTX)
> +		      return false;
> +
> +		    rtx op0 = SET_SRC (set);
> +		    if (GET_CODE (op0) != UNSPEC)
> +		      return false;

What's the motivation for requiring the source to be an unspec?
It's unusual to treat all unspecs as a distinct group.

Also, using single_set means that the function rejects parallels of
two sets even when both sources are unspecs, since single_set normally
returns null for them.  Is that intentional?

The reasons behind things like the REG_EQUAL/EQUIV and UNSPEC decisions
need to be described in comments, so that other people coming to this
code later can understand the motivation.  The same thing applies to
other decisions in the patch.

> +		  }
> +	      }
> +	  }
> +    }
> +  return true;
> +}
> [...]
> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
> index 9f897ac04e2..2dbe9f854ef 100644
> --- a/gcc/pair-fusion.cc
> +++ b/gcc/pair-fusion.cc
> @@ -312,7 +312,7 @@ static int
>  encode_lfs (lfs_fields fields)
>  {
>    int size_log2 = exact_log2 (fields.size);
> -  gcc_checking_assert (size_log2 >= 2 && size_log2 <= 4);
> +  gcc_checking_assert (size_log2 >= 2 && size_log2 <= 6);
>    return ((int)fields.load_p << 3)
>      | ((int)fields.fpsimd_p << 2)
>      | (size_log2 - 2);

The point of the assert is to make sure that "size_log2 - 2"
fits within 2 bits, and so does not interfere with the fpsimd_p
and load_p parts of the encoding.  If rs6000 needs size_log2 == 6,
the shift amounts should be increased by 1 to compensate.

If we do bump the shifts by 1, the new range can be:

  gcc_checking_assert (size_log2 >= 2 && size_log2 <= 9);
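
To make the arithmetic concrete, here is a minimal standalone sketch
of the widened encoding, assuming the shifts are bumped by 1 as
described above.  lfs_fields_sketch, encode_lfs_sketch and
decode_lfs_sketch are stand-in names for illustration, not the code in
pair-fusion.cc, and log2_exact is a simplified replacement for
exact_log2.

```cpp
#include <cassert>

// Stand-in for lfs_fields; the field names follow the quoted hunk.
struct lfs_fields_sketch
{
  bool load_p;
  bool fpsimd_p;
  unsigned size;
};

// Simplified replacement for exact_log2: return log2 (X) if X is a
// power of 2, otherwise -1.
static int
log2_exact (unsigned x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while (x >>= 1)
    ++n;
  return n;
}

// Assumed layout after bumping the shifts by 1:
//   bits 0-2: size_log2 - 2 (values 0..7, so size_log2 in [2, 9])
//   bit 3:    fpsimd_p
//   bit 4:    load_p
static int
encode_lfs_sketch (lfs_fields_sketch fields)
{
  int size_log2 = log2_exact (fields.size);
  assert (size_log2 >= 2 && size_log2 <= 9);
  return ((int) fields.load_p << 4)
    | ((int) fields.fpsimd_p << 3)
    | (size_log2 - 2);
}

// Inverse of encode_lfs_sketch.
static lfs_fields_sketch
decode_lfs_sketch (int lfs)
{
  bool load_p = (lfs & (1 << 4)) != 0;
  bool fpsimd_p = (lfs & (1 << 3)) != 0;
  unsigned size = 1U << ((lfs & 7) + 2);
  return { load_p, fpsimd_p, size };
}
```

With the original shifts, size 64 gives size_log2 - 2 == 4, which sets
bit 2 and so collides with fpsimd_p.  With the layout sketched here,
the size field has 3 bits to itself and the two flags stay distinct.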

> [...]
> +  // Given insn_info pair I1 and I2, return true if offsets are in order.
> +  virtual bool should_handle_unordered_insns (rtl_ssa::insn_info *i1,
> +					      rtl_ssa::insn_info *i2) = 0;
> +

This name seems a bit misleading.  The function is used in:

@@ -2401,6 +2405,9 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
       reversed = true;
     }
 
+  if (!m_pass->should_handle_unordered_insns (i1, i2))
+    return false;
+
   rtx cand_mems[2];
   rtx reg_ops[2];
   rtx pats[2];

and so it acts as a general opt-out.  The insns aren't known to be unordered.

It looks like the rs6000 override requires the original insns to be
in offset order.  Could you say why that's necessary?  (Both in email
and as a comment in the code.)

If we do need a general opt-out like this, it should probably go at the
very start of try_fuse_pair, before even the dump message.  (Alternatively,
it could go after the dump message, but then the "return false" would need
a dump message of its own to explain the failure.)
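
The two placements could look roughly like the standalone sketch
below.  None of these names come from the patch: insn_sketch,
pair_fusion_sketch, pair_is_candidate_p, the two try_fuse_pair_*
variants and ordered_target_sketch are hypothetical, the dump is
modelled as a std::ostream, and the offset-order rule is only my
reading of what the rs6000 override wants.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Stand-in for rtl_ssa::insn_info: only what the sketch needs.
struct insn_sketch
{
  int uid;
  long offset;
};

struct pair_fusion_sketch
{
  virtual ~pair_fusion_sketch () = default;

  // Hypothetical general opt-out: return true if the target is
  // willing to consider fusing I1 and I2 at all.
  virtual bool pair_is_candidate_p (const insn_sketch &i1,
                                    const insn_sketch &i2) = 0;

  // Placement 1: opt out before the dump message, so that rejected
  // pairs leave no trace in the dump file.
  bool try_fuse_pair_early (const insn_sketch &i1, const insn_sketch &i2,
                            std::ostream &dump)
  {
    if (!pair_is_candidate_p (i1, i2))
      return false;
    dump << "analyzing pair (" << i1.uid << "," << i2.uid << ")\n";
    return true; // The rest of the fusion logic would follow here.
  }

  // Placement 2: opt out after the dump message, in which case the
  // early return needs a message of its own.
  bool try_fuse_pair_late (const insn_sketch &i1, const insn_sketch &i2,
                           std::ostream &dump)
  {
    dump << "analyzing pair (" << i1.uid << "," << i2.uid << ")\n";
    if (!pair_is_candidate_p (i1, i2))
      {
        dump << "punting on pair: rejected by target\n";
        return false;
      }
    return true; // The rest of the fusion logic would follow here.
  }
};

// Hypothetical target that only accepts insns in offset order.
struct ordered_target_sketch : pair_fusion_sketch
{
  bool pair_is_candidate_p (const insn_sketch &i1,
                            const insn_sketch &i2) override
  {
    return i1.offset <= i2.offset;
  }
};
```

Either placement keeps the dump file self-explanatory: a rejected pair
is either never mentioned or is mentioned together with the reason for
the rejection.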

Thanks,
Richard