Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

public inbox for gcc-patches@gcc.gnu.org

From: Ajit Agarwal <aagarwa1@linux.ibm.com>
To: Alex Coplan <alex.coplan@arm.com>,
	"Kewen.Lin" <linkw@linux.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	Michael Meissner <meissner@linux.ibm.com>,
	Peter Bergner <bergner@linux.ibm.com>,
	David Edelsohn <dje.gcc@gmail.com>,
	gcc-patches <gcc-patches@gcc.gnu.org>,
	richard.sandiford@arm.com
Subject: Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion
Date: Mon, 3 Jun 2024 19:17:49 +0530
Message-ID: <957bd4b7-11dd-4ebd-adf1-1c0815884944@linux.ibm.com>
In-Reply-To: <mpt5xuq1csh.fsf@arm.com>

Hello Richard:

On 03/06/24 5:03 pm, Richard Sandiford wrote:
> Ajit Agarwal <aagarwa1@linux.ibm.com> writes:
>>> [...]
>>> If it is intentional, what distinguishes things like vperm and xxinsertw
>>> (and all other unspecs) from plain addition?
>>>
>>>   [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
>>>         (plus:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" "wa")
>>> 		    (match_operand:VSX_F 2 "vsx_register_operand" "wa")))]
>>>
>>
>> Plain addition is not currently supported.
>> We have not seen many cases involving plain addition, and this patch
>> does not accept it.
>>
>>  
>>> This is why the intention behind the patch is important.  As it stands,
>>> it isn't clear what criteria the patch is using to distinguish "valid"
>>> fuse candidates from "invalid" ones.
>>>
>>
>> The intention behind this patch is that uses in all variants of UNSPEC
>> instructions are supported, while uses without an UNSPEC are not.
> 
> But why make the distinction this way though?  UNSPEC is a very
> GCC-specific concept.  Whether something is an UNSPEC or some other
> RTL code depends largely on historical accident.  E.g. we have specific
> codes for VEC_SELECT, VEC_MERGE, and VEC_DUPLICATE, but don't have one
> for VEC_PERM (even though VEC_PERM_EXPR exists in gimple).
> 
> It seems unlikely that GCC's choice about whether to represent something
> as an UNSPEC or as another RTL code lines up neatly with the kind of
> codegen decisions that a good assembly programmer would make.
> 
> I suppose another way of asking is to turn this around and say: what
> kind of uses are you trying to exclude?  Presumably things are worse
> if you remove this function override.  But what makes them worse?
> What kind of uses cause the regression?
> 

The problematic uses are uses of the fused load in which the uses of the
low-address load are rewritten to use the high-address load, and,
similarly, the uses of the high-address load are rewritten to use the
low-address load.

This is tied to the semantics of the lxvp instruction, which can be
expressed through UNSPEC uses. Otherwise functionality breaks: we saw
failures in almost all of the vect regression tests and in SPEC
benchmarks.
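To make the low/high mix-up concrete, here is a minimal C++ model (not GCC code). V16, V32, load16, load_pair and rewrite_use are hypothetical names, and the convention that the quadword at the lower address occupies subreg byte 0 of the 256-bit pair is an assumption of the sketch; real lxvp register numbering also depends on endianness. A use of the original low-address load has to read the half at byte 0, and a use of the high-address load the half at byte 16; swapping them silently reads the neighbouring 16 bytes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model, not GCC code: one 16-byte vector value (V16QI).
using V16 = std::array<std::uint8_t, 16>;

// Hypothetical 256-bit pair produced by fusing two adjacent 16-byte loads.
// Assumption of this sketch: half[0] is subreg byte 0 and holds the
// quadword at the lower address; half[1] is subreg byte 16 and holds the
// quadword at the higher address.
struct V32 {
  V16 half[2];
};

// Model of one original 16-byte load from BASE + OFFSET.
V16 load16(const std::uint8_t *base, int offset) {
  V16 v;
  std::memcpy(v.data(), base + offset, v.size());
  return v;
}

// Model of the fused access: a single 32-byte load from the lower offset.
V32 load_pair(const std::uint8_t *base, int low_offset) {
  return V32{{load16(base, low_offset), load16(base, low_offset + 16)}};
}

// A use that originally read the load at ORIG_OFFSET must read the half
// whose address matches, i.e. subreg byte (orig_offset - low_offset).
V16 rewrite_use(const V32 &pair, int low_offset, int orig_offset) {
  int subreg_byte = orig_offset - low_offset;
  assert(subreg_byte == 0 || subreg_byte == 16);
  return pair.half[subreg_byte / 16];
}
```

For the loads at offsets 48 and 64 in the RTL dump quoted further down, load_pair (mem, 48) models the fused access, and rewrite_use (pair, 48, 64) returns the bytes at offset 64; returning half[0] for that use instead would reproduce the kind of wrong result described above.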


>>>>>>>> [...]
>>>>>>>> +  // Given insn_info pair I1 and I2, return true if offsets are in order.
>>>>>>>> +  virtual bool should_handle_unordered_insns (rtl_ssa::insn_info *i1,
>>>>>>>> +					      rtl_ssa::insn_info *i2) = 0;
>>>>>>>> +
>>>>>>>
>>>>>>> This name seems a bit misleading.  The function is used in:
>>>>>>>
>>>>>>> @@ -2401,6 +2405,9 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>>>>>>        reversed = true;
>>>>>>>      }
>>>>>>>  
>>>>>>> +  if (!m_pass->should_handle_unordered_insns (i1, i2))
>>>>>>> +    return false;
>>>>>>> +
>>>>>>>    rtx cand_mems[2];
>>>>>>>    rtx reg_ops[2];
>>>>>>>    rtx pats[2];
>>>>>>>
>>>>>>> and so it acts as a general opt-out.  The insns aren't known to be unordered.
>>>>>>>
>>>>>>> It looks like the rs6000 override requires the original insns to be
>>>>>>> in offset order.  Could you say why that's necessary?  (Both in email
>>>>>>> and as a comment in the code.)
>>>>>>>
>>>>>>
>>>>>> Yes, rs6000 requires the original load insns to be in offset order.
>>>>>> Some regression tests, such as vect-outer-4f, fail if we do load pair
>>>>>> fusion when the loads are not in offset order, as this breaks lxvp
>>>>>> semantics.
>>>>>
>>>>> How does it break the semantics though?  In principle, the generic code
>>>>> only fuses if it has "proved" that the loads can happen in either order.
>>>>> So it shouldn't matter which order the hardware does things in.
>>>>>
>>>>> Could you give an example of the kind of situation that you want
>>>>> to avoid, and why it generates the wrong result?
>>>>>
>>>>
>>>> (insn 31 62 32 2 (set (reg:V16QI 177 [ MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 64B] ])
>>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>>                 (const_int 64 [0x40])) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 64B]+0 S16 A16]))  {vsx_movv16qi_64bit}
>>>>      (nil))
>>>> (insn 32 31 16 2 (set (reg:V16QI 178 [ MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 80B] ])
>>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>>                 (const_int 80 [0x50])) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 80B]+0 S16 A16]))  {vsx_movv16qi_64bit}
>>>>      (nil))
>>>> (insn 16 32 21 2 (set (reg:V16QI 159 [ MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 16B] ])
>>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>>                 (const_int 16 [0x10])) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 16B]+0 S16 A16]))  {vsx_movv16qi_64bit}
>>>>      (nil))
>>>> (insn 21 16 22 2 (set (reg:V16QI 165 [ MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 32B] ])
>>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>>                 (const_int 32 [0x20])) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 32B]+0 S16 A16])) {vsx_movv16qi_64bit}
>>>>      (nil))
>>>> (insn 22 21 37 2 (set (reg:V16QI 166 [ MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 48B] ])
>>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>>                 (const_int 48 [0x30])) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 48B]+0 S16 A16])) {vsx_movv16qi_64bit}
>>>>      (nil))
>>>>
>>>> insn 22 and insn 31 are merged in the failure case, and this breaks the code.
>>>
>>> What specifically goes wrong though?  This is just a sequence of loads
>>> from the same base pointer, with no interdependencies, so it should be
>>> possible to do the loads in any order.
>>>
>>
>> Here in fuse_pair we set first and second as follows:
>>
>>   insn_info *first = (*i1 < *i2) ? i1 : i2;
>>   insn_info *second = (first == i1) ? i2 : i1;
>>
>> If *i1 > *i2, this assigns the higher offset to first and the lower
>> offset to second.
>>
>> We then pass first and second to the set_multiword_subreg interface.
>> Hence the code above breaks, because the subreg offsets within the
>> 256-bit value are not set properly.
>>
>> If we pass i1 and i2 to set_multiword_subreg (i1, i2, load_p) in
>> fuse_pair, the should_handle_unordered_insns check in try_fuse_pair
>> is not required.
>>
>> I will send the patch with the above interface check removed from
>> try_fuse_pair and with i1 and i2 passed to set_multiword_subreg.
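The ordering problem can be modelled in a few lines of C++. This is a hypothetical sketch, not the pair-fusion code: Access and SubregAssignment stand in for rtl_ssa::insn_info and for whatever set_multiword_subreg records, and the lower offset is assumed to map to subreg byte 0. Choosing roles by program order (as first/second do) gives byte 0 to insn 31 at offset 64 simply because it comes first, whereas choosing by offset gives byte 0 to insn 22 at offset 48.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for rtl_ssa::insn_info: only what the sketch needs.
struct Access {
  int uid;     // insn UID, e.g. 31 or 22
  int order;   // position in program order (smaller = earlier)
  int offset;  // byte offset from the shared base register
};

// Hypothetical record of which original access feeds which 16-byte half.
struct SubregAssignment {
  int uid_at_byte0;   // access whose value becomes subreg byte 0
  int uid_at_byte16;  // access whose value becomes subreg byte 16
};

// Problematic variant: roles chosen by program order, modelled on
//   first = (*i1 < *i2) ? i1 : i2;  second = (first == i1) ? i2 : i1;
SubregAssignment assign_by_program_order(const Access &i1, const Access &i2) {
  const Access &first = (i1.order < i2.order) ? i1 : i2;
  const Access &second = (&first == &i1) ? i2 : i1;
  return {first.uid, second.uid};
}

// Fixed variant: roles chosen by address, so the lower offset always maps
// to subreg byte 0 whichever insn comes first in program order.
SubregAssignment assign_by_offset(Access i1, Access i2) {
  if (i1.offset > i2.offset)
    std::swap(i1, i2);  // roughly what the "reversed" swap achieves
  assert(i2.offset - i1.offset == 16);  // adjacent 16-byte quadwords
  return {i1.uid, i2.uid};
}
```

With insn 31 at program position 0 (offset 64) and insn 22 at position 4 (offset 48), matching the order of the quoted dump, assign_by_program_order places insn 31 at byte 0, while assign_by_offset places insn 22 there regardless of argument order.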
> 
> Thanks for looking into it.
> 
> I think it'd be better to resolve the unspec discussion before
> posting another version of the patch though.
> 
Sure.
> Richard

Thanks & Regards
Ajit

Thread overview: 20+ messages
2024-05-30 19:51 Ajit Agarwal
2024-05-30 21:34 ` Segher Boessenkool
2024-05-31  8:14   ` Richard Sandiford
2024-05-31 14:19     ` Segher Boessenkool
2024-05-31  9:53 ` Richard Sandiford
2024-05-31 10:28   ` Richard Sandiford
2024-05-31 13:54   ` Ajit Agarwal
2024-05-31 14:38     ` Richard Sandiford
2024-05-31 16:59       ` Ajit Agarwal
2024-06-02  5:52         ` Ajit Agarwal
2024-06-03  8:37         ` Richard Sandiford
2024-06-03 11:05           ` Ajit Agarwal
2024-06-03 11:33             ` Richard Sandiford
2024-06-03 13:47               ` Ajit Agarwal [this message]
2024-06-03 14:17                 ` Richard Sandiford
2024-06-03 14:34                   ` Ajit Agarwal
2024-06-03 14:54                     ` Richard Sandiford
2024-06-03 15:58                       ` Ajit Agarwal
2024-06-04 16:00                         ` Ajit Agarwal
2024-06-02 13:16   ` Ajit Agarwal
