From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=lXZf=NF=linux.ibm.com=aagarwa1@sourceware.org>
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1])
	by sourceware.org (Postfix) with ESMTPS id 06B41384B11C
	for <gcc-patches@gcc.gnu.org>; Mon,  3 Jun 2024 13:48:01 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 06B41384B11C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 06B41384B11C
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1717422483; cv=none;
	b=o/VOgTA/SUtMP9bBC0NfdPrysFdhne8gxyVq8+wl+S+hKlPSRvRszlYWxPvFhzSaHoZbO6RB6eCfdhdmgEzW96v7QcrO4Ccu5naPDX9eyxc3rdEv1PO6Y3azSxEkL260TMAqgR6IXU09rhkpAVgEKjpr9/oeM1pATea6A9Yl32s=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1717422483; c=relaxed/simple;
	bh=X81hPrVlYKwsWPuecbP3ab0QJ7Xs71MMUjdzzi0HWII=;
	h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=jT4uKKbPqq9WcmvuEVzhGQhCVq92jP5RVhQmQQv/Nadf8AehBK98dGDBa401Zxx2d4/wZmYvh3HjZzAjrW1xlc43VPwAb4FOCZWjr0bRGSel+2ICWhk3+0gIcfGfe55Fzj7VYjlbY3qImjyzLB0w3jaMpRfRnDdpzvDMo/Pp+d0=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: from pps.filterd (m0356517.ppops.net [127.0.0.1])
	by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 453DWCEW009585;
	Mon, 3 Jun 2024 13:47:58 GMT
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com;
 h=content-transfer-encoding : content-type : date : from : in-reply-to :
 message-id : mime-version : references : subject : to; s=pp1;
 bh=98aRj1MHn3sZzdE8BEBzvaXGOhfCRLp7Lcg3iUgiIUE=;
 b=mT5X2B669xASTBL5gAiPh7/sKDDiVLekexcwAdmCyXPaDEO48+KY5Lf2x4Qyk9wrJAUS
 S3lGuiAwbf/8yrSMDYB/r0ivlVXVHuxESUhfcfUjk5QqdTCddY3v+kja0zNmx0wfkZLq
 DBhlNJoOnGnNTiduxgfrsfJJutJMpG/2Df2DcrM9CN9fj+fJ/6eC7isj+hyOYcjFjFsy
 h3Gv9w7uESt17iY/SnlzZ+4lQThrhvB3/DGXWRPYPznnJYy5+BGxWpK8XrIArdqgiFCC
 ioEoeuvM3/xhawFMyqu3afaD41MIP4EZctHomkPNHtuJpBO5CkThY9I0GsBSsGLNkwZU 6w== 
Received: from pps.reinject (localhost [127.0.0.1])
	by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3yhet1g1gv-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
	Mon, 03 Jun 2024 13:47:57 +0000
Received: from m0356517.ppops.net (m0356517.ppops.net [127.0.0.1])
	by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 453DlvgS006420;
	Mon, 3 Jun 2024 13:47:57 GMT
Received: from ppma11.dal12v.mail.ibm.com (db.9e.1632.ip4.static.sl-reverse.com [50.22.158.219])
	by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3yhet1g1gg-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
	Mon, 03 Jun 2024 13:47:56 +0000
Received: from pps.filterd (ppma11.dal12v.mail.ibm.com [127.0.0.1])
	by ppma11.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 453BOi8V026509;
	Mon, 3 Jun 2024 13:47:56 GMT
Received: from smtprelay02.dal12v.mail.ibm.com ([172.16.1.4])
	by ppma11.dal12v.mail.ibm.com (PPS) with ESMTPS id 3yggp2qnd2-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
	Mon, 03 Jun 2024 13:47:56 +0000
Received: from smtpav05.dal12v.mail.ibm.com (smtpav05.dal12v.mail.ibm.com [10.241.53.104])
	by smtprelay02.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 453Dlrec30867900
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
	Mon, 3 Jun 2024 13:47:55 GMT
Received: from smtpav05.dal12v.mail.ibm.com (unknown [127.0.0.1])
	by IMSVA (Postfix) with ESMTP id 4BA3A58056;
	Mon,  3 Jun 2024 13:47:53 +0000 (GMT)
Received: from smtpav05.dal12v.mail.ibm.com (unknown [127.0.0.1])
	by IMSVA (Postfix) with ESMTP id B95A158052;
	Mon,  3 Jun 2024 13:47:50 +0000 (GMT)
Received: from [9.36.9.27] (unknown [9.36.9.27])
	by smtpav05.dal12v.mail.ibm.com (Postfix) with ESMTP;
	Mon,  3 Jun 2024 13:47:50 +0000 (GMT)
Message-ID: <957bd4b7-11dd-4ebd-adf1-1c0815884944@linux.ibm.com>
Date: Mon, 3 Jun 2024 19:17:49 +0530
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [Patch, rs6000, aarch64, middle-end] Add implementation for
 different targets for pair mem fusion
To: Alex Coplan <alex.coplan@arm.com>, "Kewen.Lin" <linkw@linux.ibm.com>,
        Segher Boessenkool <segher@kernel.crashing.org>,
        Michael Meissner <meissner@linux.ibm.com>,
        Peter Bergner <bergner@linux.ibm.com>,
        David Edelsohn <dje.gcc@gmail.com>,
        gcc-patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
References: <53ba46de-6c01-4c68-bd98-1ba6950a793a@linux.ibm.com>
 <mpt34py71e8.fsf@arm.com>
 <cc31b3fb-552b-43bd-9029-5b84f6e6c437@linux.ibm.com>
 <mpted9i59n1.fsf@arm.com>
 <95a33b0a-8090-4218-a62c-da1f53bebbb7@linux.ibm.com>
 <mpt4jaa2zh9.fsf@arm.com>
 <9efb06e2-74f1-42f1-8a52-931d13a57ebc@linux.ibm.com>
 <mpt5xuq1csh.fsf@arm.com>
Content-Language: en-US
From: Ajit Agarwal <aagarwa1@linux.ibm.com>
In-Reply-To: <mpt5xuq1csh.fsf@arm.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-TM-AS-GCONF: 00
X-Proofpoint-ORIG-GUID: XWYX7qatfIqxhbQGpZdc7YbrQj6B7M8s
X-Proofpoint-GUID: M0GKh_ouSjtKIOIa7SGNpjk4LSRTLVLL
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.293,Aquarius:18.0.1039,Hydra:6.0.650,FMLib:17.12.28.16
 definitions=2024-06-03_09,2024-05-30_01,2024-05-17_01
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0
 impostorscore=0 malwarescore=0 spamscore=0 mlxlogscore=999
 priorityscore=1501 adultscore=0 clxscore=1015 suspectscore=0 mlxscore=0
 phishscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2405010000 definitions=main-2406030115
X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,KAM_MANYTO,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Hello Richard:

On 03/06/24 5:03 pm, Richard Sandiford wrote:
> Ajit Agarwal <aagarwa1@linux.ibm.com> writes:
>>> [...]
>>> If it is intentional, what distinguishes things like vperm and xxinsertw
>>> (and all other unspecs) from plain addition?
>>>
>>>   [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
>>>         (plus:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" "wa")
>>> 		    (match_operand:VSX_F 2 "vsx_register_operand" "wa")))]
>>>
>>
>> Plain addition are not supported currently.
>> We have not seen many cases with plain addition and this patch
>> will not accept plain addition.
>>
>>  
>>> This is why the intention behind the patch is important.  As it stands,
>>> it isn't clear what criteria the patch is using to distinguish "valid"
>>> fuse candidates from "invalid" ones.
>>>
>>
>> Intention behind this patch all variants of UNSPEC instructions are
>> supported and uses without UNSPEC are not supported in this patch.
> 
> But why make the distinction this way though?  UNSPEC is a very
> GCC-specific concept.  Whether something is an UNSPEC or some other
> RTL code depends largely on historical accident.  E.g. we have specific
> codes for VEC_SELECT, VEC_MERGE, and VEC_DUPLICATE, but don't have one
> for VEC_PERM (even for VEC_PERM_EXPR exists in gimple).
> 
> It seems unlikely that GCC's choice about whether to represent something
> as an UNSPEC or as another RTL code lines up neatly with the kind of
> codegen decisions that a good assembly programmer would make.
> 
> I suppose another way of asking is to turn this around and say: what
> kind of uses are you trying to exclude?  Presumably things are worse
> if you remove this function override.  But what makes them worse?
> What kind of uses cause the regression?
> 

Uses of fused load where load with low address uses are modified with load with high address uses.

Similarly load with high address uses are modified with load low address
uses.

This is the semantics of lxvp instructions which can occur through
UNSPEC uses otherwise it breaks the functionality and seen failure
in almost all vect regressions and SPEC benchmarks.


>>>>>>>> [...]
>>>>>>>> +  // Given insn_info pair I1 and I2, return true if offsets are in order.
>>>>>>>> +  virtual bool should_handle_unordered_insns (rtl_ssa::insn_info *i1,
>>>>>>>> +					      rtl_ssa::insn_info *i2) = 0;
>>>>>>>> +
>>>>>>>
>>>>>>> This name seems a bit misleading.  The function is used in:
>>>>>>>
>>>>>>> @@ -2401,6 +2405,9 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>>>>>>        reversed = true;
>>>>>>>      }
>>>>>>>  
>>>>>>> +  if (!m_pass->should_handle_unordered_insns (i1, i2))
>>>>>>> +    return false;
>>>>>>> +
>>>>>>>    rtx cand_mems[2];
>>>>>>>    rtx reg_ops[2];
>>>>>>>    rtx pats[2];
>>>>>>>
>>>>>>> and so it acts as a general opt-out.  The insns aren't known to be unordered.
>>>>>>>
>>>>>>> It looks like the rs6000 override requires the original insns to be
>>>>>>> in offset order.  Could you say why that's necessary?  (Both in email
>>>>>>> and as a comment in the code.)
>>>>>>>
>>>>>>
>>>>>> Yes rs6000 requires the original load insns to be in offset order.
>>>>>> Some regression tests like vect-outer-4f fails if we do load pair
>>>>>> fusion with load offsets are not in offset order as this breaks lxvp 
>>>>>> semantics.
>>>>>
>>>>> How does it break the semantics though?  In principle, the generic code
>>>>> only fuses if it has "proved" that the loads can happen in either order.
>>>>> So it shouldn't matter which order the hardware does things in.
>>>>>
>>>>> Could you give an example of the kind of situation that you want
>>>>> to avoid, and why it generates the wrong result?
>>>>>
>>>>
>>>> (insn 31 62 32 2 (set (reg:V16QI 177 [ MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 64B] ])
>>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>>                 (const_int 64 [0x40])) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 64B]+0 S16 A16]))  {vsx_movv16qi_64bit}
>>>>      (nil))
>>>> (insn 32 31 16 2 (set (reg:V16QI 178 [ MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 80B] ])
>>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>>                 (const_int 80 [0x50])) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 80B]+0 S16 A16]))  {vsx_movv16qi_64bit}
>>>>      (nil))
>>>> (insn 16 32 21 2 (set (reg:V16QI 159 [ MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 16B] ])
>>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>>                 (const_int 16 [0x10])) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 16B]+0 S16 A16]))  {vsx_movv16qi_64bit}
>>>>      (nil))
>>>> (insn 21 16 22 2 (set (reg:V16QI 165 [ MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 32B] ])
>>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>>                 (const_int 32 [0x20])) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 32B]+0 S16 A16])) {vsx_movv16qi_64bit}
>>>>      (nil))
>>>> (insn 22 21 37 2 (set (reg:V16QI 166 [ MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 48B] ])
>>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>>                 (const_int 48 [0x30])) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)vectp.62_36 + 48B]+0 S16 A16])) {vsx_movv16qi_64bit}
>>>>      (nil))
>>>>
>>>> insn 22 and insn 31 is merged in the failure case and breaks the code.
>>>
>>> What specifically goes wrong though?  This is just a sequence of loads
>>> from the same base pointer, with no interdependencies, so it should be
>>> possible to do the loads in any order.
>>>
>>
>> Here in fuse_pair we set first and second based  as follows:
>>
>>   insn_info *first = (*i1 < *i2) ? i1 : i2;
>>   insn_info *second = (first == i1) ? i2 : i1;
>>
>> This makes higher offset with first and lower offset with second.
>> if (*i1 > *i2).
>>
>> and in set_multiword_subreg interface we pass first and second.
>> Hence above code breaks because subreg offsets with 256 bits are not set properly.
>>
>> If we pass i1 and i2 in set_multiword_subreg (i1, i2, load_p)
>> in fuse_pair should_handle_unordered_insns is not required in try_fuse_pair.
>>
>> I will send the patch by removing the above interface check 
>> in try_fuse_pair and pass i1 and i2 in set_multiword_subreg
> 
> Thanks for looking into it.
> 
> I think it'd be better to resolve the unspec discussion before
> posting another version of the patch though.
> 
Sure.
> Richard

Thanks & Regards
Ajit