From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <13fea2af-a64f-45c7-ae47-131191ac5871@linux.ibm.com>
Date: Tue, 4 Jun 2024 21:30:53 +0530
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [Patch, rs6000, aarch64, middle-end] Add implementation for
 different targets for pair mem fusion
From: Ajit Agarwal <aagarwa1@linux.ibm.com>
To: Alex Coplan <alex.coplan@arm.com>, "Kewen.Lin" <linkw@linux.ibm.com>,
        Segher Boessenkool <segher@kernel.crashing.org>,
        Michael Meissner <meissner@linux.ibm.com>,
        Peter Bergner <bergner@linux.ibm.com>,
        David Edelsohn <dje.gcc@gmail.com>,
        gcc-patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
References: <53ba46de-6c01-4c68-bd98-1ba6950a793a@linux.ibm.com>
 <mpt34py71e8.fsf@arm.com>
 <cc31b3fb-552b-43bd-9029-5b84f6e6c437@linux.ibm.com>
 <mpted9i59n1.fsf@arm.com>
 <95a33b0a-8090-4218-a62c-da1f53bebbb7@linux.ibm.com>
 <mpt4jaa2zh9.fsf@arm.com>
 <9efb06e2-74f1-42f1-8a52-931d13a57ebc@linux.ibm.com>
 <mpt5xuq1csh.fsf@arm.com>
 <957bd4b7-11dd-4ebd-adf1-1c0815884944@linux.ibm.com>
 <mptikyqyutw.fsf@arm.com>
 <99a53e50-ed04-4bd8-baa5-f13d5376585a@linux.ibm.com>
 <mpt34puyt3r.fsf@arm.com>
 <6a87ceb9-0de2-49f2-8998-17391c213c7d@linux.ibm.com>
Content-Language: en-US
In-Reply-To: <6a87ceb9-0de2-49f2-8998-17391c213c7d@linux.ibm.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
List-Id: <gcc-patches.gcc.gnu.org>

Hello Richard:

On 03/06/24 9:28 pm, Ajit Agarwal wrote:
> Hello Richard:
> 
> On 03/06/24 8:24 pm, Richard Sandiford wrote:
>> Ajit Agarwal <aagarwa1@linux.ibm.com> writes:
>>> Hello Richard:
>>>
>>> On 03/06/24 7:47 pm, Richard Sandiford wrote:
>>>> Ajit Agarwal <aagarwa1@linux.ibm.com> writes:
>>>>> On 03/06/24 5:03 pm, Richard Sandiford wrote:
>>>>>> Ajit Agarwal <aagarwa1@linux.ibm.com> writes:
>>>>>>>> [...]
>>>>>>>> If it is intentional, what distinguishes things like vperm and xxinsertw
>>>>>>>> (and all other unspecs) from plain addition?
>>>>>>>>
>>>>>>>>   [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
>>>>>>>>         (plus:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" "wa")
>>>>>>>> 		    (match_operand:VSX_F 2 "vsx_register_operand" "wa")))]
>>>>>>>>
>>>>>>>
>>>>>>> Plain addition is not currently supported.
>>>>>>> We have not seen many cases with plain addition, and this patch
>>>>>>> will not accept plain addition.
>>>>>>>
>>>>>>>  
>>>>>>>> This is why the intention behind the patch is important.  As it stands,
>>>>>>>> it isn't clear what criteria the patch is using to distinguish "valid"
>>>>>>>> fuse candidates from "invalid" ones.
>>>>>>>>
>>>>>>>
>>>>>>> The intention behind this patch is that all variants of UNSPEC
>>>>>>> instructions are supported and uses without UNSPEC are not.
>>>>>>
>>>>>> But why make the distinction this way though?  UNSPEC is a very
>>>>>> GCC-specific concept.  Whether something is an UNSPEC or some other
>>>>>> RTL code depends largely on historical accident.  E.g. we have specific
>>>>>> codes for VEC_SELECT, VEC_MERGE, and VEC_DUPLICATE, but don't have one
>>>>>> for VEC_PERM (even though VEC_PERM_EXPR exists in gimple).
>>>>>>
>>>>>> It seems unlikely that GCC's choice about whether to represent something
>>>>>> as an UNSPEC or as another RTL code lines up neatly with the kind of
>>>>>> codegen decisions that a good assembly programmer would make.
>>>>>>
>>>>>> I suppose another way of asking is to turn this around and say: what
>>>>>> kind of uses are you trying to exclude?  Presumably things are worse
>>>>>> if you remove this function override.  But what makes them worse?
>>>>>> What kind of uses cause the regression?
>>>>>>
>>>>>
>>>>> Uses of a fused load, where the uses of the load with the low address
>>>>> are modified to use the load with the high address.
>>>>>
>>>>> Similarly, the uses of the load with the high address are modified to
>>>>> use the load with the low address.
>>>>
>>>> It sounds like something is going wrong with the subreg updates.
>>>> Can you give an example of where this occurs?  For instance...
>>>>
>>>>> These are the semantics of lxvp instructions, which can occur through
>>>>> UNSPEC uses; otherwise it breaks functionality, and failures are seen
>>>>> in almost all vect regression tests and SPEC benchmarks.
>>>>
>>>> ...could you take one of the simpler vect regressions, show the before
>>>> and after RTL, and why the transformation is wrong?
>>>
>>> Before the change:
>>>
>>> (insn 32 30 103 5 (set (reg:V16QI 127 [ _32 ])
>>>         (mem:V16QI (reg:DI 130 [ ivtmp.37 ]) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)_55]+0 S16 A128])) {vsx_movv16qi_64bit}
>>>      (nil))
>>> (insn 103 32 135 5 (set (reg:V16QI 173 [ _32 ])
>>>         (mem:V16QI (plus:DI (reg:DI 130 [ ivtmp.37 ])
>>>                 (const_int 16 [0x10])) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)_55]+0 S16 A128])) {vsx_movv16qi_64bit}
>>>      (nil))
>>> (insn 135 103 34 5 (set (reg:DI 155)
>>>         (plus:DI (reg:DI 130 [ ivtmp.37 ])
>>>             (const_int 16 [0x10]))) 66 {*adddi3}
>>>      (nil))
>>> (insn 34 135 104 5 (set (reg:V16QI 143 [ _27 ])
>>>         (unspec:V16QI [
>>>                 (reg:V16QI 127 [ _32 ]) repeated x2
>>>                 (reg:V16QI 152)
>>>             ] UNSPEC_VPERM))  {altivec_vperm_v16qi_direct}
>>>      (expr_list:REG_DEAD (reg:V16QI 127 [ _32 ])
>>>         (nil)))
>>> (insn 104 34 35 5 (set (reg:V16QI 174 [ _27 ])
>>>         (unspec:V16QI [
>>>                 (reg:V16QI 173 [ _32 ]) repeated x2
>>>                 (reg:V16QI 152)
>>>             ] UNSPEC_VPERM)) {altivec_vperm_v16qi_direct}
>>>
>>>
>>> After the change:
>>>
>>> (insn 103 30 135 5 (set (reg:OO 127 [ _32 ])
>>>         (mem:OO (reg:DI 130 [ ivtmp.37 ]) [1 MEM <vector(8) short unsigned int> [(short unsigned int *)_55]+0 S16 A128])) {*movoo}
>>>      (nil))
>>> (insn 135 103 34 5 (set (reg:DI 155)
>>>         (plus:DI (reg:DI 130 [ ivtmp.37 ])
>>>             (const_int 16 [0x10]))) 66 {*adddi3}
>>>      (nil))
>>> (insn 34 135 104 5 (set (reg:V16QI 143 [ _27 ])
>>>         (unspec:V16QI [
>>>                 (subreg:V16QI (reg:OO 127 [ _32 ]) 16)
>>>                 (subreg:V16QI (reg:OO 127 [ _32 ]) 16)
>>>                 (reg:V16QI 152)
>>>             ] UNSPEC_VPERM)) {altivec_vperm_v16qi_direct}
>>>      (expr_list:REG_DEAD (reg:OO 127 [ _32 ])
>>>         (nil)))
>>> (insn 104 34 35 5 (set (reg:V16QI 174 [ _27 ])
>>>         (unspec:V16QI [
>>>                 (subreg:V16QI (reg:OO 127 [ _32 ]) 0)
>>>                 (subreg:V16QI (reg:OO 127 [ _32 ]) 0)
>>>                 (reg:V16QI 152)
>>>             ] UNSPEC_VPERM))  {altivec_vperm_v16qi_direct}
>>>
>>> After the change the tests pass.
>>
>> But isn't this an example of the optimisation working on unspecs,
>> and working correctly?
>>
> 
> Yes this is working fine.
> 
>> I meant instead: could you give an example of the vect regressions
>> that you saw with the unspec test removed?  You mentioned that many
>> vect tests regressed without the unspec test, so it would be helpful
>> to see these failures in action.  That is, it'd be helpful to take
>> a compiler that doesn't have the unspec tests and show:
>>
>> - the relevant rtl of one of the failing tests before the pass runs
>>   (when the rtl is still correct)
>>
>> - the relevant rtl of one of the failing tests after the pass runs
>>   (when the rtl is now incorrect)
>>
>> - the reason why the rtl after the pass is wrong
>>
> 
> I meant to say that these are the semantics of lxvp instructions, which can
> occur through UNSPEC uses. If we don't use the above semantics for UNSPEC,
> the vect regression tests and SPEC fail functionally.
> 
> I will find a test without UNSPEC and let you know.
>

I have fixed all the issues with RTL uses of the fused load other than UNSPEC.
With the fixes, uses of the fused load in all RTL codes are supported, along
with UNSPEC.
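
For reference, the subreg mapping visible in the quoted "After the change"
dump can be sketched as below. This is only an illustration of what that one
dump shows, not code from the patch: the function name, its parameters and
the 16-byte constant are hypothetical and read off the dump (the load at
offset 0 becomes subreg offset 16 of the OO register, the load at offset 16
becomes subreg offset 0).

```python
# Illustrative sketch only: names here are hypothetical, not from the patch.
# It encodes the mapping observed in the quoted dump, nothing more general.

VECTOR_BYTES = 16  # size of one V16QI half of the OO (register pair) value


def subreg_offset_for_load(mem_offset: int, pair_base_offset: int = 0) -> int:
    """Return the subreg byte offset used for a load after fusion.

    mem_offset is the address offset of the original V16QI load;
    pair_base_offset is the address offset of the fused OO load.
    """
    if mem_offset == pair_base_offset:
        # Low-address load: uses become (subreg:V16QI (reg:OO ...) 16).
        return VECTOR_BYTES
    if mem_offset == pair_base_offset + VECTOR_BYTES:
        # High-address load: uses become (subreg:V16QI (reg:OO ...) 0).
        return 0
    raise ValueError("load is not one of the two halves of the fused pair")


if __name__ == "__main__":
    # insn 32 loaded from offset 0, insn 103 from offset 16 in the dump.
    print(subreg_offset_for_load(0))
    print(subreg_offset_for_load(16))
```

The point of the sketch is that both halves must be remapped consistently
for every use of the fused load, whatever RTL code the use appears in.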

Thanks for suggesting support for all RTL codes for the fused load, along with UNSPEC.

I will send a separate patch with all the fixes soon.

Thanks & Regards
Ajit 
> Thanks & Regards
> Ajit
>  
>> Thanks,
>> Richard