Subject: Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]
To: wschmidt@linux.ibm.com, Segher Boessenkool
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, guojiufu@linux.ibm.com, linkw@gcc.gnu.org
References: <20210602081932.2683429-1-luoxhu@linux.ibm.com> <20210602222003.GJ18427@gate.crashing.org> <9d59682e-e333-549e-14af-1566a9c7a740@linux.ibm.com>
From: Xionghu Luo
Date: Fri, 4 Jun 2021 10:15:33 +0800
In-Reply-To: <9d59682e-e333-549e-14af-1566a9c7a740@linux.ibm.com>

Hi,

On 2021/6/3 21:09, Bill Schmidt wrote:
> On 6/2/21 7:46 PM, Xionghu Luo wrote:
>> Hi,
>>
>> On 2021/6/3 06:20, Segher Boessenkool wrote:
>>> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote:
>>>> On P8LE, extra rot64+rot64 load or store instructions are generated
>>>> in float128 to vector __int128 conversion.
>>>>
>>>> This patch teaches pass swaps to also handle such patterns to remove
>>>> extra swap instructions.
>>> Did you check if this is already handled by simplify-rtx if the mode had
>>> been TImode (not V1TImode)?  If not, why do you not handle it there?
>> I tried to do it in combine or peephole, the later pass split2
>> or split3 will still split it to rotate + rotate again as we have split
>> after reload, and this pattern is quite P8LE specific, so put it in pass
>> swap.  The simplify-rtx could simplify
>> r124:KF#0=r123:KF#0<-<0x40<-<0x40 to r124:KF#0=r123:KF#0 for register
>> operations already.
>>
>>
>> vsx.md:
>>
>> ;; The post-reload split requires that we re-permute the source
>> ;; register in case it is still live.
>> (define_split
>>    [(set (match_operand:VSX_LE_128 0 "memory_operand")
>>          (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>>    "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>>     && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>>    [(const_int 0)]
>> {
>>    rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>>    rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
>>    rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>>    DONE;
>> })
>
> Note also that swap optimization can handle more general cases than
> simplify-rtx.  In my view it's best to have it covered in both places.
>

But this split happens after reload, much later than swap optimization,
so swap optimization cannot remove these swap operations as expected.
Below is an example that matches the above pattern in pass split2.  It
may not be quite appropriate, as there is a function call between the
load and the store.

extern vector __int128 foo1 (__float128 a);

int foo2 ()
{
  __binary128 f128 = {3.1415926535897932384626433832795028841971693993751058Q};
  vector __int128 ret = foo1 (f128);
  return ret[0];
}

295r.split (*see insn 35, 36, 37*):

...
Splitting with gen_split_558 (vsx.md:1079)
...
(insn 33 12 34 2 (set (reg/f:DI 9 %r9 [121])
        (high:DI (unspec:DI [
                    (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
                    (reg:DI 2 %r2)
                ] UNSPEC_TOCREL))) "pr100085.c":279:25 715 {*largetoc_high}
     (nil))
(insn 34 33 6 2 (set (reg/f:DI 9 %r9 [121])
        (lo_sum:DI (reg/f:DI 9 %r9 [121])
            (unspec:DI [
                    (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
                    (reg:DI 2 %r2)
                ] UNSPEC_TOCREL))) "pr100085.c":279:25 717 {*largetoc_low}
     (expr_list:REG_EQUAL (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
        (nil)))
(insn 6 34 8 2 (set (reg:V1TI 66 %v2 [123])
        (rotate:V1TI (mem/c:V1TI (reg/f:DI 9 %r9 [121]) [1 f128+0 S16 A128])
            (const_int 64 [0x40]))) "pr100085.c":279:25 1113 {*vsx_le_permute_v1ti}
     (nil))
(insn 8 6 9 2 (set (reg:V1TI 66 %v2)
        (rotate:V1TI (reg:V1TI 66 %v2 [123])
            (const_int 64 [0x40]))) "pr100085.c":279:25 1113 {*vsx_le_permute_v1ti}
     (nil))
(call_insn 9 8 32 2 (parallel [
            (set (reg:V1TI 66 %v2)
                (call (mem:SI (symbol_ref:DI ("foo1") [flags 0x41] ) [0 foo 1 S4 A8])
                    (const_int 0 [0])))
            (use (const_int 0 [0]))
            (clobber (reg:DI 96 lr))
        ]) "pr100085.c":279:25 735 {*call_value_nonlocal_aixdi}
     (expr_list:REG_CALL_DECL (symbol_ref:DI ("foo1") [flags 0x41] )
        (nil))
    (expr_list (use (reg:DI 2 %r2))
        (expr_list:KF (use (reg:KF 66 %v2))
            (nil))))
(insn 32 9 35 2 (set (reg:DI 9 %r9 [138])
        (plus:DI (reg/f:DI 1 %r1)
            (const_int 32 [0x20]))) "pr100085.c":279:25 66 {*adddi3}
     (nil))
(insn 35 32 36 2 (set (reg:V1TI 66 %v2)
        (rotate:V1TI (reg:V1TI 66 %v2)
            (const_int 64 [0x40]))) "pr100085.c":279:25 1113 {*vsx_le_permute_v1ti}
     (nil))
(insn 36 35 37 2 (set (mem/c:V1TI (reg:DI 9 %r9 [138]) [2 %sfp+32 S16 A128])
        (rotate:V1TI (reg:V1TI 66 %v2)
            (const_int 64 [0x40]))) "pr100085.c":279:25 1113 {*vsx_le_permute_v1ti}
     (nil))
(insn 37 36 28 2 (set (reg:V1TI 66 %v2)
        (rotate:V1TI (reg:V1TI 66 %v2)
            (const_int 64 [0x40]))) "pr100085.c":279:25 1113 {*vsx_le_permute_v1ti}
     (nil))
(insn 28 37 17 2 (set (reg:DI 3 %r3 [133])
        (mem/c:DI (plus:DI (reg/f:DI 1 %r1)
                (const_int 32 [0x20])) [2 %sfp+32 S8 A128])) "pr100085.c":279:25 636 {*movdi_internal64}
     (nil))
(insn 17 28 18 2 (set (reg/i:DI 3 %r3)
        (sign_extend:DI (reg:SI 3 %r3 [129]))) "pr100085.c":281:1 31 {extendsidi2}
     (nil))
(insn 18 17 30 2 (use (reg/i:DI 3 %r3)) "pr100085.c":281:1 -1
     (nil))

--
Thanks,
Xionghu
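
For reference, the effect of insns 35, 36 and 37 in the dump above can be modelled in plain C.  This is only a rough sketch of the RTL semantics, not of the hardware instructions, and it is not part of the patch: the type v128 and the helpers rot64, split_store, v128_equal and check_split_store are hypothetical names made up for illustration.  It treats (rotate:V1TI x (const_int 64)) as a swap of the two doublewords and checks that the three-insn sequence leaves both the stored value and the register equal to the original value, which is why the two register rotates are redundant when the register permute can be folded away.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register as two doublewords.  */
typedef struct { uint64_t hi, lo; } v128;

/* Model of (rotate:V1TI x (const_int 64)): swap the two doublewords.  */
static v128 rot64 (v128 x)
{
  v128 r = { x.lo, x.hi };
  return r;
}

static int v128_equal (v128 a, v128 b)
{
  return a.hi == b.hi && a.lo == b.lo;
}

/* Model of the post-reload split of a V1TI store: returns the value
   written to memory and updates *reg the same way the insns do.  */
static v128 split_store (v128 *reg)
{
  *reg = rot64 (*reg);       /* insn 35: permute the source register  */
  v128 mem = rot64 (*reg);   /* insn 36: store with rotate            */
  *reg = rot64 (*reg);       /* insn 37: re-permute the source        */
  return mem;
}

/* Check that, in this model, the sequence behaves like a plain store:
   memory receives X and the register still holds X afterwards.  */
static void check_split_store (v128 x)
{
  v128 reg = x;
  v128 mem = split_store (&reg);
  assert (v128_equal (mem, x));
  assert (v128_equal (reg, x));
}
```

Calling check_split_store with any value from a small main should pass silently under these assumptions, since rotating by 64 twice is the identity on a 128-bit value.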