Date: Tue, 9 Aug 2022 11:01:05 +0800
Subject: Re: [PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
To: Xionghu Luo
Cc: segher@kernel.crashing.org, Xionghu Luo, gcc-patches@gcc.gnu.org, David Edelsohn
References: <20220808034247.2618809-1-xionghuluo@tencent.com>
From: "Kewen.Lin"
In-Reply-To: <20220808034247.2618809-1-xionghuluo@tencent.com>
List-Id: Gcc-patches mailing list

Hi Xionghu,

Thanks for the fix.

on 2022/8/8 11:42, Xionghu Luo wrote:
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent.  So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
>
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>        (subreg:V4SI (reg:V16QI 139) 0)
>        (subreg:V4SI (reg:V16QI 140) 0))
>        [const_int 0 4 1 5]))
>
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>
> =>
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>
> The endianness check need only once at ASM generation finally.
> ASM would be better due to nested vec_select simplified to simple scalar
> load.
>
> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}

Sorry, no -m32 for LE testing.  I noticed that the attachment in that PR
didn't include the test case (though the changelog has it), so I re-tested
it; nothing changed. :)

> Linux(Thanks to Kewen), OK for master?  Or should we revert r12-4496 to
> restore to the UNSPEC implementation?
>

I have some concerns about the changed "altivec_*_direct" patterns.  IMHO
the suffix "_direct" normally indicates that the define_insn maps directly
to the corresponding hw insn.  With this change, altivec_vmrghb_direct, for
example, can be emitted as either vmrghb or vmrglb, which looks misleading.
Maybe we can add corresponding _direct_le and _direct_be versions; both
would map to the same insn but have different RTL patterns.
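
For illustration only, here is a minimal stand-alone C++ sketch (C++14 or later) of why one RTL pattern can end up as two different hw insns.  It is not GCC code: every name in it (V4, rtl_merge_high, hw_vmrghw, hw_vmrglw, le_view, same) is hypothetical, and it assumes a simplified model in which the hardware numbers words left to right and, on LE, RTL element i sits in hardware word 3 - i.  It uses the word variant because four lanes are easier to read than sixteen.

```cpp
#include <array>

// Four 32-bit lanes; an illustrative stand-in for V4SI, not a GCC type.
using V4 = std::array<unsigned, 4>;

// RTL view: vec_select (vec_concat (a, b), parallel [0 4 1 5]).
constexpr V4 rtl_merge_high(const V4 &a, const V4 &b) {
  return V4{{a[0], b[0], a[1], b[1]}};
}

// Hardware view, with words numbered left to right (big-endian order).
// vmrghw interleaves the left halves of its inputs, vmrglw the right halves.
constexpr V4 hw_vmrghw(const V4 &vra, const V4 &vrb) {
  return V4{{vra[0], vrb[0], vra[1], vrb[1]}};
}
constexpr V4 hw_vmrglw(const V4 &vra, const V4 &vrb) {
  return V4{{vra[2], vrb[2], vra[3], vrb[3]}};
}

// Assumed LE mapping: RTL element i sits in hardware word 3 - i.
constexpr V4 le_view(const V4 &v) {
  return V4{{v[3], v[2], v[1], v[0]}};
}

constexpr bool same(const V4 &x, const V4 &y) {
  return x[0] == y[0] && x[1] == y[1] && x[2] == y[2] && x[3] == y[3];
}

constexpr V4 A{{10, 11, 12, 13}};
constexpr V4 B{{20, 21, 22, 23}};

// BE: the pattern is emitted as "vmrghw %0,%1,%2".
static_assert(same(hw_vmrghw(A, B), rtl_merge_high(A, B)),
              "BE uses vmrghw with operands in order");

// LE: the same pattern is emitted as "vmrglw %0,%2,%1".
static_assert(same(le_view(hw_vmrglw(le_view(B), le_view(A))),
                   rtl_merge_high(A, B)),
              "LE uses vmrglw with operands swapped");
```

Under these assumptions both assertions hold at compile time, which is the situation described above: one "_direct" pattern, two possible hw insns depending on endianness.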

Looking forward to Segher's and David's suggestions.

> gcc/ChangeLog:
>   PR target/106069
>   * config/rs6000/altivec.md (altivec_vmrghb): Emit same native
>   RTL for BE and LE.
>   (altivec_vmrghh): Likewise.
>   (altivec_vmrghw): Likewise.
>   (*altivec_vmrghsf): Adjust.
>   (altivec_vmrglb): Likewise.
>   (altivec_vmrglh): Likewise.
>   (altivec_vmrglw): Likewise.
>   (*altivec_vmrglsf): Adjust.
>   (altivec_vmrghb_direct): Emit different ASM for BE and LE.
>   (altivec_vmrghh_direct): Likewise.
>   (altivec_vmrghw_direct_): Likewise.
>   (altivec_vmrglb_direct): Likewise.
>   (altivec_vmrglh_direct): Likewise.
>   (altivec_vmrglw_direct_): Likewise.
>   (vec_widen_smult_hi_v16qi): Adjust.
>   (vec_widen_smult_lo_v16qi): Adjust.
>   (vec_widen_umult_hi_v16qi): Adjust.
>   (vec_widen_umult_lo_v16qi): Adjust.
>   (vec_widen_smult_hi_v8hi): Adjust.
>   (vec_widen_smult_lo_v8hi): Adjust.
>   (vec_widen_umult_hi_v8hi): Adjust.
>   (vec_widen_umult_lo_v8hi): Adjust.
>   * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Emit same
>   native RTL for BE and LE.
>   * config/rs6000/vsx.md (vsx_xxmrghw_): Likewise.
>   (vsx_xxmrglw_): Likewise.
>
> gcc/testsuite/ChangeLog:
>   PR target/106069
>   * gcc.target/powerpc/pr106069.C: New test.
>
> Signed-off-by: Xionghu Luo
> ---
>  gcc/config/rs6000/altivec.md                | 122 ++++++++++++--------
>  gcc/config/rs6000/rs6000.cc                 |  36 +++---
>  gcc/config/rs6000/vsx.md                    |  16 +--
>  gcc/testsuite/gcc.target/powerpc/pr106069.C | 118 +++++++++++++++++++
>  4 files changed, 209 insertions(+), 83 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069.C
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..8d9c0109559 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,11 +1144,7 @@ (define_expand "altivec_vmrghb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> -                                                : gen_altivec_vmrglb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  emit_insn (gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]));
>    DONE;
>  })
>
> @@ -1167,7 +1163,12 @@ (define_insn "altivec_vmrghb_direct"
>                       (const_int 6) (const_int 22)
>                       (const_int 7) (const_int 23)])))]
>    "TARGET_ALTIVEC"
> -  "vmrghb %0,%1,%2"
> +  {
> +     if (BYTES_BIG_ENDIAN)
> +       return "vmrghb %0,%1,%2";
> +     else
> +       return "vmrglb %0,%2,%1";
> +  }
>    [(set_attr "type" "vecperm")])
>
>  (define_expand "altivec_vmrghh"
> @@ -1176,11 +1177,7 @@ (define_expand "altivec_vmrghh"
>     (use (match_operand:V8HI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
> -                                                : gen_altivec_vmrglh_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  emit_insn (gen_altivec_vmrghh_direct (operands[0], operands[1], operands[2]));
>    DONE;
>  })
>
> @@ -1195,7 +1192,12 @@ (define_insn "altivec_vmrghh_direct"
>                       (const_int 2) (const_int 10)
>                       (const_int 3) (const_int 11)])))]
>    "TARGET_ALTIVEC"
> -  "vmrghh %0,%1,%2"
> +  {
> +     if (BYTES_BIG_ENDIAN)
> +       return "vmrghh %0,%1,%2";
> +     else
> +       return "vmrglh %0,%2,%1";
> +  }
>    [(set_attr "type" "vecperm")])
>
>  (define_expand "altivec_vmrghw"
> @@ -1204,12 +1206,8 @@ (define_expand "altivec_vmrghw"
>     (use (match_operand:V4SI 2 "register_operand"))]
>    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
> -                         : gen_altivec_vmrglw_direct_v4si;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  emit_insn (
> +    gen_altivec_vmrghw_direct_v4si (operands[0], operands[1], operands[2]));
>    DONE;
>  })
>

[snip]

>    [(set_attr "type" "vecperm")])
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106069.C b/gcc/testsuite/gcc.target/powerpc/pr106069.C
> new file mode 100644
> index 00000000000..56219a74692
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106069.C

Since this is a C++ test case, it should be placed in
gcc/testsuite/g++.target/powerpc/.

> @@ -0,0 +1,118 @@
> +/* { dg-do run } */

This case requires altivec, it needs something like:

/* { dg-require-effective-target vmx_hw } */
/* { dg-options "-maltivec" } */

BR,
Kewen

> +
> +extern "C" void *
> +memcpy (void *, const void *, unsigned long);
> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
> +
> +union
> +{
> +  native_simd_type V;
> +  int R[4];
> +} store_le_vec;
> +
> +struct S
> +{
> +  S () = default;
> +  S (unsigned B0)
> +  {
> +    native_simd_type val{B0};
> +    m_simd = val;
> +  }
> +  void store_le (unsigned int out[])
> +  {
> +    store_le_vec.V = m_simd;
> +    unsigned int x0 = store_le_vec.R[0];
> +    memcpy (out, &x0, 1);
> +  }
> +  S rotl (unsigned int r)
> +  {
> +    native_simd_type rot{r};
> +    return __builtin_vec_rl (m_simd, rot);
> +  }
> +  void operator+= (S other)
> +  {
> +    m_simd = __builtin_vec_add (m_simd, other.m_simd);
> +  }
> +  void operator^= (S other)
> +  {
> +    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
> +  }
> +  static void transpose (S &B0, S B1, S B2, S B3)
> +  {
> +    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
> +    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
> +    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
> +    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
> +    B0 = __builtin_vec_mergeh (T0, T1);
> +    B3 = __builtin_vec_mergel (T2, T3);
> +  }
> +  S (native_simd_type x) : m_simd (x) {}
> +  native_simd_type m_simd;
> +};
> +
> +void
> +foo (unsigned int output[], unsigned state[])
> +{
> +  S R00 = state[0];
> +  S R01 = state[0];
> +  S R02 = state[2];
> +  S R03 = state[0];
> +  S R05 = state[5];
> +  S R06 = state[6];
> +  S R07 = state[7];
> +  S R08 = state[8];
> +  S R09 = state[9];
> +  S R10 = state[10];
> +  S R11 = state[11];
> +  S R12 = state[12];
> +  S R13 = state[13];
> +  S R14 = state[4];
> +  S R15 = state[15];
> +  for (int r = 0; r != 10; ++r)
> +    {
> +      R09 += R13;
> +      R11 += R15;
> +      R05 ^= R09;
> +      R06 ^= R10;
> +      R07 ^= R11;
> +      R07 = R07.rotl (7);
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 ^= R01;
> +      R13 ^= R02;
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 = R12.rotl (8);
> +      R13 = R13.rotl (8);
> +      R10 += R15;
> +      R11 += R12;
> +      R08 += R13;
> +      R09 += R14;
> +      R05 ^= R10;
> +      R06 ^= R11;
> +      R07 ^= R08;
> +      R05 = R05.rotl (7);
> +      R06 = R06.rotl (7);
> +      R07 = R07.rotl (7);
> +    }
> +  R00 += state[0];
> +  S::transpose (R00, R01, R02, R03);
> +  R00.store_le (output);
> +}
> +
> +unsigned int res[1];
> +unsigned main_state[]{1634760805, 60878, 2036477234, 6,
> +                      0, 825562964, 1471091955, 1346092787,
> +                      506976774, 4197066702, 518848283, 118491664,
> +                      0, 0, 0, 0};
> +int
> +main ()
> +{
> +  foo (res, main_state);
> +  if (res[0] != 0x41fcef98)
> +    __builtin_abort ();
> +}
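
As a side note on the commit message quoted near the top, the nested vec_select folding (insn 24 reading r146 lane 1 instead of r150 lane 3) can be checked with a small stand-alone C++ model (C++14 or later).  This is only a sketch of the index arithmetic, not the simplify-rtx implementation: vec_concat and vec_select here are plain functions named after the RTL codes, and Source, fold_nested_select and MERGE_HIGH are hypothetical names introduced for the example.

```cpp
#include <array>
#include <cstddef>

using V4 = std::array<unsigned, 4>;
using V8 = std::array<unsigned, 8>;
using Sel4 = std::array<std::size_t, 4>;

// Model of (vec_concat:V8SI a b): lanes 0..3 come from a, 4..7 from b.
constexpr V8 vec_concat(const V4 &a, const V4 &b) {
  return V8{{a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]}};
}

// Model of (vec_select:V4SI v (parallel idx)).
constexpr V4 vec_select(const V8 &v, const Sel4 &idx) {
  return V4{{v[idx[0]], v[idx[1]], v[idx[2]], v[idx[3]]}};
}

// Which input operand (0 or 1) and which lane a folded scalar select reads.
struct Source {
  int operand;
  std::size_t lane;
};

// Fold "lane k of vec_select (vec_concat (a, b), idx)" into a direct read
// of one lane of a or b.  Hypothetical helper, index arithmetic only.
constexpr Source fold_nested_select(const Sel4 &idx, std::size_t k) {
  return idx[k] < 4 ? Source{0, idx[k]} : Source{1, idx[k] - 4};
}

constexpr Sel4 MERGE_HIGH{{0, 4, 1, 5}};
constexpr V4 A{{10, 11, 12, 13}};
constexpr V4 B{{20, 21, 22, 23}};

// Lane 3 of the merged vector is index 5 of the concat, i.e. lane 1 of b,
// matching "vec_select(r146:V4SI,parallel [const_int 1])" in the example.
static_assert(fold_nested_select(MERGE_HIGH, 3).operand == 1, "reads b");
static_assert(fold_nested_select(MERGE_HIGH, 3).lane == 1, "reads lane 1");
static_assert(vec_select(vec_concat(A, B), MERGE_HIGH)[3] == B[1],
              "folded read agrees with the unfolded one");
```

The fold is only valid if the index vector in the RTL describes what the emitted insn really computes, which is why the pattern needs the same [0 4 1 5] index on both BE and LE.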