From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 18 Jan 2023 17:11:54 +0800
From: "Kewen.Lin"
To: Xionghu Luo, Segher Boessenkool
Cc: Xionghu Luo, gcc-patches@gcc.gnu.org, David Edelsohn, Jakub Jelinek
Subject: Re: Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
References: <20220808034247.2618809-1-xionghuluo@tencent.com> <76035a5e-f0d8-8bc5-93e9-cfb08b2127f8@gmail.com> <20220810170700.GA25951@gate.crashing.org> <472c1531-aae6-123e-6b0c-8827f5585879@gmail.com> <5df1a7fc-dacf-72e2-041d-66624926091f@linux.ibm.com> <37b57a54-f98e-96a3-edff-866c8aae4c7d@gmail.com> <5418ebd2-d544-f4cc-d930-bdde64ad2807@gmail.com>
In-Reply-To: <5418ebd2-d544-f4cc-d930-bdde64ad2807@gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi Segher,

I guess this patch slipped off your radar. :)

Since Jakub asked about the status in PR106069, I applied the attached
patch from Xionghu to the latest trunk and re-tested it. It still
bootstraps and passes regression testing on powerpc64-linux-gnu P8 and
powerpc64le-linux-gnu P9 and P10.

This new version separates the direct LE and BE patterns, which makes it
clearer than before. It looks good to me. What do you think?
Looking forward to your opinion.

Btw, the link in the archives:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600169.html

BR,
Kewen

on 2022/8/24 09:24, Xionghu Luo wrote:
> Subject:
> Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
> From:
> Xionghu Luo
> Date:
> 2022/8/24, 09:24
>
> To:
> "Kewen.Lin", Segher Boessenkool
> Cc:
> Xionghu Luo, gcc-patches@gcc.gnu.org, David Edelsohn, Segher Boessenkool
>
>
> Hi Segher, I'd like to resend and ping for this patch. Thanks.
>
> v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch
>
> From 23bffdacdf0eb1140c7a3571e6158797f4818d57 Mon Sep 17 00:00:00 2001
> From: Xionghu Luo
> Date: Thu, 4 Aug 2022 03:44:58 +0000
> Subject: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the
>  UNSPECS [PR106069]
>
> v4: Update per comments.
> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
> patterns.
> v2: Split the direct pattern to be and le with same RTL but different insn.
>
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent. So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
>
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>            (subreg:V4SI (reg:V16QI 139) 0)
>            (subreg:V4SI (reg:V16QI 140) 0))
>            [const_int 0 4 1 5]))
>
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>
> =>
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>
> The endianness check need only once at ASM generation finally.
> ASM would be better due to nested vec_select simplified to simple scalar
> load.
>
> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
> Linux.
>
> gcc/ChangeLog:
>
>         PR target/106069
>         * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
>         (altivec_vmrghb_direct_be): New pattern for BE.
>         (altivec_vmrghb_direct_le): New pattern for LE.
>         (altivec_vmrghh_direct): Remove.
>         (altivec_vmrghh_direct_be): New pattern for BE.
>         (altivec_vmrghh_direct_le): New pattern for LE.
>         (altivec_vmrghw_direct_<mode>): Remove.
>         (altivec_vmrghw_direct_<mode>_be): New pattern for BE.
>         (altivec_vmrghw_direct_<mode>_le): New pattern for LE.
>         (altivec_vmrglb_direct): Remove.
>         (altivec_vmrglb_direct_be): New pattern for BE.
>         (altivec_vmrglb_direct_le): New pattern for LE.
>         (altivec_vmrglh_direct): Remove.
>         (altivec_vmrglh_direct_be): New pattern for BE.
>         (altivec_vmrglh_direct_le): New pattern for LE.
>         (altivec_vmrglw_direct_<mode>): Remove.
>         (altivec_vmrglw_direct_<mode>_be): New pattern for BE.
>         (altivec_vmrglw_direct_<mode>_le): New pattern for LE.
>         * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
>         Adjust.
>         * config/rs6000/vsx.md: Likewise.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/106069
>         * g++.target/powerpc/pr106069.C: New test.
>
> Signed-off-by: Xionghu Luo
> ---
>  gcc/config/rs6000/altivec.md                | 222 ++++++++++++++------
>  gcc/config/rs6000/rs6000.cc                 |  24 +--
>  gcc/config/rs6000/vsx.md                    |  28 +--
>  gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++
>  4 files changed, 307 insertions(+), 85 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..c6a381908cb 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> -                                                : gen_altivec_vmrglb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
>    DONE;
>  })
>
> -(define_insn "altivec_vmrghb_direct"
> +(define_insn "altivec_vmrghb_direct_be"
>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>          (vec_select:V16QI
>            (vec_concat:V32QI
> @@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
>                       (const_int 5) (const_int 21)
>                       (const_int 6) (const_int 22)
>                       (const_int 7) (const_int 23)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrghb %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghb_direct_le"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> +        (vec_select:V16QI
> +          (vec_concat:V32QI
> +            (match_operand:V16QI 2 "register_operand" "v")
> +            (match_operand:V16QI 1 "register_operand" "v"))
> +          (parallel [(const_int 8) (const_int 24)
> +                     (const_int 9) (const_int 25)
> +                     (const_int 10) (const_int 26)
> +                     (const_int 11) (const_int 27)
> +                     (const_int 12) (const_int 28)
> +                     (const_int 13) (const_int 29)
> +                     (const_int 14) (const_int 30)
> +                     (const_int 15) (const_int 31)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "vmrghb %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>
> @@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
>     (use (match_operand:V8HI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
> -                                                : gen_altivec_vmrglh_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
>    DONE;
>  })
>
> -(define_insn "altivec_vmrghh_direct"
> +(define_insn "altivec_vmrghh_direct_be"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
> -        (vec_select:V8HI
> +        (vec_select:V8HI
>            (vec_concat:V16HI
>              (match_operand:V8HI 1 "register_operand" "v")
>              (match_operand:V8HI 2 "register_operand" "v"))
> @@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
>                       (const_int 1) (const_int 9)
>                       (const_int 2) (const_int 10)
>                       (const_int 3) (const_int 11)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrghh %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghh_direct_le"
> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> +        (vec_select:V8HI
> +          (vec_concat:V16HI
> +            (match_operand:V8HI 2 "register_operand" "v")
> +            (match_operand:V8HI 1 "register_operand" "v"))
> +          (parallel [(const_int 4) (const_int 12)
> +                     (const_int 5) (const_int 13)
> +                     (const_int 6) (const_int 14)
> +                     (const_int 7) (const_int 15)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "vmrghh %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>
> @@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
>     (use (match_operand:V4SI 2 "register_operand"))]
>    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
> -                         : gen_altivec_vmrglw_direct_v4si;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
> +                                                  operands[1],
> +                                                  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
> +                                                  operands[2],
> +                                                  operands[1]));
>    DONE;
>  })
>
> -(define_insn "altivec_vmrghw_direct_<mode>"
> +(define_insn "altivec_vmrghw_direct_<mode>_be"
>    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>          (vec_select:VSX_W
>            (vec_concat:<VS_double>
> @@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
>              (match_operand:VSX_W 2 "register_operand" "wa,v"))
>            (parallel [(const_int 0) (const_int 4)
>                       (const_int 1) (const_int 5)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "@
> +   xxmrghw %x0,%x1,%x2
> +   vmrghw %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghw_direct_<mode>_le"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> +        (vec_select:VSX_W
> +          (vec_concat:<VS_double>
> +            (match_operand:VSX_W 2 "register_operand" "wa,v")
> +            (match_operand:VSX_W 1 "register_operand" "wa,v"))
> +          (parallel [(const_int 2) (const_int 6)
> +                     (const_int 3) (const_int 7)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "@
>     xxmrghw %x0,%x1,%x2
>     vmrghw %0,%1,%2"
> @@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
> -                                                : gen_altivec_vmrghb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
>    DONE;
>  })
>
> -(define_insn "altivec_vmrglb_direct"
> +(define_insn "altivec_vmrglb_direct_be"
>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>          (vec_select:V16QI
>            (vec_concat:V32QI
> @@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
>                       (const_int 13) (const_int 29)
>                       (const_int 14) (const_int 30)
>                       (const_int 15) (const_int 31)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrglb %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglb_direct_le"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> +        (vec_select:V16QI
> +          (vec_concat:V32QI
> +            (match_operand:V16QI 2 "register_operand" "v")
> +            (match_operand:V16QI 1 "register_operand" "v"))
> +          (parallel [(const_int 0) (const_int 16)
> +                     (const_int 1) (const_int 17)
> +                     (const_int 2) (const_int 18)
> +                     (const_int 3) (const_int 19)
> +                     (const_int 4) (const_int 20)
> +                     (const_int 5) (const_int 21)
> +                     (const_int 6) (const_int 22)
> +                     (const_int 7) (const_int 23)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "vmrglb %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>
> @@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
>     (use (match_operand:V8HI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
> -                                                : gen_altivec_vmrghh_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
>    DONE;
>  })
>
> -(define_insn "altivec_vmrglh_direct"
> +(define_insn "altivec_vmrglh_direct_be"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (vec_select:V8HI
>            (vec_concat:V16HI
> @@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
>                       (const_int 5) (const_int 13)
>                       (const_int 6) (const_int 14)
>                       (const_int 7) (const_int 15)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrglh %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglh_direct_le"
> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> +        (vec_select:V8HI
> +          (vec_concat:V16HI
> +            (match_operand:V8HI 2 "register_operand" "v")
> +            (match_operand:V8HI 1 "register_operand" "v"))
> +          (parallel [(const_int 0) (const_int 8)
> +                     (const_int 1) (const_int 9)
> +                     (const_int 2) (const_int 10)
> +                     (const_int 3) (const_int 11)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "vmrglh %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>
> @@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
>     (use (match_operand:V4SI 2 "register_operand"))]
>    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
> -                         : gen_altivec_vmrghw_direct_v4si;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
> +                                                  operands[1],
> +                                                  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
> +                                                  operands[2],
> +                                                  operands[1]));
>    DONE;
>  })
>
> -(define_insn "altivec_vmrglw_direct_<mode>"
> +(define_insn "altivec_vmrglw_direct_<mode>_be"
>    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>          (vec_select:VSX_W
>            (vec_concat:<VS_double>
> @@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
>              (match_operand:VSX_W 2 "register_operand" "wa,v"))
>            (parallel [(const_int 2) (const_int 6)
>                       (const_int 3) (const_int 7)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "@
> +   xxmrglw %x0,%x1,%x2
> +   vmrglw %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglw_direct_<mode>_le"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> +        (vec_select:VSX_W
> +          (vec_concat:<VS_double>
> +            (match_operand:VSX_W 2 "register_operand" "wa,v")
> +            (match_operand:VSX_W 1 "register_operand" "wa,v"))
> +          (parallel [(const_int 0) (const_int 4)
> +                     (const_int 1) (const_int 5)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "@
>     xxmrglw %x0,%x1,%x2
>     vmrglw %0,%1,%2"
> @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>      {
>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
>      {
>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
>      {
>        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
>      {
>        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
>      {
>        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
>      {
>        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
>      {
>        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
>      {
>        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index df491bee2ea..c6ccd40e089 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>       CODE_FOR_altivec_vpkuwum_direct,
>       {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
> -                      : CODE_FOR_altivec_vmrglb_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
> +                      : CODE_FOR_altivec_vmrglb_direct_le,
>       {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
> -                      : CODE_FOR_altivec_vmrglh_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
> +                      : CODE_FOR_altivec_vmrglh_direct_le,
>       {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
> -                      : CODE_FOR_altivec_vmrglw_direct_v4si,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
> +                      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
>       {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
> -                      : CODE_FOR_altivec_vmrghb_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
> +                      : CODE_FOR_altivec_vmrghb_direct_le,
>       {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
> -                      : CODE_FOR_altivec_vmrghh_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
> +                      : CODE_FOR_altivec_vmrghh_direct_le,
>       {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
> -                      : CODE_FOR_altivec_vmrghw_direct_v4si,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
> +                      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
>       {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
>      {OPTION_MASK_P8_VECTOR,
>       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index e226a93bbe5..80f84e9b141 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>"
>                       (const_int 1) (const_int 5)])))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
> -                         : gen_altivec_vmrglw_direct_<mode>;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
> +                                                  operands[1],
> +                                                  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
> +                                                  operands[2],
> +                                                  operands[1]));
>    DONE;
>  }
>    [(set_attr "type" "vecperm")])
> @@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_<mode>"
>                       (const_int 3) (const_int 7)])))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
> -                         : gen_altivec_vmrghw_direct_<mode>;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
> +                                                  operands[1],
> +                                                  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
> +                                                  operands[2],
> +                                                  operands[1]));
>    DONE;
>  }
>    [(set_attr "type" "vecperm")])
> diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
> new file mode 100644
> index 00000000000..c89739ecb55
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
> @@ -0,0 +1,118 @@
> +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
> +/* { dg-require-effective-target vmx_hw } */
> +/* { dg-do run } */
> +
> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
> +
> +union
> +{
> +  native_simd_type V;
> +  int R[4];
> +} store_le_vec;
> +
> +struct S
> +{
> +  S () = default;
> +  S (unsigned B0)
> +  {
> +    native_simd_type val{B0};
> +    m_simd = val;
> +  }
> +  void store_le (unsigned int out[])
> +  {
> +    store_le_vec.V = m_simd;
> +    unsigned int x0 = store_le_vec.R[0];
> +    __builtin_memcpy (out, &x0, 4);
> +  }
> +  S rotl (unsigned int r)
> +  {
> +    native_simd_type rot{r};
> +    return __builtin_vec_rl (m_simd, rot);
> +  }
> +  void operator+= (S other)
> +  {
> +    m_simd = __builtin_vec_add (m_simd, other.m_simd);
> +  }
> +  void operator^= (S other)
> +  {
> +    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
> +  }
> +  static void transpose (S &B0, S B1, S B2, S B3)
> +  {
> +    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
> +    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
> +    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
> +    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
> +    B0 = __builtin_vec_mergeh (T0, T1);
> +    B3 = __builtin_vec_mergel (T2, T3);
> +  }
> +  S (native_simd_type x) : m_simd (x) {}
> +  native_simd_type m_simd;
> +};
> +
> +void
> +foo (unsigned int output[], unsigned state[])
> +{
> +  S R00 = state[0];
> +  S R01 = state[0];
> +  S R02 = state[2];
> +  S R03 = state[0];
> +  S R05 = state[5];
> +  S R06 = state[6];
> +  S R07 = state[7];
> +  S R08 = state[8];
> +  S R09 = state[9];
> +  S R10 = state[10];
> +  S R11 = state[11];
> +  S R12 = state[12];
> +  S R13 = state[13];
> +  S R14 = state[4];
> +  S R15 = state[15];
> +  for (int r = 0; r != 10; ++r)
> +    {
> +      R09 += R13;
> +      R11 += R15;
> +      R05 ^= R09;
> +      R06 ^= R10;
> +      R07 ^= R11;
> +      R07 = R07.rotl (7);
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 ^= R01;
> +      R13 ^= R02;
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 = R12.rotl (8);
> +      R13 = R13.rotl (8);
> +      R10 += R15;
> +      R11 += R12;
> +      R08 += R13;
> +      R09 += R14;
> +      R05 ^= R10;
> +      R06 ^= R11;
> +      R07 ^= R08;
> +      R05 = R05.rotl (7);
> +      R06 = R06.rotl (7);
> +      R07 = R07.rotl (7);
> +    }
> +  R00 += state[0];
> +  S::transpose (R00, R01, R02, R03);
> +  R00.store_le (output);
> +}
> +
> +unsigned int res[1];
> +unsigned main_state[]{1634760805, 60878, 2036477234, 6,
> +                      0, 825562964, 1471091955, 1346092787,
> +                      506976774, 4197066702, 518848283, 118491664,
> +                      0, 0, 0, 0};
> +int
> +main ()
> +{
> +  foo (res, main_state);
> +  if (res[0] != 0x41fcef98)
> +    __builtin_abort ();
> +}
> --
> 2.27.0
>
> Attachment:
>
> v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch 25.1 K
>

BR,
Kewen
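
A note for readers of the archive: the quoted commit message makes two claims that can be checked with a small model. First, the BE and LE expansions of the word merge-high should produce the same vec_select/vec_concat RTL with index [0 4 1 5]. Second, selecting lane 3 of that result should fold to lane 1 of the second input. The sketch below is illustrative only and assumes C++14 or later. `V4`, `V8`, `Sel4`, `vec_concat`, `vec_select`, `expand_vmrghw`, `same` and the sample values are hypothetical stand-ins, not GCC code; the two `*_direct_*` functions merely mirror the operand order written in the patch's patterns.

```cpp
#include <array>
#include <cstddef>

// Hypothetical lane-level model of the RTL operators named in the patch.
using V4 = std::array<int, 4>;
using V8 = std::array<int, 8>;
using Sel4 = std::array<std::size_t, 4>;

// (vec_concat a b): lanes of a followed by lanes of b.
constexpr V8 vec_concat(const V4 &a, const V4 &b) {
  return V8{{a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]}};
}

// (vec_select v (parallel idx)): pick lanes of v by index.
constexpr V4 vec_select(const V8 &v, const Sel4 &idx) {
  return V4{{v[idx[0]], v[idx[1]], v[idx[2]], v[idx[3]]}};
}

// Mirrors altivec_vmrghw_direct_<mode>_be: concat (op1, op2), [0 4 1 5].
constexpr V4 vmrghw_direct_be(const V4 &op1, const V4 &op2) {
  return vec_select(vec_concat(op1, op2), Sel4{{0, 4, 1, 5}});
}

// Mirrors altivec_vmrglw_direct_<mode>_le: concat (op2, op1), [0 4 1 5].
constexpr V4 vmrglw_direct_le(const V4 &op1, const V4 &op2) {
  return vec_select(vec_concat(op2, op1), Sel4{{0, 4, 1, 5}});
}

// Mirrors the altivec_vmrghw expander: on LE the operands are passed swapped,
// which cancels the swapped vec_concat inside the LE pattern.
constexpr V4 expand_vmrghw(bool big_endian, const V4 &a, const V4 &b) {
  return big_endian ? vmrghw_direct_be(a, b) : vmrglw_direct_le(b, a);
}

constexpr bool same(const V4 &x, const V4 &y) {
  return x[0] == y[0] && x[1] == y[1] && x[2] == y[2] && x[3] == y[3];
}

// Arbitrary sample inputs (assumed values, chosen only to tell lanes apart).
constexpr V4 kA{{10, 11, 12, 13}};
constexpr V4 kB{{20, 21, 22, 23}};

// Claim 1: both endiannesses describe the same lane selection.
static_assert(same(expand_vmrghw(true, kA, kB), expand_vmrghw(false, kA, kB)),
              "BE and LE expansions should select the same lanes");

// Claim 2: lane 3 of the merge is index 5 of the concat, i.e. lane 1 of kB.
constexpr V4 kMerged = expand_vmrghw(false, kA, kB);
static_assert(kMerged[3] == kB[1],
              "nested vec_select of lane 3 folds to lane 1 of the second input");
```

Both checks are `static_assert`s, so compiling the file is the whole test. The model says nothing about which instruction is finally emitted; in the patch that choice is left to the insn conditions on BYTES_BIG_ENDIAN.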