From: "Kewen.Lin"
Date: Tue, 16 Aug 2022 14:53:15 +0800
Subject: Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
To: Xionghu Luo
Cc: Xionghu Luo, gcc-patches@gcc.gnu.org, David Edelsohn, Segher Boessenkool
Message-ID: <5df1a7fc-dacf-72e2-041d-66624926091f@linux.ibm.com>
In-Reply-To: <472c1531-aae6-123e-6b0c-8827f5585879@gmail.com>
References: <20220808034247.2618809-1-xionghuluo@tencent.com> <76035a5e-f0d8-8bc5-93e9-cfb08b2127f8@gmail.com> <20220810170700.GA25951@gate.crashing.org> <472c1531-aae6-123e-6b0c-8827f5585879@gmail.com>
Hi Xionghu,

Thanks for the updated version of the patch; some comments are inlined below.

on 2022/8/11 14:15, Xionghu Luo wrote:
>
>
> On 2022/8/11 01:07, Segher Boessenkool wrote:
>> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
>>> On 2022/8/9 11:01, Kewen.Lin wrote:
>>>> I have some concern on those changed "altivec_*_direct", IMHO the suffix
>>>> "_direct" is normally to indicate the define_insn is mapped to the
>>>> corresponding hw insn directly.  With this change, for example,
>>>> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
>>>> misleading.  Maybe we can add the corresponding _direct_le and _direct_be
>>>> versions, both are mapped into the same insn but have different RTL
>>>> patterns.  Looking forward to Segher's and David's suggestions.
>>>
>>> Thanks!  Do you mean same RTL patterns with different hw insn?
>>
>> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
>> instruction, never a vmrglb instead.  Misleading names are an expensive
>> problem.
>>
>>
>
> Thanks.  Then on LE platforms, if user calls altivec_vmrghw, it will be
> expanded to RTL (vec_select (vec_concat (R0 R1 (0 4 1 5))), and
> finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw".
> For BE just strict forward, seems more clear :-), OK for master?
>
>
> [PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
>
> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
> patterns.
> v2: Split the direct pattern to be and le with same RTL but different insn.
>
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent.  So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
>
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>            (subreg:V4SI (reg:V16QI 139) 0)
>            (subreg:V4SI (reg:V16QI 140) 0))
>            [const_int 0 4 1 5]))
>
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>
> =>
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>
> The endianness check need only once at ASM generation finally.
> ASM would be better due to nested vec_select simplified to simple scalar
> load.
>
> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{64}
> Linux(Thanks to Kewen).
>
> gcc/ChangeLog:
>
>     PR target/106069
>     * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
>     (altivec_vmrghb_direct_be): New pattern for BE.
>     (altivec_vmrglb_direct_le): New pattern for LE.
>     (altivec_vmrghh_direct): Remove.
>     (altivec_vmrghh_direct_be): New pattern for BE.
>     (altivec_vmrglh_direct_le): New pattern for LE.
>     (altivec_vmrghw_direct_): Remove.
>     (altivec_vmrghw_direct__be): New pattern for BE.
>     (altivec_vmrglw_direct__le): New pattern for LE.
>     (altivec_vmrglb_direct): Remove.
>     (altivec_vmrglb_direct_be): New pattern for BE.
>     (altivec_vmrghb_direct_le): New pattern for LE.
>     (altivec_vmrglh_direct): Remove.
>     (altivec_vmrglh_direct_be): New pattern for BE.
>     (altivec_vmrghh_direct_le): New pattern for LE.
>     (altivec_vmrglw_direct_): Remove.
>     (altivec_vmrglw_direct__be): New pattern for BE.
>     (altivec_vmrghw_direct__le): New pattern for LE.
>     * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
>     Adjust.
>     * config/rs6000/vsx.md: Likewise.
>
> gcc/testsuite/ChangeLog:
>
>     PR target/106069
>     * g++.target/powerpc/pr106069.C: New test.
>
> Signed-off-by: Xionghu Luo
> ---
>  gcc/config/rs6000/altivec.md                | 223 ++++++++++++++------
>  gcc/config/rs6000/rs6000.cc                 |  36 ++--
>  gcc/config/rs6000/vsx.md                    |  24 +--
>  gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++
>  4 files changed, 305 insertions(+), 98 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..78245f470e9 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> -                        : gen_altivec_vmrglb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17),
> +              GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
> +              GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21),
> +              GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23));
> +  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
> +  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
> +  emit_insn (gen_rtx_SET (operands[0], x));
>    DONE;
>  })

I think you can just call gen_altivec_vmrghb_direct_be and
gen_altivec_vmrglb_direct_le separately here (a rough sketch follows
below).  Similar for some other define_expands.

>
> -(define_insn "altivec_vmrghb_direct"
> +(define_insn "altivec_vmrghb_direct_be"
>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>      (vec_select:V16QI
>        (vec_concat:V32QI
> @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct"
>               (const_int 5) (const_int 21)
>               (const_int 6) (const_int 22)
>               (const_int 7) (const_int 23)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>    "vmrghb %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>

Could you move the following altivec_vmrglb_direct_le here?  Then
readers can easily check the difference between be and le for the same
altivec_vmrghb_direct.
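As an aside, here is the rough, untested sketch mentioned above for the
first point, using the _be/_le pattern names from this v3 patch; it
mirrors what the patch already does for vsx_xxmrghw_ in vsx.md and
avoids building the rtvec by hand:

  (define_expand "altivec_vmrghb"
    [(use (match_operand:V16QI 0 "register_operand"))
     (use (match_operand:V16QI 1 "register_operand"))
     (use (match_operand:V16QI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
    /* Pick the named pattern that matches the [0 16 1 17 ...] RTL.  */
    if (BYTES_BIG_ENDIAN)
      emit_insn (gen_altivec_vmrghb_direct_be (operands[0], operands[1],
                                               operands[2]));
    else
      emit_insn (gen_altivec_vmrglb_direct_le (operands[0], operands[1],
                                               operands[2]));
    DONE;
  })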
Same comment applied for some other similar cases. > +(define_insn "altivec_vmrglb_direct_le" > +  [(set (match_operand:V16QI 0 "register_operand" "=v") > +    (vec_select:V16QI > +      (vec_concat:V32QI > +        (match_operand:V16QI 1 "register_operand" "v") > +        (match_operand:V16QI 2 "register_operand" "v")) > +      (parallel [(const_int 0) (const_int 16) > +             (const_int 1) (const_int 17) > +             (const_int 2) (const_int 18) > +             (const_int 3) (const_int 19) > +             (const_int 4) (const_int 20) > +             (const_int 5) (const_int 21) > +             (const_int 6) (const_int 22) > +             (const_int 7) (const_int 23)])))] > +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > +  "vmrglb %0,%2,%1" > +  [(set_attr "type" "vecperm")]) Could you update this pattern for assembly "vmrglb %0,%1,%2" instead of "vmrglb %0,%2,%1"? I checked the previous md before the culprit commit 0910c516a3d72af048, it emits "vmrglb %0,%1,%2" for altivec_vmrglb_direct. Same comment applied for some other similar cases. > + >  (define_expand "altivec_vmrghh" >    [(use (match_operand:V8HI 0 "register_operand")) >     (use (match_operand:V8HI 1 "register_operand")) >     (use (match_operand:V8HI 2 "register_operand"))] >    "TARGET_ALTIVEC" >  { > -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct > -                        : gen_altivec_vmrglh_direct; > -  if (!BYTES_BIG_ENDIAN) > -    std::swap (operands[1], operands[2]); > -  emit_insn (fun (operands[0], operands[1], operands[2])); > +  rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9), > +              GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11)); > +  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); > + > +  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > +  emit_insn (gen_rtx_SET (operands[0], x)); >    DONE; >  }) >   > -(define_insn "altivec_vmrghh_direct" > +(define_insn "altivec_vmrghh_direct_be" >    [(set (match_operand:V8HI 0 "register_operand" "=v") > -        (vec_select:V8HI > +    (vec_select:V8HI >        (vec_concat:V16HI >          (match_operand:V8HI 1 "register_operand" "v") >          (match_operand:V8HI 2 "register_operand" "v")) > @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct" >               (const_int 1) (const_int 9) >               (const_int 2) (const_int 10) >               (const_int 3) (const_int 11)])))] > -  "TARGET_ALTIVEC" > +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >    "vmrghh %0,%1,%2" >    [(set_attr "type" "vecperm")]) >   > +(define_insn "altivec_vmrglh_direct_le" > +  [(set (match_operand:V8HI 0 "register_operand" "=v") > +    (vec_select:V8HI > +      (vec_concat:V16HI > +        (match_operand:V8HI 1 "register_operand" "v") > +        (match_operand:V8HI 2 "register_operand" "v")) > +      (parallel [(const_int 0) (const_int 8) > +             (const_int 1) (const_int 9) > +             (const_int 2) (const_int 10) > +             (const_int 3) (const_int 11)])))] > +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > +  "vmrglh %0,%2,%1" > +  [(set_attr "type" "vecperm")]) > + >  (define_expand "altivec_vmrghw" >    [(use (match_operand:V4SI 0 "register_operand")) >     (use (match_operand:V4SI 1 "register_operand")) >     (use (match_operand:V4SI 2 "register_operand"))] >    "VECTOR_MEM_ALTIVEC_P (V4SImode)" >  { > -  rtx (*fun) (rtx, rtx, rtx); > -  fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_v4si > -             : gen_altivec_vmrglw_direct_v4si; > -  if (!BYTES_BIG_ENDIAN) > -    std::swap (operands[1], operands[2]); > -  emit_insn (fun (operands[0], operands[1], operands[2])); > +  rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5)); > +  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); > +  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > +  emit_insn (gen_rtx_SET (operands[0], x)); >    DONE; >  }) >   > -(define_insn "altivec_vmrghw_direct_" > +(define_insn "altivec_vmrghw_direct__be" >    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >      (vec_select:VSX_W >        (vec_concat: > @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_" >          (match_operand:VSX_W 2 "register_operand" "wa,v")) >        (parallel [(const_int 0) (const_int 4) >               (const_int 1) (const_int 5)])))] > -  "TARGET_ALTIVEC" > +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > +  "@ > +  xxmrghw %x0,%x1,%x2 > +  vmrghw %0,%1,%2" > +  [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrglw_direct__le" > +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > +    (vec_select:VSX_W > +      (vec_concat: > +        (match_operand:VSX_W 1 "register_operand" "wa,v") > +        (match_operand:VSX_W 2 "register_operand" "wa,v")) > +      (parallel [(const_int 0) (const_int 4) > +             (const_int 1) (const_int 5)])))] > +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >    "@ > -   xxmrghw %x0,%x1,%x2 > -   vmrghw %0,%1,%2" > +  xxmrglw %x0,%x2,%x1 > +  vmrglw %0,%2,%1" >    [(set_attr "type" "vecperm")]) >   >  (define_insn "*altivec_vmrghsf" > @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb" >     (use (match_operand:V16QI 2 "register_operand"))] >    "TARGET_ALTIVEC" >  { > -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglb_direct > -                        : gen_altivec_vmrghb_direct; > -  if (!BYTES_BIG_ENDIAN) > -    std::swap (operands[1], operands[2]); > -  emit_insn (fun (operands[0], operands[1], operands[2])); > +  rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25), > +              GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27), > +              GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29), > +              GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31)); > +  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); > +  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > +  emit_insn (gen_rtx_SET (operands[0], x)); >    DONE; >  }) >   > -(define_insn "altivec_vmrglb_direct" > +(define_insn "altivec_vmrglb_direct_be" >    [(set (match_operand:V16QI 0 "register_operand" "=v") >      (vec_select:V16QI >        (vec_concat:V32QI > @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct" >               (const_int 13) (const_int 29) >               (const_int 14) (const_int 30) >               (const_int 15) (const_int 31)])))] > -  "TARGET_ALTIVEC" > +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >    "vmrglb %0,%1,%2" >    [(set_attr "type" "vecperm")]) >   > +(define_insn "altivec_vmrghb_direct_le" > +  [(set (match_operand:V16QI 0 "register_operand" "=v") > +    (vec_select:V16QI > +      (vec_concat:V32QI > +        (match_operand:V16QI 1 "register_operand" "v") > +        (match_operand:V16QI 2 "register_operand" "v")) > +      (parallel [(const_int  8) (const_int 24) > +             (const_int  9) (const_int 25) > +             (const_int 10) (const_int 26) > +             (const_int 11) (const_int 27) > +             (const_int 12) (const_int 28) > +             (const_int 13) (const_int 29) > +             (const_int 14) (const_int 30) > +             (const_int 15) (const_int 31)])))] > +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > +  "vmrghb %0,%2,%1" > +  [(set_attr "type" "vecperm")]) > + >  (define_expand "altivec_vmrglh" >    [(use (match_operand:V8HI 0 "register_operand")) >     (use (match_operand:V8HI 1 "register_operand")) >     (use (match_operand:V8HI 2 "register_operand"))] >    "TARGET_ALTIVEC" >  { > -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglh_direct > -                        : gen_altivec_vmrghh_direct; > -  if (!BYTES_BIG_ENDIAN) > -    std::swap (operands[1], operands[2]); > -  emit_insn (fun (operands[0], operands[1], operands[2])); > +  rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13), > +              GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15)); > +  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); > +  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > +  emit_insn (gen_rtx_SET (operands[0], x)); >    DONE; >  }) >   > -(define_insn "altivec_vmrglh_direct" > +(define_insn "altivec_vmrglh_direct_be" >    [(set (match_operand:V8HI 0 "register_operand" "=v") >          (vec_select:V8HI >        (vec_concat:V16HI > @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct" >               (const_int 5) (const_int 13) >               (const_int 6) (const_int 14) >               (const_int 7) (const_int 15)])))] > -  "TARGET_ALTIVEC" > +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >    "vmrglh %0,%1,%2" >    [(set_attr "type" "vecperm")]) >   > +(define_insn "altivec_vmrghh_direct_le" > +  [(set (match_operand:V8HI 0 "register_operand" "=v") > +        (vec_select:V8HI > +      (vec_concat:V16HI > +        (match_operand:V8HI 1 "register_operand" "v") > +        (match_operand:V8HI 2 "register_operand" "v")) > +      (parallel [(const_int 4) (const_int 12) > +             (const_int 5) (const_int 13) > +             (const_int 6) (const_int 14) > +             (const_int 7) (const_int 15)])))] > +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > +  "vmrghh %0,%2,%1" > +  [(set_attr "type" "vecperm")]) > + >  (define_expand "altivec_vmrglw" >    [(use (match_operand:V4SI 0 "register_operand")) >     (use (match_operand:V4SI 1 "register_operand")) >     (use (match_operand:V4SI 2 "register_operand"))] >    "VECTOR_MEM_ALTIVEC_P (V4SImode)" >  { > -  rtx (*fun) (rtx, rtx, rtx); > -  fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_v4si > -             : gen_altivec_vmrghw_direct_v4si; > -  if (!BYTES_BIG_ENDIAN) > -    std::swap (operands[1], operands[2]); > -  emit_insn (fun (operands[0], operands[1], operands[2])); > +  rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7)); > +  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); > +  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > +  emit_insn (gen_rtx_SET (operands[0], x)); >    DONE; >  }) >   > -(define_insn "altivec_vmrglw_direct_" > +(define_insn "altivec_vmrglw_direct__be" >    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >      (vec_select:VSX_W >        (vec_concat: > @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_" >          (match_operand:VSX_W 2 "register_operand" "wa,v")) >        (parallel [(const_int 2) (const_int 6) >               (const_int 3) (const_int 7)])))] > -  "TARGET_ALTIVEC" > +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > +  "@ > +  xxmrglw %x0,%x1,%x2 > +  vmrglw %0,%1,%2" > +  [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrghw_direct__le" > +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > +    (vec_select:VSX_W > +      (vec_concat: > +        (match_operand:VSX_W 1 "register_operand" "wa,v") > +        (match_operand:VSX_W 2 "register_operand" "wa,v")) > +      (parallel [(const_int 2) (const_int 6) > +             (const_int 3) (const_int 7)])))] > +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >    "@ > -   xxmrglw %x0,%x1,%x2 > -   vmrglw %0,%1,%2" > +  xxmrghw %x0,%x2,%x1 > +  vmrghw %0,%2,%1" >    [(set_attr "type" "vecperm")]) >   >  (define_insn "*altivec_vmrglsf" > @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi" >      { >        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); > +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); >      } >    else >      { >        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); > +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); Note that if you change assembly "vmrghh %0,%2,%1" to "vmrghh %0,%1,%2", you need to change this to: emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); Same comment applied for some other similar cases. 
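To illustrate (an untested sketch, assuming the output template is
changed to "vmrghh %0,%1,%2" as suggested), the little-endian arm of
vec_widen_umult_hi_v16qi would then read:

      emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
      emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
      /* Operands swapped at the call site to compensate for the
         unswapped operand order in the LE insn's output template.  */
      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));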
>      } >    DONE; >  }) > @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi" >      { >        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); > +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); >      } >    else >      { >        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); > +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); >      } >    DONE; >  }) > @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi" >      { >        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); > +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); >      } >    else >      { >        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); > +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); >      } >    DONE; >  }) > @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi" >      { >        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); > +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); >      } >    else >      { >        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); > +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); >      } >    DONE; >  }) > @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi" >      { >        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); > +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); >      } >    else >      { >        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); > +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); >      } >    DONE; >  }) > @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi" >      { >        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); > +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); >      } >    else >      { >        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); > +      emit_insn 
(gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); >      } >    DONE; >  }) > @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi" >      { >        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); > +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); >      } >    else >      { >        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); > +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); >      } >    DONE; >  }) > @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi" >      { >        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); > +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); >      } >    else >      { >        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); >        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); > -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); > +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); >      } >    DONE; >  }) > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index df491bee2ea..97da7706f63 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, >      {OPTION_MASK_ALTIVEC, >       CODE_FOR_altivec_vpkuwum_direct, >       {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, > -    {OPTION_MASK_ALTIVEC, > -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct > -              : CODE_FOR_altivec_vmrglb_direct, > +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be, >       {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, Before the culprit commit 0910c516a3d72af04, we have: { OPTION_MASK_ALTIVEC, (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct : CODE_FOR_altivec_vmrglb_direct), { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, I think we should use: { OPTION_MASK_ALTIVEC, (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be : CODE_FOR_altivec_vmrglb_direct_le), { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, here instead. Similar comment for those related below. > -    {OPTION_MASK_ALTIVEC, > -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct > -              : CODE_FOR_altivec_vmrglh_direct, > +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be, >       {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, > -    {OPTION_MASK_ALTIVEC, > -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si > -              : CODE_FOR_altivec_vmrglw_direct_v4si, > +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be, >       {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, > -    {OPTION_MASK_ALTIVEC, > -     BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrglb_direct > -              : CODE_FOR_altivec_vmrghb_direct, > +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be, >       {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, > -    {OPTION_MASK_ALTIVEC, > -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct > -              : CODE_FOR_altivec_vmrghh_direct, > +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be, >       {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, > -    {OPTION_MASK_ALTIVEC, > -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si > -              : CODE_FOR_altivec_vmrghw_direct_v4si, > +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be, >       {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, >      {OPTION_MASK_P8_VECTOR, >       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct > @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, >   >            /* For little-endian, the two input operands must be swapped >               (or swapped back) to ensure proper right-to-left numbering > -             from 0 to 2N-1.  */ > -      if (swapped ^ !BYTES_BIG_ENDIAN > -          && icode != CODE_FOR_vsx_xxpermdi_v16qi) > +         from 0 to 2N-1.  Excludes the vmrg[lh][bhw] and xxpermdi ops.  */ > +      if (swapped ^ !BYTES_BIG_ENDIAN) > +        if (!(icode == CODE_FOR_altivec_vmrghb_direct_be > +          || icode == CODE_FOR_altivec_vmrglb_direct_be > +          || icode == CODE_FOR_altivec_vmrghh_direct_be > +          || icode == CODE_FOR_altivec_vmrglh_direct_be > +          || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be > +          || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be > +          || icode == CODE_FOR_vsx_xxpermdi_v16qi)) >          std::swap (op0, op1); IIUC, we don't need this part of change once we fix the operand order in the assembly for those LE "direct"s. BR, Kewen >        if (imode != V16QImode) >          { > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index e226a93bbe5..2ae1bce131d 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -4688,12 +4688,12 @@ (define_expand "vsx_xxmrghw_" >               (const_int 1) (const_int 5)])))] >    "VECTOR_MEM_VSX_P (mode)" >  { > -  rtx (*fun) (rtx, rtx, rtx); > -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_ > -             : gen_altivec_vmrglw_direct_; > -  if (!BYTES_BIG_ENDIAN) > -    std::swap (operands[1], operands[2]); > -  emit_insn (fun (operands[0], operands[1], operands[2])); > +  if (BYTES_BIG_ENDIAN) > +    emit_insn ( > +    gen_altivec_vmrghw_direct_v4si_be (operands[0], operands[1], operands[2])); > +  else > +    emit_insn ( > +    gen_altivec_vmrglw_direct_v4si_le (operands[0], operands[1], operands[2])); >    DONE; >  } >    [(set_attr "type" "vecperm")]) > @@ -4708,12 +4708,12 @@ (define_expand "vsx_xxmrglw_" >               (const_int 3) (const_int 7)])))] >    "VECTOR_MEM_VSX_P (mode)" >  { > -  rtx (*fun) (rtx, rtx, rtx); > -  fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_ > -             : gen_altivec_vmrghw_direct_; > -  if (!BYTES_BIG_ENDIAN) > -    std::swap (operands[1], operands[2]); > -  emit_insn (fun (operands[0], operands[1], operands[2])); > +  if (BYTES_BIG_ENDIAN) > +    emit_insn ( > +    gen_altivec_vmrglw_direct_v4si_be (operands[0], operands[1], operands[2])); > +  else > +    emit_insn ( > +    gen_altivec_vmrghw_direct_v4si_le (operands[0], operands[1], operands[2])); >    DONE; >  } >    [(set_attr "type" "vecperm")]) > diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C > new file mode 100644 > index 00000000000..2cde9b821e3 > --- /dev/null > +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C > @@ -0,0 +1,120 @@ > +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ > +/* { dg-require-effective-target vmx_hw } */ > +/* { dg-do run } */ > + > +extern "C" void * > +memcpy (void *, const void *, unsigned long); > +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; > + > +union > +{ > +  native_simd_type V; > +  int R[4]; > +} store_le_vec; > + > +struct S > +{ > +  S () = default; > +  S (unsigned B0) > +  { > +    native_simd_type val{B0}; > +    m_simd = val; > +  } > +  void store_le (unsigned int out[]) > +  { > +    store_le_vec.V = m_simd; > +    unsigned int x0 = store_le_vec.R[0]; > +    memcpy (out, &x0, 4); > +  } > +  S rotl (unsigned int r) > +  { > +    native_simd_type rot{r}; > +    return __builtin_vec_rl (m_simd, rot); > +  } > +  void operator+= (S other) > +  { > +    m_simd = __builtin_vec_add (m_simd, other.m_simd); > +  } > +  void operator^= (S other) > +  { > +    m_simd = __builtin_vec_xor (m_simd, other.m_simd); > +  } > +  static void transpose (S &B0, S B1, S B2, S B3) > +  { > +    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); > +    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); > +    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); > +    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); > +    B0 = __builtin_vec_mergeh (T0, T1); > +    B3 = __builtin_vec_mergel (T2, T3); > +  } > +  S (native_simd_type x) : m_simd (x) {} > +  native_simd_type m_simd; > +}; > + > +void > +foo (unsigned int output[], unsigned state[]) > +{ > +  S R00 = state[0]; > +  S R01 = state[0]; > +  S R02 = state[2]; > +  S R03 = state[0]; > +  S R05 = state[5]; > +  S R06 = state[6]; > +  S R07 = state[7]; > +  S R08 = state[8]; > +  S R09 = state[9]; > +  S R10 = state[10]; > +  S R11 = state[11]; > +  S R12 = state[12]; > +  S R13 = state[13]; > +  S R14 = state[4]; > +  S R15 = state[15]; > +  for (int r = 0; r != 10; ++r) > +    { > +      R09 += R13; > +      R11 += R15; > +      R05 ^= R09; > +      R06 ^= R10; > +      R07 ^= R11; > +      R07 = R07.rotl (7); > +      R00 += R05; > +      R01 += R06; > +      R02 += R07; > +      R15 ^= R00; > +      R12 ^= R01; > +      R13 ^= R02; > +      R00 += R05; > +      R01 += R06; > +      R02 += R07; > +      R15 ^= R00; > +      R12 = R12.rotl (8); > +      R13 = R13.rotl (8); > +      R10 += R15; > +      R11 += R12; > +      R08 += R13; > +      R09 += R14; > +      R05 ^= R10; > +      R06 ^= R11; > +      R07 ^= R08; > +      R05 = R05.rotl (7); > +      R06 = R06.rotl (7); > +      R07 = R07.rotl (7); > +    } > +  R00 += state[0]; > +  S::transpose (R00, R01, R02, R03); > +  R00.store_le (output); > +} > + > +unsigned int res[1]; > +unsigned main_state[]{1634760805, 60878,      2036477234, 6, > +         
     0,      825562964,  1471091955, 1346092787, > +              506976774,  4197066702, 518848283,  118491664, > +              0,      0,          0,      0}; > +int > +main () > +{ > +  foo (res, main_state); > +  if (res[0] != 0x41fcef98) > +    __builtin_abort (); > +}