From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <c39c82ad-9d80-d5c7-70b6-26027a945e54@gmail.com>
Date: Thu, 9 Feb 2023 10:15:22 +0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
 Gecko/20100101 Thunderbird/102.6.1
Subject: Re: Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when
 removing the UNSPECS [PR106069]
To: "Kewen.Lin" <linkw@linux.ibm.com>,
 Segher Boessenkool <segher@kernel.crashing.org>
Cc: Xionghu Luo <xionghuluo@tencent.com>, gcc-patches@gcc.gnu.org,
 David Edelsohn <dje.gcc@gmail.com>, Jakub Jelinek <jakub@redhat.com>
References: <20220808034247.2618809-1-xionghuluo@tencent.com>
 <ec28ad09-f23a-3ffc-3025-f0f52d0e773d@linux.ibm.com>
 <76035a5e-f0d8-8bc5-93e9-cfb08b2127f8@gmail.com>
 <20220810170700.GA25951@gate.crashing.org>
 <472c1531-aae6-123e-6b0c-8827f5585879@gmail.com>
 <5df1a7fc-dacf-72e2-041d-66624926091f@linux.ibm.com>
 <37b57a54-f98e-96a3-edff-866c8aae4c7d@gmail.com>
 <5418ebd2-d544-f4cc-d930-bdde64ad2807@gmail.com>
 <e8e69f0c-7f36-e671-6c3b-74401e4d8c48@linux.ibm.com>
From: Xionghu Luo <yinyuefengyi@gmail.com>
In-Reply-To: <e8e69f0c-7f36-e671-6c3b-74401e4d8c48@linux.ibm.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
List-Id: <gcc-patches.gcc.gnu.org>

Thanks, Kewen!
Pinging this again, @Segher.
Maybe we could also merge this patch if Segher has no objections, since
it has already been through several rounds of review and testing.


BR,
Xionghu


On 2023/1/18 17:11, Kewen.Lin wrote:
> Hi Segher,
> 
> I guess this patch slipped off your radar. :)
> 
> As Jakub asked about the status in PR106069, I applied the attached patch from
> Xionghu to the latest trunk, re-tested it, and confirmed that it still bootstraps
> and passes regression testing on powerpc64-linux-gnu P8 and
> powerpc64le-linux-gnu P9 and P10.
> 
> This new version separates the direct patterns into LE and BE variants, which is
> clearer than before, and it looks good to me.  What do you think?  Looking
> forward to your opinion.
> 
> By the way, here is the link in the archives:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600169.html
> 
> BR,
> Kewen
> 
> on 2022/8/24 09:24, Xionghu Luo wrote:
>> Subject:
>> Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
>> From:
>> Xionghu Luo <yinyuefengyi@gmail.com>
>> Date:
>> 2022/8/24, 09:24
>>
>> To:
>> "Kewen.Lin" <linkw@linux.ibm.com>, Segher Boessenkool <segher@kernel.crashing.org>
>> Cc:
>> Xionghu Luo <xionghuluo@tencent.com>, gcc-patches@gcc.gnu.org, David Edelsohn <dje.gcc@gmail.com>, Segher Boessenkool <segher@kernel.crashing.org>
>>
>>
>> Hi Segher, I'd like to resend this patch and ping for review. Thanks.
>>
>> v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch
>>
>>  From 23bffdacdf0eb1140c7a3571e6158797f4818d57 Mon Sep 17 00:00:00 2001
>> From: Xionghu Luo <xionghuluo@tencent.com>
>> Date: Thu, 4 Aug 2022 03:44:58 +0000
>> Subject: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the
>>   UNSPECS [PR106069]
>>
>> v4: Update per comments.
>> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
>> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
>> patterns.
>> v2: Split the direct pattern into BE and LE variants with the same RTL but
>> different insns.
>>
>> The native RTL expression for vec_mrghw should be the same for BE and LE,
>> as it is independent of register layout and endianness.  So both BE and
>> LE need to generate exactly the same RTL, with index [0 4 1 5], when
>> expanding vec_mrghw with vec_select and vec_concat.
>>
>> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>> 		   (subreg:V4SI (reg:V16QI 139) 0)
>> 		   (subreg:V4SI (reg:V16QI 140) 0))
>> 		   [const_int 0 4 1 5]))
>>
>> The combine pass can then perform the nested vec_select optimization
>> in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
>> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>>
>> =>
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
>> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
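
For illustration only, the fold above can be checked with a small model of
vec_concat and vec_select on plain arrays.  This is a sketch, not GCC code:
the helper names model_vec_concat, model_vec_select and nested_select_folds
are hypothetical, and the element values are arbitrary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<unsigned, 4>;
using V8 = std::array<unsigned, 8>;

// Hypothetical model of RTL vec_concat: the elements of a, then those of b.
V8 model_vec_concat(const V4 &a, const V4 &b) {
  V8 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = a[i];
    r[i + 4] = b[i];
  }
  return r;
}

// Hypothetical model of RTL vec_select with a four-entry parallel:
// result element i is source element sel[i].
V4 model_vec_select(const V8 &v, const std::array<std::size_t, 4> &sel) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = v[sel[i]];
  return r;
}

// Insn 24 before combine reads element 3 of r150; after combine it reads
// element 1 of r146.  Both must name the same value.
bool nested_select_folds(const V4 &r141, const V4 &r146) {
  V4 r150 = model_vec_select(model_vec_concat(r141, r146), {0, 4, 1, 5});
  return r150[3] == r146[1];
}
```

With r141 = {10, 11, 12, 13} and r146 = {20, 21, 22, 23}, the model gives
r150 = {10, 20, 11, 21}, so element 3 of r150 is element 1 of r146, which is
the rewrite shown for insn 24.
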
>>
>> The endianness check is then needed only once, at final ASM generation.
>> The generated ASM is better because the nested vec_select is simplified
>> to a simple scalar load.
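
As a rough sketch of why the LE patterns emit the "opposite" mnemonic with
swapped operands, the model below assumes that on LE the RTL element i sits
in the word the ISA numbers 3 - i, and that vmrglw builds
{vA[2], vB[2], vA[3], vB[3]} in ISA (big-endian) word numbering.  All helper
names are hypothetical and nothing here is taken from the GCC sources.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<unsigned, 4>;

// Model of the hardware vmrglw vD,vA,vB in ISA (big-endian) word numbering.
V4 hw_vmrglw(const V4 &va, const V4 &vb) {
  return {va[2], vb[2], va[3], vb[3]};
}

// Assumption of this model: on LE, RTL element i is ISA word 3 - i, so
// converting between the two numberings reverses the vector.
V4 flip(const V4 &v) {
  return {v[3], v[2], v[1], v[0]};
}

// Endian-independent RTL meaning of merge-high:
// vec_select (vec_concat (a, b), [0 4 1 5]).
V4 rtl_merge_high(const V4 &a, const V4 &b) {
  return {a[0], b[0], a[1], b[1]};
}

// LE lowering modelled on the vmrglw_direct_le pattern: same RTL as above,
// but the instruction is vmrglw with the two inputs swapped.
V4 le_merge_high_via_vmrglw(const V4 &a, const V4 &b) {
  return flip(hw_vmrglw(flip(b), flip(a)));
}
```

Under these assumptions, le_merge_high_via_vmrglw (a, b) equals
rtl_merge_high (a, b) for any inputs, which is the sense in which the RTL
stays the same and only the emitted instruction depends on endianness.
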
>>
>> Regression testing passes on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
>> Linux.
>>
>> gcc/ChangeLog:
>>
>> 	PR target/106069
>> 	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
>> 	(altivec_vmrghb_direct_be): New pattern for BE.
>> 	(altivec_vmrghb_direct_le): New pattern for LE.
>> 	(altivec_vmrghh_direct): Remove.
>> 	(altivec_vmrghh_direct_be): New pattern for BE.
>> 	(altivec_vmrghh_direct_le): New pattern for LE.
>> 	(altivec_vmrghw_direct_<mode>): Remove.
>> 	(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
>> 	(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
>> 	(altivec_vmrglb_direct): Remove.
>> 	(altivec_vmrglb_direct_be): New pattern for BE.
>> 	(altivec_vmrglb_direct_le): New pattern for LE.
>> 	(altivec_vmrglh_direct): Remove.
>> 	(altivec_vmrglh_direct_be): New pattern for BE.
>> 	(altivec_vmrglh_direct_le): New pattern for LE.
>> 	(altivec_vmrglw_direct_<mode>): Remove.
>> 	(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
>> 	(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
>> 	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
>> 	Adjust.
>> 	* config/rs6000/vsx.md: Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR target/106069
>> 	* g++.target/powerpc/pr106069.C: New test.
>>
>> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
>> ---
>>   gcc/config/rs6000/altivec.md                | 222 ++++++++++++++------
>>   gcc/config/rs6000/rs6000.cc                 |  24 +--
>>   gcc/config/rs6000/vsx.md                    |  28 +--
>>   gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++
>>   4 files changed, 307 insertions(+), 85 deletions(-)
>>   create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
>>
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index 2c4940f2e21..c6a381908cb 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
>>      (use (match_operand:V16QI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
>> -						: gen_altivec_vmrglb_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrghb_direct"
>> +(define_insn "altivec_vmrghb_direct_be"
>>     [(set (match_operand:V16QI 0 "register_operand" "=v")
>>   	(vec_select:V16QI
>>   	  (vec_concat:V32QI
>> @@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
>>   		     (const_int 5) (const_int 21)
>>   		     (const_int 6) (const_int 22)
>>   		     (const_int 7) (const_int 23)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "vmrghb %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrghb_direct_le"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +	(vec_select:V16QI
>> +	  (vec_concat:V32QI
>> +	    (match_operand:V16QI 2 "register_operand" "v")
>> +	    (match_operand:V16QI 1 "register_operand" "v"))
>> +	  (parallel [(const_int  8) (const_int 24)
>> +		     (const_int  9) (const_int 25)
>> +		     (const_int 10) (const_int 26)
>> +		     (const_int 11) (const_int 27)
>> +		     (const_int 12) (const_int 28)
>> +		     (const_int 13) (const_int 29)
>> +		     (const_int 14) (const_int 30)
>> +		     (const_int 15) (const_int 31)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "vmrghb %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> @@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
>>      (use (match_operand:V8HI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
>> -						: gen_altivec_vmrglh_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrghh_direct"
>> +(define_insn "altivec_vmrghh_direct_be"
>>     [(set (match_operand:V8HI 0 "register_operand" "=v")
>> -        (vec_select:V8HI
>> +	(vec_select:V8HI
>>   	  (vec_concat:V16HI
>>   	    (match_operand:V8HI 1 "register_operand" "v")
>>   	    (match_operand:V8HI 2 "register_operand" "v"))
>> @@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
>>   		     (const_int 1) (const_int 9)
>>   		     (const_int 2) (const_int 10)
>>   		     (const_int 3) (const_int 11)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "vmrghh %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrghh_direct_le"
>> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
>> +        (vec_select:V8HI
>> +	  (vec_concat:V16HI
>> +	    (match_operand:V8HI 2 "register_operand" "v")
>> +	    (match_operand:V8HI 1 "register_operand" "v"))
>> +	  (parallel [(const_int 4) (const_int 12)
>> +		     (const_int 5) (const_int 13)
>> +		     (const_int 6) (const_int 14)
>> +		     (const_int 7) (const_int 15)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "vmrghh %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> @@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
>>      (use (match_operand:V4SI 2 "register_operand"))]
>>     "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
>> -			 : gen_altivec_vmrglw_direct_v4si;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
>> +						  operands[1],
>> +						  operands[2]));
>> +  else
>> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
>> +						  operands[2],
>> +						  operands[1]));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrghw_direct_<mode>"
>> +(define_insn "altivec_vmrghw_direct_<mode>_be"
>>     [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>>   	(vec_select:VSX_W
>>   	  (vec_concat:<VS_double>
>> @@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
>>   	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
>>   	  (parallel [(const_int 0) (const_int 4)
>>   		     (const_int 1) (const_int 5)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "@
>> +   xxmrghw %x0,%x1,%x2
>> +   vmrghw %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrghw_direct_<mode>_le"
>> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>> +	(vec_select:VSX_W
>> +	  (vec_concat:<VS_double>
>> +	    (match_operand:VSX_W 2 "register_operand" "wa,v")
>> +	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
>> +	  (parallel [(const_int 2) (const_int 6)
>> +		     (const_int 3) (const_int 7)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "@
>>      xxmrghw %x0,%x1,%x2
>>      vmrghw %0,%1,%2"
>> @@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
>>      (use (match_operand:V16QI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
>> -						: gen_altivec_vmrghb_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrglb_direct"
>> +(define_insn "altivec_vmrglb_direct_be"
>>     [(set (match_operand:V16QI 0 "register_operand" "=v")
>>   	(vec_select:V16QI
>>   	  (vec_concat:V32QI
>> @@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
>>   		     (const_int 13) (const_int 29)
>>   		     (const_int 14) (const_int 30)
>>   		     (const_int 15) (const_int 31)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "vmrglb %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrglb_direct_le"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +	(vec_select:V16QI
>> +	  (vec_concat:V32QI
>> +	    (match_operand:V16QI 2 "register_operand" "v")
>> +	    (match_operand:V16QI 1 "register_operand" "v"))
>> +	  (parallel [(const_int 0) (const_int 16)
>> +		     (const_int 1) (const_int 17)
>> +		     (const_int 2) (const_int 18)
>> +		     (const_int 3) (const_int 19)
>> +		     (const_int 4) (const_int 20)
>> +		     (const_int 5) (const_int 21)
>> +		     (const_int 6) (const_int 22)
>> +		     (const_int 7) (const_int 23)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "vmrglb %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> @@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
>>      (use (match_operand:V8HI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
>> -						: gen_altivec_vmrghh_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrglh_direct"
>> +(define_insn "altivec_vmrglh_direct_be"
>>     [(set (match_operand:V8HI 0 "register_operand" "=v")
>>           (vec_select:V8HI
>>   	  (vec_concat:V16HI
>> @@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
>>   		     (const_int 5) (const_int 13)
>>   		     (const_int 6) (const_int 14)
>>   		     (const_int 7) (const_int 15)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "vmrglh %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrglh_direct_le"
>> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
>> +	(vec_select:V8HI
>> +	  (vec_concat:V16HI
>> +	    (match_operand:V8HI 2 "register_operand" "v")
>> +	    (match_operand:V8HI 1 "register_operand" "v"))
>> +	  (parallel [(const_int 0) (const_int 8)
>> +		     (const_int 1) (const_int 9)
>> +		     (const_int 2) (const_int 10)
>> +		     (const_int 3) (const_int 11)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "vmrglh %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> @@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
>>      (use (match_operand:V4SI 2 "register_operand"))]
>>     "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
>> -			 : gen_altivec_vmrghw_direct_v4si;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
>> +						  operands[1],
>> +						  operands[2]));
>> +  else
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
>> +						  operands[2],
>> +						  operands[1]));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrglw_direct_<mode>"
>> +(define_insn "altivec_vmrglw_direct_<mode>_be"
>>     [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>>   	(vec_select:VSX_W
>>   	  (vec_concat:<VS_double>
>> @@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
>>   	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
>>   	  (parallel [(const_int 2) (const_int 6)
>>   		     (const_int 3) (const_int 7)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "@
>> +   xxmrglw %x0,%x1,%x2
>> +   vmrglw %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrglw_direct_<mode>_le"
>> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>> +	(vec_select:VSX_W
>> +	  (vec_concat:<VS_double>
>> +	    (match_operand:VSX_W 2 "register_operand" "wa,v")
>> +	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
>> +	  (parallel [(const_int 0) (const_int 4)
>> +		     (const_int 1) (const_int 5)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "@
>>      xxmrglw %x0,%x1,%x2
>>      vmrglw %0,%1,%2"
>> @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index df491bee2ea..c6ccd40e089 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>>        CODE_FOR_altivec_vpkuwum_direct,
>>        {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
>>       {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
>> -		      : CODE_FOR_altivec_vmrglb_direct,
>> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
>> +		      : CODE_FOR_altivec_vmrglb_direct_le,
>>        {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
>>       {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
>> -		      : CODE_FOR_altivec_vmrglh_direct,
>> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
>> +		      : CODE_FOR_altivec_vmrglh_direct_le,
>>        {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
>>       {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
>> -		      : CODE_FOR_altivec_vmrglw_direct_v4si,
>> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
>> +		      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
>>        {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
>>       {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
>> -		      : CODE_FOR_altivec_vmrghb_direct,
>> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
>> +		      : CODE_FOR_altivec_vmrghb_direct_le,
>>        {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
>>       {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
>> -		      : CODE_FOR_altivec_vmrghh_direct,
>> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
>> +		      : CODE_FOR_altivec_vmrghh_direct_le,
>>        {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
>>       {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
>> -		      : CODE_FOR_altivec_vmrghw_direct_v4si,
>> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
>> +		      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
>>        {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
>>       {OPTION_MASK_P8_VECTOR,
>>        BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>> index e226a93bbe5..80f84e9b141 100644
>> --- a/gcc/config/rs6000/vsx.md
>> +++ b/gcc/config/rs6000/vsx.md
>> @@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>"
>>   		     (const_int 1) (const_int 5)])))]
>>     "VECTOR_MEM_VSX_P (<MODE>mode)"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
>> -			 : gen_altivec_vmrglw_direct_<mode>;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
>> +						  operands[1],
>> +						  operands[2]));
>> +  else
>> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
>> +						  operands[2],
>> +						  operands[1]));
>>     DONE;
>>   }
>>     [(set_attr "type" "vecperm")])
>> @@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_<mode>"
>>   		     (const_int 3) (const_int 7)])))]
>>     "VECTOR_MEM_VSX_P (<MODE>mode)"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
>> -			 : gen_altivec_vmrghw_direct_<mode>;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
>> +						  operands[1],
>> +						  operands[2]));
>> +  else
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
>> +						  operands[2],
>> +						  operands[1]));
>>     DONE;
>>   }
>>     [(set_attr "type" "vecperm")])
>> diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
>> new file mode 100644
>> index 00000000000..c89739ecb55
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
>> @@ -0,0 +1,118 @@
>> +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
>> +/* { dg-require-effective-target vmx_hw } */
>> +/* { dg-do run } */
>> +
>> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
>> +
>> +union
>> +{
>> +  native_simd_type V;
>> +  int R[4];
>> +} store_le_vec;
>> +
>> +struct S
>> +{
>> +  S () = default;
>> +  S (unsigned B0)
>> +  {
>> +    native_simd_type val{B0};
>> +    m_simd = val;
>> +  }
>> +  void store_le (unsigned int out[])
>> +  {
>> +    store_le_vec.V = m_simd;
>> +    unsigned int x0 = store_le_vec.R[0];
>> +    __builtin_memcpy (out, &x0, 4);
>> +  }
>> +  S rotl (unsigned int r)
>> +  {
>> +    native_simd_type rot{r};
>> +    return __builtin_vec_rl (m_simd, rot);
>> +  }
>> +  void operator+= (S other)
>> +  {
>> +    m_simd = __builtin_vec_add (m_simd, other.m_simd);
>> +  }
>> +  void operator^= (S other)
>> +  {
>> +    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
>> +  }
>> +  static void transpose (S &B0, S B1, S B2, S B3)
>> +  {
>> +    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
>> +    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
>> +    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
>> +    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
>> +    B0 = __builtin_vec_mergeh (T0, T1);
>> +    B3 = __builtin_vec_mergel (T2, T3);
>> +  }
>> +  S (native_simd_type x) : m_simd (x) {}
>> +  native_simd_type m_simd;
>> +};
>> +
>> +void
>> +foo (unsigned int output[], unsigned state[])
>> +{
>> +  S R00 = state[0];
>> +  S R01 = state[0];
>> +  S R02 = state[2];
>> +  S R03 = state[0];
>> +  S R05 = state[5];
>> +  S R06 = state[6];
>> +  S R07 = state[7];
>> +  S R08 = state[8];
>> +  S R09 = state[9];
>> +  S R10 = state[10];
>> +  S R11 = state[11];
>> +  S R12 = state[12];
>> +  S R13 = state[13];
>> +  S R14 = state[4];
>> +  S R15 = state[15];
>> +  for (int r = 0; r != 10; ++r)
>> +    {
>> +      R09 += R13;
>> +      R11 += R15;
>> +      R05 ^= R09;
>> +      R06 ^= R10;
>> +      R07 ^= R11;
>> +      R07 = R07.rotl (7);
>> +      R00 += R05;
>> +      R01 += R06;
>> +      R02 += R07;
>> +      R15 ^= R00;
>> +      R12 ^= R01;
>> +      R13 ^= R02;
>> +      R00 += R05;
>> +      R01 += R06;
>> +      R02 += R07;
>> +      R15 ^= R00;
>> +      R12 = R12.rotl (8);
>> +      R13 = R13.rotl (8);
>> +      R10 += R15;
>> +      R11 += R12;
>> +      R08 += R13;
>> +      R09 += R14;
>> +      R05 ^= R10;
>> +      R06 ^= R11;
>> +      R07 ^= R08;
>> +      R05 = R05.rotl (7);
>> +      R06 = R06.rotl (7);
>> +      R07 = R07.rotl (7);
>> +    }
>> +  R00 += state[0];
>> +  S::transpose (R00, R01, R02, R03);
>> +  R00.store_le (output);
>> +}
>> +
>> +unsigned int res[1];
>> +unsigned main_state[]{1634760805, 60878,      2036477234, 6,
>> +		      0,	  825562964,  1471091955, 1346092787,
>> +		      506976774,  4197066702, 518848283,  118491664,
>> +		      0,	  0,	      0,	  0};
>> +int
>> +main ()
>> +{
>> +  foo (res, main_state);
>> +  if (res[0] != 0x41fcef98)
>> +    __builtin_abort ();
>> +}
>> -- 2.27.0
>>
>> Attachment:
>>
>> v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch	25.1 K
>>
> 
> 
> BR,
> Kewen