From: Xionghu Luo
Date: Thu, 9 Feb 2023 10:15:22 +0800
Subject: Re: Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
To: "Kewen.Lin", Segher Boessenkool
Cc: Xionghu Luo, gcc-patches@gcc.gnu.org, David Edelsohn, Jakub Jelinek

Thanks Kewen!

Pinging this again, @Segher. Maybe we could go ahead and merge this patch if there are no objections from Segher, since it has already been through several reviews and test runs.
BR,
Xionghu

On 2023/1/18 17:11, Kewen.Lin wrote:
> Hi Segher,
>
> I guess this patch slipped under your radar. :)
>
> As Jakub asked about the status in PR106069, I applied the attached patch from Xionghu
> to the latest trunk, re-tested it, and confirmed that it still bootstraps and
> regtests cleanly on powerpc64-linux-gnu P8 and powerpc64le-linux-gnu P9 and P10.
>
> This new version separates the direct patterns into LE and BE variants, which is
> clearer than before, and it looks good to me. What do you think? Looking forward
> to your opinion.
>
> BTW, the link in the archives:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600169.html
>
> BR,
> Kewen
>
> on 2022/8/24 09:24, Xionghu Luo wrote:
>> Subject: Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
>> From: Xionghu Luo
>> Date: 2022/8/24, 09:24
>> To: "Kewen.Lin", Segher Boessenkool
>> Cc: Xionghu Luo, gcc-patches@gcc.gnu.org, David Edelsohn, Segher Boessenkool
>>
>> Hi Segher, I'd like to resend and ping this patch. Thanks.
>>
>> v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch
>>
>> From 23bffdacdf0eb1140c7a3571e6158797f4818d57 Mon Sep 17 00:00:00 2001
>> From: Xionghu Luo
>> Date: Thu, 4 Aug 2022 03:44:58 +0000
>> Subject: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the
>>  UNSPECS [PR106069]
>>
>> v4: Update per comments.
>> v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
>>     the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
>>     patterns.
>> v2: Split the direct pattern into BE and LE variants with the same RTL but
>>     different insns.
>>
>> The native RTL expression for vec_mrghw should be the same for BE and LE,
>> as it is register- and endian-independent. So both BE and LE need to
>> generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
>> with vec_select and vec_concat.
>>
>>   (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>>     (subreg:V4SI (reg:V16QI 139) 0)
>>     (subreg:V4SI (reg:V16QI 140) 0))
>>     [const_int 0 4 1 5]))
>>
>> Then the combine pass can perform the nested vec_select optimization
>> in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
>> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>>
>> =>
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
>> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>>
>> The endianness check is then needed only once, at final ASM generation.
>> The resulting ASM is also better, because the nested vec_select is
>> simplified to a simple scalar load.
>>
>> Passed regression testing on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
>> Linux.
>>
>> gcc/ChangeLog:
>>
>> 	PR target/106069
>> 	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
>> 	(altivec_vmrghb_direct_be): New pattern for BE.
>> 	(altivec_vmrghb_direct_le): New pattern for LE.
>> 	(altivec_vmrghh_direct): Remove.
>> 	(altivec_vmrghh_direct_be): New pattern for BE.
>> 	(altivec_vmrghh_direct_le): New pattern for LE.
>> 	(altivec_vmrghw_direct_): Remove.
>> 	(altivec_vmrghw_direct__be): New pattern for BE.
>> 	(altivec_vmrghw_direct__le): New pattern for LE.
>> 	(altivec_vmrglb_direct): Remove.
>> 	(altivec_vmrglb_direct_be): New pattern for BE.
>> 	(altivec_vmrglb_direct_le): New pattern for LE.
>> 	(altivec_vmrglh_direct): Remove.
>> 	(altivec_vmrglh_direct_be): New pattern for BE.
>> 	(altivec_vmrglh_direct_le): New pattern for LE.
>> 	(altivec_vmrglw_direct_): Remove.
>> 	(altivec_vmrglw_direct__be): New pattern for BE.
>> 	(altivec_vmrglw_direct__le): New pattern for LE.
>> 	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
>> 	Adjust.
>> 	* config/rs6000/vsx.md: Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR target/106069
>> 	* g++.target/powerpc/pr106069.C: New test.
>>
>> Signed-off-by: Xionghu Luo
>> ---
>>  gcc/config/rs6000/altivec.md                | 222 ++++++++++++++------
>>  gcc/config/rs6000/rs6000.cc                 |  24 +--
>>  gcc/config/rs6000/vsx.md                    |  28 +--
>>  gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++
>>  4 files changed, 307 insertions(+), 85 deletions(-)
>>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
>>
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index 2c4940f2e21..c6a381908cb 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
>>     (use (match_operand:V16QI 2 "register_operand"))]
>>    "TARGET_ALTIVEC"
>>  {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
>> -                                                : gen_altivec_vmrglb_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
>>    DONE;
>>  })
>>
>> -(define_insn "altivec_vmrghb_direct"
>> +(define_insn "altivec_vmrghb_direct_be"
>>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>>          (vec_select:V16QI
>>            (vec_concat:V32QI
>> @@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
>>                       (const_int 5) (const_int 21)
>>                       (const_int 6) (const_int 22)
>>                       (const_int 7) (const_int 23)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "vmrghb %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrghb_direct_le"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +        (vec_select:V16QI
>> +          (vec_concat:V32QI
>> +            (match_operand:V16QI 2 "register_operand" "v")
>> +            (match_operand:V16QI 1 "register_operand" "v"))
>> +          (parallel [(const_int 8) (const_int 24)
>> +                     (const_int 9) (const_int 25)
>> +                     (const_int 10) (const_int 26)
>> +                     (const_int 11) (const_int 27)
>> +                     (const_int 12) (const_int 28)
>> +                     (const_int 13) (const_int 29)
>> +                     (const_int 14) (const_int 30)
>> +                     (const_int 15) (const_int 31)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>    "vmrghb %0,%1,%2"
>>    [(set_attr "type" "vecperm")])
>>
>> @@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
>>     (use (match_operand:V8HI 2 "register_operand"))]
>>    "TARGET_ALTIVEC"
>>  {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
>> -                                                : gen_altivec_vmrglh_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
>>    DONE;
>>  })
>>
>> -(define_insn "altivec_vmrghh_direct"
>> +(define_insn "altivec_vmrghh_direct_be"
>>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>> -        (vec_select:V8HI
>> +        (vec_select:V8HI
>>            (vec_concat:V16HI
>>              (match_operand:V8HI 1 "register_operand" "v")
>>              (match_operand:V8HI 2 "register_operand" "v"))
>> @@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
>>                       (const_int 1) (const_int 9)
>>                       (const_int 2) (const_int 10)
>>                       (const_int 3) (const_int 11)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "vmrghh %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrghh_direct_le"
>> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
>> +        (vec_select:V8HI
>> +          (vec_concat:V16HI
>> +            (match_operand:V8HI 2 "register_operand" "v")
>> +            (match_operand:V8HI 1 "register_operand" "v"))
>> +          (parallel [(const_int 4) (const_int 12)
>> +                     (const_int 5) (const_int 13)
>> +                     (const_int 6) (const_int 14)
>> +                     (const_int 7) (const_int 15)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>    "vmrghh %0,%1,%2"
>>    [(set_attr "type" "vecperm")])
>>
>> @@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
>>     (use (match_operand:V4SI 2 "register_operand"))]
>>    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>>  {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
>> -                         : gen_altivec_vmrglw_direct_v4si;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
>> +                                                  operands[1],
>> +                                                  operands[2]));
>> +  else
>> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
>> +                                                  operands[2],
>> +                                                  operands[1]));
>>    DONE;
>>  })
>>
>> -(define_insn "altivec_vmrghw_direct_"
>> +(define_insn "altivec_vmrghw_direct__be"
>>    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>>          (vec_select:VSX_W
>>            (vec_concat:
>> @@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_"
>>              (match_operand:VSX_W 2 "register_operand" "wa,v"))
>>            (parallel [(const_int 0) (const_int 4)
>>                       (const_int 1) (const_int 5)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "@
>> +   xxmrghw %x0,%x1,%x2
>> +   vmrghw %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrghw_direct__le"
>> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>> +        (vec_select:VSX_W
>> +          (vec_concat:
>> +            (match_operand:VSX_W 2 "register_operand" "wa,v")
>> +            (match_operand:VSX_W 1 "register_operand" "wa,v"))
>> +          (parallel [(const_int 2) (const_int 6)
>> +                     (const_int 3) (const_int 7)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>    "@
>>     xxmrghw %x0,%x1,%x2
>>     vmrghw %0,%1,%2"
>> @@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
>>     (use (match_operand:V16QI 2 "register_operand"))]
>>    "TARGET_ALTIVEC"
>>  {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
>> -                                                : gen_altivec_vmrghb_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
>>    DONE;
>>  })
>>
>> -(define_insn "altivec_vmrglb_direct"
>> +(define_insn "altivec_vmrglb_direct_be"
>>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>>          (vec_select:V16QI
>>            (vec_concat:V32QI
>> @@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
>>                       (const_int 13) (const_int 29)
>>                       (const_int 14) (const_int 30)
>>                       (const_int 15) (const_int 31)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "vmrglb %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrglb_direct_le"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +        (vec_select:V16QI
>> +          (vec_concat:V32QI
>> +            (match_operand:V16QI 2 "register_operand" "v")
>> +            (match_operand:V16QI 1 "register_operand" "v"))
>> +          (parallel [(const_int 0) (const_int 16)
>> +                     (const_int 1) (const_int 17)
>> +                     (const_int 2) (const_int 18)
>> +                     (const_int 3) (const_int 19)
>> +                     (const_int 4) (const_int 20)
>> +                     (const_int 5) (const_int 21)
>> +                     (const_int 6) (const_int 22)
>> +                     (const_int 7) (const_int 23)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>    "vmrglb %0,%1,%2"
>>    [(set_attr "type" "vecperm")])
>>
>> @@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
>>     (use (match_operand:V8HI 2 "register_operand"))]
>>    "TARGET_ALTIVEC"
>>  {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
>> -                                                : gen_altivec_vmrghh_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
>>    DONE;
>>  })
>>
>> -(define_insn "altivec_vmrglh_direct"
>> +(define_insn "altivec_vmrglh_direct_be"
>>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>>          (vec_select:V8HI
>>            (vec_concat:V16HI
>> @@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
>>                       (const_int 5) (const_int 13)
>>                       (const_int 6) (const_int 14)
>>                       (const_int 7) (const_int 15)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "vmrglh %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrglh_direct_le"
>> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
>> +        (vec_select:V8HI
>> +          (vec_concat:V16HI
>> +            (match_operand:V8HI 2 "register_operand" "v")
>> +            (match_operand:V8HI 1 "register_operand" "v"))
>> +          (parallel [(const_int 0) (const_int 8)
>> +                     (const_int 1) (const_int 9)
>> +                     (const_int 2) (const_int 10)
>> +                     (const_int 3) (const_int 11)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>    "vmrglh %0,%1,%2"
>>    [(set_attr "type" "vecperm")])
>>
>> @@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
>>     (use (match_operand:V4SI 2 "register_operand"))]
>>    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>>  {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
>> -                         : gen_altivec_vmrghw_direct_v4si;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
>> +                                                  operands[1],
>> +                                                  operands[2]));
>> +  else
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
>> +                                                  operands[2],
>> +                                                  operands[1]));
>>    DONE;
>>  })
>>
>> -(define_insn "altivec_vmrglw_direct_"
>> +(define_insn "altivec_vmrglw_direct__be"
>>    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>>          (vec_select:VSX_W
>>            (vec_concat:
>> @@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_"
>>              (match_operand:VSX_W 2 "register_operand" "wa,v"))
>>            (parallel [(const_int 2) (const_int 6)
>>                       (const_int 3) (const_int 7)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "@
>> +   xxmrglw %x0,%x1,%x2
>> +   vmrglw %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrglw_direct__le"
>> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>> +        (vec_select:VSX_W
>> +          (vec_concat:
>> +            (match_operand:VSX_W 2 "register_operand" "wa,v")
>> +            (match_operand:VSX_W 1 "register_operand" "wa,v"))
>> +          (parallel [(const_int 0) (const_int 4)
>> +                     (const_int 1) (const_int 5)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>    "@
>>     xxmrglw %x0,%x1,%x2
>>     vmrglw %0,%1,%2"
>> @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>>      {
>>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>>      }
>>    else
>>      {
>>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>>      }
>>    DONE;
>>  })
>> @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
>>      {
>>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>>      }
>>    else
>>      {
>>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>>      }
>>    DONE;
>>  })
>> @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
>>      {
>>        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>>      }
>>    else
>>      {
>>        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>>      }
>>    DONE;
>>  })
>> @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
>>      {
>>        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>>      }
>>    else
>>      {
>>        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>>      }
>>    DONE;
>>  })
>> @@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
>>      {
>>        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>>      }
>>    else
>>      {
>>        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>>      }
>>    DONE;
>>  })
>> @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
>>      {
>>        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>>      }
>>    else
>>      {
>>        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>>      }
>>    DONE;
>>  })
>> @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
>>      {
>>        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>>      }
>>    else
>>      {
>>        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>>      }
>>    DONE;
>>  })
>> @@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
>>      {
>>        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>>      }
>>    else
>>      {
>>        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>>      }
>>    DONE;
>>  })
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index df491bee2ea..c6ccd40e089 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>>         CODE_FOR_altivec_vpkuwum_direct,
>>         {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
>>        {OPTION_MASK_ALTIVEC,
>> -       BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
>> -                        : CODE_FOR_altivec_vmrglb_direct,
>> +       BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
>> +                        : CODE_FOR_altivec_vmrglb_direct_le,
>>         {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
>>        {OPTION_MASK_ALTIVEC,
>> -       BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
>> -                        : CODE_FOR_altivec_vmrglh_direct,
>> +       BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
>> +                        : CODE_FOR_altivec_vmrglh_direct_le,
>>         {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
>>        {OPTION_MASK_ALTIVEC,
>> -       BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
>> -                        : CODE_FOR_altivec_vmrglw_direct_v4si,
>> +       BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
>> +                        : CODE_FOR_altivec_vmrglw_direct_v4si_le,
>>         {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
>>        {OPTION_MASK_ALTIVEC,
>> -       BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
>> -                        : CODE_FOR_altivec_vmrghb_direct,
>> +       BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
>> +                        : CODE_FOR_altivec_vmrghb_direct_le,
>>         {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
>>        {OPTION_MASK_ALTIVEC,
>> -       BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
>> -                        : CODE_FOR_altivec_vmrghh_direct,
>> +       BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
>> +                        : CODE_FOR_altivec_vmrghh_direct_le,
>>         {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
>>        {OPTION_MASK_ALTIVEC,
>> -       BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
>> -                        : CODE_FOR_altivec_vmrghw_direct_v4si,
>> +       BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
>> +                        : CODE_FOR_altivec_vmrghw_direct_v4si_le,
>>         {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
>>        {OPTION_MASK_P8_VECTOR,
>>         BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>> index e226a93bbe5..80f84e9b141 100644
>> --- a/gcc/config/rs6000/vsx.md
>> +++ b/gcc/config/rs6000/vsx.md
>> @@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_"
>>                       (const_int 1) (const_int 5)])))]
>>    "VECTOR_MEM_VSX_P (mode)"
>>  {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_
>> -                         : gen_altivec_vmrglw_direct_;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
>> +                                                  operands[1],
>> +                                                  operands[2]));
>> +  else
>> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
>> +                                                  operands[2],
>> +                                                  operands[1]));
>>    DONE;
>>  }
>>    [(set_attr "type" "vecperm")])
>> @@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_"
>>                       (const_int 3) (const_int 7)])))]
>>    "VECTOR_MEM_VSX_P (mode)"
>>  {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_
>> -                         : gen_altivec_vmrghw_direct_;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
>> +                                                  operands[1],
>> +                                                  operands[2]));
>> +  else
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
>> +                                                  operands[2],
>> +                                                  operands[1]));
>>    DONE;
>>  }
>>    [(set_attr "type" "vecperm")])
>> diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
>> new file mode 100644
>> index 00000000000..c89739ecb55
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
>> @@ -0,0 +1,118 @@
>> +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
>> +/* { dg-require-effective-target vmx_hw } */
>> +/* { dg-do run } */
>> +
>> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
>> +
>> +union
>> +{
>> +  native_simd_type V;
>> +  int R[4];
>> +} store_le_vec;
>> +
>> +struct S
>> +{
>> +  S () = default;
>> +  S (unsigned B0)
>> +  {
>> +    native_simd_type val{B0};
>> +    m_simd = val;
>> +  }
>> +  void store_le (unsigned int out[])
>> +  {
>> +    store_le_vec.V = m_simd;
>> +    unsigned int x0 = store_le_vec.R[0];
>> +    __builtin_memcpy (out, &x0, 4);
>> +  }
>> +  S rotl (unsigned int r)
>> +  {
>> +    native_simd_type rot{r};
>> +    return __builtin_vec_rl (m_simd, rot);
>> +  }
>> +  void operator+= (S other)
>> +  {
>> +    m_simd = __builtin_vec_add (m_simd, other.m_simd);
>> +  }
>> +  void operator^= (S other)
>> +  {
>> +    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
>> +  }
>> +  static void transpose (S &B0, S B1, S B2, S B3)
>> +  {
>> +    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
>> +    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
>> +    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
>> +    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
>> +    B0 = __builtin_vec_mergeh (T0, T1);
>> +    B3 = __builtin_vec_mergel (T2, T3);
>> +  }
>> +  S (native_simd_type x) : m_simd (x) {}
>> +  native_simd_type m_simd;
>> +};
>> +
>> +void
>> +foo (unsigned int output[], unsigned state[])
>> +{
>> +  S R00 = state[0];
>> +  S R01 = state[0];
>> +  S R02 = state[2];
>> +  S R03 = state[0];
>> +  S R05 = state[5];
>> +  S R06 = state[6];
>> +  S R07 = state[7];
>> +  S R08 = state[8];
>> +  S R09 = state[9];
>> +  S R10 = state[10];
>> +  S R11 = state[11];
>> +  S R12 = state[12];
>> +  S R13 = state[13];
>> +  S R14 = state[4];
>> +  S R15 = state[15];
>> +  for (int r = 0; r != 10; ++r)
>> +    {
>> +      R09 += R13;
>> +      R11 += R15;
>> +      R05 ^= R09;
>> +      R06 ^= R10;
>> +      R07 ^= R11;
>> +      R07 = R07.rotl (7);
>> +      R00 += R05;
>> +      R01 += R06;
>> +      R02 += R07;
>> +      R15 ^= R00;
>> +      R12 ^= R01;
>> +      R13 ^= R02;
>> +      R00 += R05;
>> +      R01 += R06;
>> +      R02 += R07;
>> +      R15 ^= R00;
>> +      R12 = R12.rotl (8);
>> +      R13 = R13.rotl (8);
>> +      R10 += R15;
>> +      R11 += R12;
>> +      R08 += R13;
>> +      R09 += R14;
>> +      R05 ^= R10;
>> +      R06 ^= R11;
>> +      R07 ^= R08;
>> +      R05 = R05.rotl (7);
>> +      R06 = R06.rotl (7);
>> +      R07 = R07.rotl (7);
>> +    }
>> +  R00 += state[0];
>> +  S::transpose (R00, R01, R02, R03);
>> +  R00.store_le (output);
>> +}
>> +
>> +unsigned int res[1];
>> +unsigned main_state[]{1634760805, 60878, 2036477234, 6,
>> +                      0, 825562964, 1471091955, 1346092787,
>> +                      506976774, 4197066702, 518848283, 118491664,
>> +                      0, 0, 0, 0};
>> +int
>> +main ()
>> +{
>> +  foo (res, main_state);
>> +  if (res[0] != 0x41fcef98)
>> +    __builtin_abort ();
>> +}
>> -- 
>> 2.27.0
>>
>> Attachment: v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch (25.1 K)
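The index bookkeeping in the commit message (why LE emits the opposite merge instruction with swapped operands, yet both endiannesses can now share the RTL [0 4 1 5]) can be checked with a small C++ model. This is a sketch, not GCC code: every function name below is hypothetical, the instruction semantics use the ISA's big-endian word numbering, and the model assumes that on LE, RTL element i occupies ISA word 3 - i. The old_rtl_mergeh_le helper reflects a reading of the pre-patch LE expansion (operands swapped into the removed altivec_vmrglw_direct pattern, which selected [2 6 3 7] from vec_concat of operands 1 and 2).

```cpp
// Illustrative model of the V4SI merge patterns discussed in PR106069.
// All names are hypothetical; none of this is GCC code.
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<unsigned, 4>;  // index = RTL element number (memory order)
using V8 = std::array<unsigned, 8>;
using Sel4 = std::array<std::size_t, 4>;

// (vec_concat a b): the elements of a followed by the elements of b.
V8 vec_concat(const V4 &a, const V4 &b) {
  V8 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = a[i];
    r[i + 4] = b[i];
  }
  return r;
}

// (vec_select v (parallel [s0 s1 s2 s3])).
V4 vec_select(const V8 &v, const Sel4 &sel) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = v[sel[i]];
  return r;
}

// Instruction semantics in ISA (big-endian) word numbering: word 0 = leftmost.
V4 isa_vmrghw(const V4 &A, const V4 &B) { return {A[0], B[0], A[1], B[1]}; }
V4 isa_vmrglw(const V4 &A, const V4 &B) { return {A[2], B[2], A[3], B[3]}; }

// Assumption: on LE, RTL element i lives in ISA word 3 - i. Reversing the
// lanes is its own inverse, so one helper converts in both directions.
V4 flip(const V4 &v) { return {v[3], v[2], v[1], v[0]}; }

// Endian-independent RTL that both BE and LE generate after the patch.
V4 rtl_mergeh(const V4 &a, const V4 &b) { return vec_select(vec_concat(a, b), {0, 4, 1, 5}); }
V4 rtl_mergel(const V4 &a, const V4 &b) { return vec_select(vec_concat(a, b), {2, 6, 3, 7}); }

// What the emitted instruction computes, viewed in RTL element order.
V4 hw_mergeh_be(const V4 &a, const V4 &b) { return isa_vmrghw(a, b); }
V4 hw_mergel_be(const V4 &a, const V4 &b) { return isa_vmrglw(a, b); }
// LE emits the opposite instruction with the operands swapped.
V4 hw_mergeh_le(const V4 &a, const V4 &b) { return flip(isa_vmrglw(flip(b), flip(a))); }
V4 hw_mergel_le(const V4 &a, const V4 &b) { return flip(isa_vmrghw(flip(b), flip(a))); }

// Pre-patch LE RTL for vec_mergeh: operands swapped, BE-style [2 6 3 7].
V4 old_rtl_mergeh_le(const V4 &a, const V4 &b) {
  return vec_select(vec_concat(b, a), {2, 6, 3, 7});
}
```

With a = {10, 11, 12, 13} and b = {20, 21, 22, 23}, both hw_mergeh_be and hw_mergeh_le agree with rtl_mergeh, namely {10, 20, 11, 21}. Element 3 of that result is b[1], which is the fold combine performs on insn 24 in the commit message. The old LE RTL, by contrast, describes {22, 12, 23, 13}, which does not match what the emitted vmrglw actually computes; that mismatch is what made the nested vec_select simplification unsafe on LE.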