From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x1032.google.com (mail-pj1-x1032.google.com [IPv6:2607:f8b0:4864:20::1032]) by sourceware.org (Postfix) with ESMTPS id E2A2B3858D1E for ; Wed, 17 Aug 2022 06:23:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E2A2B3858D1E Received: by mail-pj1-x1032.google.com with SMTP id t2-20020a17090a4e4200b001f21572f3a4so991771pjl.0 for ; Tue, 16 Aug 2022 23:23:54 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc; bh=rKZ1566Wro4UHn4d+6q9L4G9j6YGxqfAcB1eXwG62no=; b=NkKpCCoXRL3QsZOHvmMtLTvM+giqROx6i6RnqbuP3mW4pEHbJ5KXoiUCwnP73xTyeu uwgo4f7QQZ9pESZUuFkGTP9chdbTGviQHm0dKJNMaUQ9CV1ZOFMLXO1Rv0YDuxU0noew ep4bbLdR/LNwBt0jTqrvNGqyixnZj7uVBFhQnfloPC/TnTN22KT40Pp2joQ7Jvtxc7uK MqadQQKy1AYH8+xd91+GPQqNPfibaKiCb56LQtpRaA62+p6mJdNzp/oY7Ck00Nly27/F q+tUxcOmRWMrg3Mzd7rWI8H2YACagHhnUAWq2diZb2sVL1oYXUTLr6pPKM8UMSLInQGy A4+A== X-Gm-Message-State: ACgBeo1OorhqtT16/lt7o+tfXIqzcpj9Od6mmKvmUOVv478ZTUl8G0sv Nbt6X+Dv9VNscEKaLovDDVIZRJzjH3iY/w== X-Google-Smtp-Source: AA6agR5QZXs5rHYuqofmr1hBs/skiNAyvHtUa1n0QTp8nUguwcDh0KMamspUunITyPm5s7hY5D36jA== X-Received: by 2002:a17:902:ce0e:b0:172:69cc:60aa with SMTP id k14-20020a170902ce0e00b0017269cc60aamr14536564plg.31.1660717433577; Tue, 16 Aug 2022 23:23:53 -0700 (PDT) Received: from [192.168.255.10] ([103.7.29.32]) by smtp.gmail.com with ESMTPSA id v127-20020a626185000000b0052aaff953aesm9549174pfb.115.2022.08.16.23.23.51 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 16 Aug 2022 23:23:52 -0700 (PDT) Message-ID: <37b57a54-f98e-96a3-edff-866c8aae4c7d@gmail.com> Date: Wed, 17 Aug 2022 14:23:44 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.12.0 Subject: Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] To: "Kewen.Lin" Cc: Xionghu Luo , gcc-patches@gcc.gnu.org, David Edelsohn , Segher Boessenkool References: <20220808034247.2618809-1-xionghuluo@tencent.com> <76035a5e-f0d8-8bc5-93e9-cfb08b2127f8@gmail.com> <20220810170700.GA25951@gate.crashing.org> <472c1531-aae6-123e-6b0c-8827f5585879@gmail.com> <5df1a7fc-dacf-72e2-041d-66624926091f@linux.ibm.com> From: Xionghu Luo In-Reply-To: <5df1a7fc-dacf-72e2-041d-66624926091f@linux.ibm.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-8.2 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2022 06:24:01 -0000 On 2022/8/16 14:53, Kewen.Lin wrote: > Hi Xionghu, > > Thanks for the updated version of patch, some comments are inlined. > > on 2022/8/11 14:15, Xionghu Luo wrote: >> >> >> On 2022/8/11 01:07, Segher Boessenkool wrote: >>> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote: >>>> On 2022/8/9 11:01, Kewen.Lin wrote: >>>>> I have some concern on those changed "altivec_*_direct", IMHO the suffix >>>>> "_direct" is normally to indicate the define_insn is mapped to the >>>>> corresponding hw insn directly.  With this change, for example, >>>>> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks >>>>> misleading.  Maybe we can add the corresponding _direct_le and _direct_be >>>>> versions, both are mapped into the same insn but have different RTL >>>>> patterns.  Looking forward to Segher's and David's suggestions. >>>> >>>> Thanks!  Do you mean same RTL patterns with different hw insn? >>> >>> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb >>> instruction, never a vmrglb instead.  Misleading names are an expensive >>> problem. >>> >>> >> >> Thanks.  Then on LE platforms, if user calls altivec_vmrghw,it will be >> expanded to RTL (vec_select (vec_concat (R0 R1 (0 4 1 5))), and >> finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw". >> For BE just strict forward, seems more clear :-), OK for master? >> >> >> [PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] >> >> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match >> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le >> patterns. >> v2: Split the direct pattern to be and le with same RTL but different insn. >> >> The native RTL expression for vec_mrghw should be same for BE and LE as >> they are register and endian-independent.  So both BE and LE need >> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw >> with vec_select and vec_concat. >> >> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI >>            (subreg:V4SI (reg:V16QI 139) 0) >>            (subreg:V4SI (reg:V16QI 140) 0)) >>            [const_int 0 4 1 5])) >> >> Then combine pass could do the nested vec_select optimization >> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE: >> >> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5]) >> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);} >> >> => >> >> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel) >> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);} >> >> The endianness check need only once at ASM generation finally. >> ASM would be better due to nested vec_select simplified to simple scalar >> load. >> >> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{64} >> Linux(Thanks to Kewen). >> >> gcc/ChangeLog: >> >>     PR target/106069 >>     * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove. >>     (altivec_vmrghb_direct_be): New pattern for BE. >>     (altivec_vmrglb_direct_le): New pattern for LE. >>     (altivec_vmrghh_direct): Remove. >>     (altivec_vmrghh_direct_be): New pattern for BE. >>     (altivec_vmrglh_direct_le): New pattern for LE. >>     (altivec_vmrghw_direct_): Remove. >>     (altivec_vmrghw_direct__be): New pattern for BE. >>     (altivec_vmrglw_direct__le): New pattern for LE. >>     (altivec_vmrglb_direct): Remove. >>     (altivec_vmrglb_direct_be): New pattern for BE. >>     (altivec_vmrghb_direct_le): New pattern for LE. >>     (altivec_vmrglh_direct): Remove. >>     (altivec_vmrglh_direct_be): New pattern for BE. >>     (altivec_vmrghh_direct_le): New pattern for LE. >>     (altivec_vmrglw_direct_): Remove. >>     (altivec_vmrglw_direct__be): New pattern for BE. >>     (altivec_vmrghw_direct__le): New pattern for LE. >>     * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): >>     Adjust. >>     * config/rs6000/vsx.md: Likewise. >> >> gcc/testsuite/ChangeLog: >> >>     PR target/106069 >>     * g++.target/powerpc/pr106069.C: New test. >> >> Signed-off-by: Xionghu Luo >> --- >>  gcc/config/rs6000/altivec.md                | 223 ++++++++++++++------ >>  gcc/config/rs6000/rs6000.cc                 |  36 ++-- >>  gcc/config/rs6000/vsx.md                    |  24 +-- >>  gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++ >>  4 files changed, 305 insertions(+), 98 deletions(-) >>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C >> >> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md >> index 2c4940f2e21..78245f470e9 100644 >> --- a/gcc/config/rs6000/altivec.md >> +++ b/gcc/config/rs6000/altivec.md >> @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb" >>     (use (match_operand:V16QI 2 "register_operand"))] >>    "TARGET_ALTIVEC" >>  { >> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct >> -                        : gen_altivec_vmrglb_direct; >> -  if (!BYTES_BIG_ENDIAN) >> -    std::swap (operands[1], operands[2]); >> -  emit_insn (fun (operands[0], operands[1], operands[2])); >> +  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17), >> +              GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19), >> +              GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21), >> +              GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23)); >> +  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); >> +  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> +  emit_insn (gen_rtx_SET (operands[0], x)); >>    DONE; >>  }) > > I think you can just call gen_altivec_vmrghb_direct_be and > gen_altivec_vmrghb_direct_le separately here. Similar for some other > define_expands. > >> >> -(define_insn "altivec_vmrghb_direct" >> +(define_insn "altivec_vmrghb_direct_be" >>    [(set (match_operand:V16QI 0 "register_operand" "=v") >>      (vec_select:V16QI >>        (vec_concat:V32QI >> @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct" >>               (const_int 5) (const_int 21) >>               (const_int 6) (const_int 22) >>               (const_int 7) (const_int 23)])))] >> -  "TARGET_ALTIVEC" >> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >>    "vmrghb %0,%1,%2" >>    [(set_attr "type" "vecperm")]) >> > > Could you move the following altivec_vmrghb_direct_le here? > Then readers can easily check the difference between be and > le for the same altivec_vmrghb_direct. > > Same comment applied for some other similar cases. > >> +(define_insn "altivec_vmrglb_direct_le" >> +  [(set (match_operand:V16QI 0 "register_operand" "=v") >> +    (vec_select:V16QI >> +      (vec_concat:V32QI >> +        (match_operand:V16QI 1 "register_operand" "v") >> +        (match_operand:V16QI 2 "register_operand" "v")) >> +      (parallel [(const_int 0) (const_int 16) >> +             (const_int 1) (const_int 17) >> +             (const_int 2) (const_int 18) >> +             (const_int 3) (const_int 19) >> +             (const_int 4) (const_int 20) >> +             (const_int 5) (const_int 21) >> +             (const_int 6) (const_int 22) >> +             (const_int 7) (const_int 23)])))] >> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> +  "vmrglb %0,%2,%1" >> +  [(set_attr "type" "vecperm")]) > > Could you update this pattern for assembly "vmrglb %0,%1,%2" > instead of "vmrglb %0,%2,%1"? I checked the previous md > before the culprit commit 0910c516a3d72af048, it emits > "vmrglb %0,%1,%2" for altivec_vmrglb_direct. > > Same comment applied for some other similar cases. > >> + >>  (define_expand "altivec_vmrghh" >>    [(use (match_operand:V8HI 0 "register_operand")) >>     (use (match_operand:V8HI 1 "register_operand")) >>     (use (match_operand:V8HI 2 "register_operand"))] >>    "TARGET_ALTIVEC" >>  { >> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct >> -                        : gen_altivec_vmrglh_direct; >> -  if (!BYTES_BIG_ENDIAN) >> -    std::swap (operands[1], operands[2]); >> -  emit_insn (fun (operands[0], operands[1], operands[2])); >> +  rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9), >> +              GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11)); >> +  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); >> + >> +  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> +  emit_insn (gen_rtx_SET (operands[0], x)); >>    DONE; >>  }) >> >> -(define_insn "altivec_vmrghh_direct" >> +(define_insn "altivec_vmrghh_direct_be" >>    [(set (match_operand:V8HI 0 "register_operand" "=v") >> -        (vec_select:V8HI >> +    (vec_select:V8HI >>        (vec_concat:V16HI >>          (match_operand:V8HI 1 "register_operand" "v") >>          (match_operand:V8HI 2 "register_operand" "v")) >> @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct" >>               (const_int 1) (const_int 9) >>               (const_int 2) (const_int 10) >>               (const_int 3) (const_int 11)])))] >> -  "TARGET_ALTIVEC" >> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >>    "vmrghh %0,%1,%2" >>    [(set_attr "type" "vecperm")]) >> >> +(define_insn "altivec_vmrglh_direct_le" >> +  [(set (match_operand:V8HI 0 "register_operand" "=v") >> +    (vec_select:V8HI >> +      (vec_concat:V16HI >> +        (match_operand:V8HI 1 "register_operand" "v") >> +        (match_operand:V8HI 2 "register_operand" "v")) >> +      (parallel [(const_int 0) (const_int 8) >> +             (const_int 1) (const_int 9) >> +             (const_int 2) (const_int 10) >> +             (const_int 3) (const_int 11)])))] >> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> +  "vmrglh %0,%2,%1" >> +  [(set_attr "type" "vecperm")]) >> + >>  (define_expand "altivec_vmrghw" >>    [(use (match_operand:V4SI 0 "register_operand")) >>     (use (match_operand:V4SI 1 "register_operand")) >>     (use (match_operand:V4SI 2 "register_operand"))] >>    "VECTOR_MEM_ALTIVEC_P (V4SImode)" >>  { >> -  rtx (*fun) (rtx, rtx, rtx); >> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si >> -             : gen_altivec_vmrglw_direct_v4si; >> -  if (!BYTES_BIG_ENDIAN) >> -    std::swap (operands[1], operands[2]); >> -  emit_insn (fun (operands[0], operands[1], operands[2])); >> +  rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5)); >> +  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); >> +  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> +  emit_insn (gen_rtx_SET (operands[0], x)); >>    DONE; >>  }) >> >> -(define_insn "altivec_vmrghw_direct_" >> +(define_insn "altivec_vmrghw_direct__be" >>    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >>      (vec_select:VSX_W >>        (vec_concat: >> @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_" >>          (match_operand:VSX_W 2 "register_operand" "wa,v")) >>        (parallel [(const_int 0) (const_int 4) >>               (const_int 1) (const_int 5)])))] >> -  "TARGET_ALTIVEC" >> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> +  "@ >> +  xxmrghw %x0,%x1,%x2 >> +  vmrghw %0,%1,%2" >> +  [(set_attr "type" "vecperm")]) >> + >> +(define_insn "altivec_vmrglw_direct__le" >> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >> +    (vec_select:VSX_W >> +      (vec_concat: >> +        (match_operand:VSX_W 1 "register_operand" "wa,v") >> +        (match_operand:VSX_W 2 "register_operand" "wa,v")) >> +      (parallel [(const_int 0) (const_int 4) >> +             (const_int 1) (const_int 5)])))] >> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >>    "@ >> -   xxmrghw %x0,%x1,%x2 >> -   vmrghw %0,%1,%2" >> +  xxmrglw %x0,%x2,%x1 >> +  vmrglw %0,%2,%1" >>    [(set_attr "type" "vecperm")]) >> >>  (define_insn "*altivec_vmrghsf" >> @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb" >>     (use (match_operand:V16QI 2 "register_operand"))] >>    "TARGET_ALTIVEC" >>  { >> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct >> -                        : gen_altivec_vmrghb_direct; >> -  if (!BYTES_BIG_ENDIAN) >> -    std::swap (operands[1], operands[2]); >> -  emit_insn (fun (operands[0], operands[1], operands[2])); >> +  rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25), >> +              GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27), >> +              GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29), >> +              GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31)); >> +  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); >> +  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> +  emit_insn (gen_rtx_SET (operands[0], x)); >>    DONE; >>  }) >> >> -(define_insn "altivec_vmrglb_direct" >> +(define_insn "altivec_vmrglb_direct_be" >>    [(set (match_operand:V16QI 0 "register_operand" "=v") >>      (vec_select:V16QI >>        (vec_concat:V32QI >> @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct" >>               (const_int 13) (const_int 29) >>               (const_int 14) (const_int 30) >>               (const_int 15) (const_int 31)])))] >> -  "TARGET_ALTIVEC" >> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >>    "vmrglb %0,%1,%2" >>    [(set_attr "type" "vecperm")]) >> >> +(define_insn "altivec_vmrghb_direct_le" >> +  [(set (match_operand:V16QI 0 "register_operand" "=v") >> +    (vec_select:V16QI >> +      (vec_concat:V32QI >> +        (match_operand:V16QI 1 "register_operand" "v") >> +        (match_operand:V16QI 2 "register_operand" "v")) >> +      (parallel [(const_int  8) (const_int 24) >> +             (const_int  9) (const_int 25) >> +             (const_int 10) (const_int 26) >> +             (const_int 11) (const_int 27) >> +             (const_int 12) (const_int 28) >> +             (const_int 13) (const_int 29) >> +             (const_int 14) (const_int 30) >> +             (const_int 15) (const_int 31)])))] >> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> +  "vmrghb %0,%2,%1" >> +  [(set_attr "type" "vecperm")]) >> + >>  (define_expand "altivec_vmrglh" >>    [(use (match_operand:V8HI 0 "register_operand")) >>     (use (match_operand:V8HI 1 "register_operand")) >>     (use (match_operand:V8HI 2 "register_operand"))] >>    "TARGET_ALTIVEC" >>  { >> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct >> -                        : gen_altivec_vmrghh_direct; >> -  if (!BYTES_BIG_ENDIAN) >> -    std::swap (operands[1], operands[2]); >> -  emit_insn (fun (operands[0], operands[1], operands[2])); >> +  rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13), >> +              GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15)); >> +  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); >> +  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> +  emit_insn (gen_rtx_SET (operands[0], x)); >>    DONE; >>  }) >> >> -(define_insn "altivec_vmrglh_direct" >> +(define_insn "altivec_vmrglh_direct_be" >>    [(set (match_operand:V8HI 0 "register_operand" "=v") >>          (vec_select:V8HI >>        (vec_concat:V16HI >> @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct" >>               (const_int 5) (const_int 13) >>               (const_int 6) (const_int 14) >>               (const_int 7) (const_int 15)])))] >> -  "TARGET_ALTIVEC" >> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >>    "vmrglh %0,%1,%2" >>    [(set_attr "type" "vecperm")]) >> >> +(define_insn "altivec_vmrghh_direct_le" >> +  [(set (match_operand:V8HI 0 "register_operand" "=v") >> +        (vec_select:V8HI >> +      (vec_concat:V16HI >> +        (match_operand:V8HI 1 "register_operand" "v") >> +        (match_operand:V8HI 2 "register_operand" "v")) >> +      (parallel [(const_int 4) (const_int 12) >> +             (const_int 5) (const_int 13) >> +             (const_int 6) (const_int 14) >> +             (const_int 7) (const_int 15)])))] >> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> +  "vmrghh %0,%2,%1" >> +  [(set_attr "type" "vecperm")]) >> + >>  (define_expand "altivec_vmrglw" >>    [(use (match_operand:V4SI 0 "register_operand")) >>     (use (match_operand:V4SI 1 "register_operand")) >>     (use (match_operand:V4SI 2 "register_operand"))] >>    "VECTOR_MEM_ALTIVEC_P (V4SImode)" >>  { >> -  rtx (*fun) (rtx, rtx, rtx); >> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si >> -             : gen_altivec_vmrghw_direct_v4si; >> -  if (!BYTES_BIG_ENDIAN) >> -    std::swap (operands[1], operands[2]); >> -  emit_insn (fun (operands[0], operands[1], operands[2])); >> +  rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7)); >> +  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); >> +  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> +  emit_insn (gen_rtx_SET (operands[0], x)); >>    DONE; >>  }) >> >> -(define_insn "altivec_vmrglw_direct_" >> +(define_insn "altivec_vmrglw_direct__be" >>    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >>      (vec_select:VSX_W >>        (vec_concat: >> @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_" >>          (match_operand:VSX_W 2 "register_operand" "wa,v")) >>        (parallel [(const_int 2) (const_int 6) >>               (const_int 3) (const_int 7)])))] >> -  "TARGET_ALTIVEC" >> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> +  "@ >> +  xxmrglw %x0,%x1,%x2 >> +  vmrglw %0,%1,%2" >> +  [(set_attr "type" "vecperm")]) >> + >> +(define_insn "altivec_vmrghw_direct__le" >> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >> +    (vec_select:VSX_W >> +      (vec_concat: >> +        (match_operand:VSX_W 1 "register_operand" "wa,v") >> +        (match_operand:VSX_W 2 "register_operand" "wa,v")) >> +      (parallel [(const_int 2) (const_int 6) >> +             (const_int 3) (const_int 7)])))] >> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >>    "@ >> -   xxmrglw %x0,%x1,%x2 >> -   vmrglw %0,%1,%2" >> +  xxmrghw %x0,%x2,%x1 >> +  vmrghw %0,%2,%1" >>    [(set_attr "type" "vecperm")]) >> >>  (define_insn "*altivec_vmrglsf" >> @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi" >>      { >>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); >> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); >>      } >>    else >>      { >>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); >> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); > > Note that if you change assembly "vmrghh %0,%2,%1" to "vmrghh %0,%1,%2", > you need to change this to: > > emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); > > Same comment applied for some other similar cases. > >>      } >>    DONE; >>  }) >> @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi" >>      { >>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); >> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); >>      } >>    else >>      { >>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); >> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); >>      } >>    DONE; >>  }) >> @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi" >>      { >>        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); >> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); >>      } >>    else >>      { >>        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); >> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); >>      } >>    DONE; >>  }) >> @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi" >>      { >>        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); >> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); >>      } >>    else >>      { >>        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); >> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); >>      } >>    DONE; >>  }) >> @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi" >>      { >>        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); >> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); >>      } >>    else >>      { >>        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); >> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); >>      } >>    DONE; >>  }) >> @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi" >>      { >>        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); >> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); >>      } >>    else >>      { >>        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); >> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); >>      } >>    DONE; >>  }) >> @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi" >>      { >>        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); >> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); >>      } >>    else >>      { >>        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); >> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); >>      } >>    DONE; >>  }) >> @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi" >>      { >>        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); >> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); >>      } >>    else >>      { >>        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); >>        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); >> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); >> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); >>      } >>    DONE; >>  }) >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index df491bee2ea..97da7706f63 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, >>      {OPTION_MASK_ALTIVEC, >>       CODE_FOR_altivec_vpkuwum_direct, >>       {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, >> -    {OPTION_MASK_ALTIVEC, >> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct >> -              : CODE_FOR_altivec_vmrglb_direct, >> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be, >>       {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, > > Before the culprit commit 0910c516a3d72af04, we have: > > { OPTION_MASK_ALTIVEC, > (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct > : CODE_FOR_altivec_vmrglb_direct), > { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, > > I think we should use: > > { OPTION_MASK_ALTIVEC, > (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be > : CODE_FOR_altivec_vmrglb_direct_le), > { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, > > here instead. Similar comment for those related below. > >> -    {OPTION_MASK_ALTIVEC, >> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct >> -              : CODE_FOR_altivec_vmrglh_direct, >> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be, >>       {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, >> -    {OPTION_MASK_ALTIVEC, >> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si >> -              : CODE_FOR_altivec_vmrglw_direct_v4si, >> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be, >>       {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, >> -    {OPTION_MASK_ALTIVEC, >> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct >> -              : CODE_FOR_altivec_vmrghb_direct, >> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be, >>       {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, >> -    {OPTION_MASK_ALTIVEC, >> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct >> -              : CODE_FOR_altivec_vmrghh_direct, >> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be, >>       {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, >> -    {OPTION_MASK_ALTIVEC, >> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si >> -              : CODE_FOR_altivec_vmrghw_direct_v4si, >> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be, >>       {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, >>      {OPTION_MASK_P8_VECTOR, >>       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct >> @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, >> >>            /* For little-endian, the two input operands must be swapped >>               (or swapped back) to ensure proper right-to-left numbering >> -             from 0 to 2N-1.  */ >> -      if (swapped ^ !BYTES_BIG_ENDIAN >> -          && icode != CODE_FOR_vsx_xxpermdi_v16qi) >> +         from 0 to 2N-1.  Excludes the vmrg[lh][bhw] and xxpermdi ops.  */ >> +      if (swapped ^ !BYTES_BIG_ENDIAN) >> +        if (!(icode == CODE_FOR_altivec_vmrghb_direct_be >> +          || icode == CODE_FOR_altivec_vmrglb_direct_be >> +          || icode == CODE_FOR_altivec_vmrghh_direct_be >> +          || icode == CODE_FOR_altivec_vmrglh_direct_be >> +          || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be >> +          || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be >> +          || icode == CODE_FOR_vsx_xxpermdi_v16qi)) >>          std::swap (op0, op1); > > IIUC, we don't need this part of change once we fix the operand order in > the assembly for those LE "direct"s. > > BR, > Kewen > Thanks. Addressed all the comments as v4. v4: Update per comments. v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match the actual output ASM vmrglb. Likewise for all similar xxx_direct_le patterns. v2: Split the direct pattern to be and le with same RTL but different insn. The native RTL expression for vec_mrghw should be same for BE and LE as they are register and endian-independent. So both BE and LE need generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw with vec_select and vec_concat. (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI (subreg:V4SI (reg:V16QI 139) 0) (subreg:V4SI (reg:V16QI 140) 0)) [const_int 0 4 1 5])) Then combine pass could do the nested vec_select optimization in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE: 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5]) 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);} => 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel) 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);} The endianness check need only once at ASM generation finally. ASM would be better due to nested vec_select simplified to simple scalar load. Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64} Linux. gcc/ChangeLog: PR target/106069 * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove. (altivec_vmrghb_direct_be): New pattern for BE. (altivec_vmrghb_direct_le): New pattern for LE. (altivec_vmrghh_direct): Remove. (altivec_vmrghh_direct_be): New pattern for BE. (altivec_vmrghh_direct_le): New pattern for LE. (altivec_vmrghw_direct_): Remove. (altivec_vmrghw_direct__be): New pattern for BE. (altivec_vmrghw_direct__le): New pattern for LE. (altivec_vmrglb_direct): Remove. (altivec_vmrglb_direct_be): New pattern for BE. (altivec_vmrglb_direct_le): New pattern for LE. (altivec_vmrglh_direct): Remove. (altivec_vmrglh_direct_be): New pattern for BE. (altivec_vmrglh_direct_le): New pattern for LE. (altivec_vmrglw_direct_): Remove. (altivec_vmrglw_direct__be): New pattern for BE. (altivec_vmrglw_direct__le): New pattern for LE. * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Adjust. * config/rs6000/vsx.md: Likewise. gcc/testsuite/ChangeLog: PR target/106069 * g++.target/powerpc/pr106069.C: New test. Signed-off-by: Xionghu Luo --- gcc/config/rs6000/altivec.md | 230 ++++++++++++++------ gcc/config/rs6000/rs6000.cc | 24 +- gcc/config/rs6000/vsx.md | 28 ++- gcc/testsuite/g++.target/powerpc/pr106069.C | 120 ++++++++++ 4 files changed, 313 insertions(+), 89 deletions(-) create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 2c4940f2e21..962df4657e6 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct - : gen_altivec_vmrglb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrghb_direct" +(define_insn "altivec_vmrghb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct" (const_int 5) (const_int 21) (const_int 6) (const_int 22) (const_int 7) (const_int 23)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrghb %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 2 "register_operand" "v") + (match_operand:V16QI 1 "register_operand" "v")) + (parallel [(const_int 8) (const_int 24) + (const_int 9) (const_int 25) + (const_int 10) (const_int 26) + (const_int 11) (const_int 27) + (const_int 12) (const_int 28) + (const_int 13) (const_int 29) + (const_int 14) (const_int 30) + (const_int 15) (const_int 31)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "vmrghb %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh" (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct - : gen_altivec_vmrglh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrghh_direct" +(define_insn "altivec_vmrghh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") - (vec_select:V8HI + (vec_select:V8HI (vec_concat:V16HI (match_operand:V8HI 1 "register_operand" "v") (match_operand:V8HI 2 "register_operand" "v")) @@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct" (const_int 1) (const_int 9) (const_int 2) (const_int 10) (const_int 3) (const_int 11)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrghh %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghh_direct_le" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 2 "register_operand" "v") + (match_operand:V8HI 1 "register_operand" "v")) + (parallel [(const_int 4) (const_int 12) + (const_int 5) (const_int 13) + (const_int 6) (const_int 14) + (const_int 7) (const_int 15)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "vmrghh %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw" (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si - : gen_altivec_vmrglw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], + operands[1], + operands[2])); + else + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], + operands[2], + operands[1])); DONE; }) -(define_insn "altivec_vmrghw_direct_" +(define_insn "altivec_vmrghw_direct__be" [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") (vec_select:VSX_W (vec_concat: @@ -1221,10 +1257,24 @@ (define_insn "altivec_vmrghw_direct_" (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 0) (const_int 4) (const_int 1) (const_int 5)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "@ + xxmrghw %x0,%x1,%x2 + vmrghw %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghw_direct__le" + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") + (vec_select:VSX_W + (vec_concat: + (match_operand:VSX_W 2 "register_operand" "wa,v") + (match_operand:VSX_W 1 "register_operand" "wa,v")) + (parallel [(const_int 2) (const_int 6) + (const_int 3) (const_int 7)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "@ - xxmrghw %x0,%x1,%x2 - vmrghw %0,%1,%2" + xxmrghw %x0,%x1,%x2 + vmrghw %0,%1,%2" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrghsf" @@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct - : gen_altivec_vmrghb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrglb_direct" +(define_insn "altivec_vmrglb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct" (const_int 13) (const_int 29) (const_int 14) (const_int 30) (const_int 15) (const_int 31)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrglb %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrglb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 2 "register_operand" "v") + (match_operand:V16QI 1 "register_operand" "v")) + (parallel [(const_int 0) (const_int 16) + (const_int 1) (const_int 17) + (const_int 2) (const_int 18) + (const_int 3) (const_int 19) + (const_int 4) (const_int 20) + (const_int 5) (const_int 21) + (const_int 6) (const_int 22) + (const_int 7) (const_int 23)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "vmrglb %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh" (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct - : gen_altivec_vmrghh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrglh_direct" +(define_insn "altivec_vmrglh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") (vec_select:V8HI (vec_concat:V16HI @@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct" (const_int 5) (const_int 13) (const_int 6) (const_int 14) (const_int 7) (const_int 15)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrglh %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrglh_direct_le" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 2 "register_operand" "v") + (match_operand:V8HI 1 "register_operand" "v")) + (parallel [(const_int 0) (const_int 8) + (const_int 1) (const_int 9) + (const_int 2) (const_int 10) + (const_int 3) (const_int 11)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "vmrglh %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw" (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si - : gen_altivec_vmrghw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], + operands[1], + operands[2])); + else + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], + operands[2], + operands[1])); DONE; }) -(define_insn "altivec_vmrglw_direct_" +(define_insn "altivec_vmrglw_direct__be" [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") (vec_select:VSX_W (vec_concat: @@ -1327,10 +1413,24 @@ (define_insn "altivec_vmrglw_direct_" (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 2) (const_int 6) (const_int 3) (const_int 7)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "@ + xxmrglw %x0,%x1,%x2 + vmrglw %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrglw_direct__le" + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") + (vec_select:VSX_W + (vec_concat: + (match_operand:VSX_W 2 "register_operand" "wa,v") + (match_operand:VSX_W 1 "register_operand" "wa,v")) + (parallel [(const_int 0) (const_int 4) + (const_int 1) (const_int 5)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "@ - xxmrglw %x0,%x1,%x2 - vmrglw %0,%1,%2" + xxmrglw %x0,%x1,%x2 + vmrglw %0,%1,%2" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrglsf" @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); } DONE; }) @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve)); } DONE; }) @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); } DONE; }) @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve)); } DONE; }) @@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi" { emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve)); } DONE; }) @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi" { emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve)); } DONE; }) @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve)); } DONE; }) @@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve)); } DONE; }) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index df491bee2ea..c6ccd40e089 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, CODE_FOR_altivec_vpkuwum_direct, {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct - : CODE_FOR_altivec_vmrglb_direct, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be + : CODE_FOR_altivec_vmrglb_direct_le, {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct - : CODE_FOR_altivec_vmrglh_direct, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be + : CODE_FOR_altivec_vmrglh_direct_le, {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si - : CODE_FOR_altivec_vmrglw_direct_v4si, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be + : CODE_FOR_altivec_vmrglw_direct_v4si_le, {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct - : CODE_FOR_altivec_vmrghb_direct, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be + : CODE_FOR_altivec_vmrghb_direct_le, {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct - : CODE_FOR_altivec_vmrghh_direct, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be + : CODE_FOR_altivec_vmrghh_direct_le, {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si - : CODE_FOR_altivec_vmrghw_direct_v4si, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be + : CODE_FOR_altivec_vmrghw_direct_v4si_le, {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, {OPTION_MASK_P8_VECTOR, BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index e226a93bbe5..80f84e9b141 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_" (const_int 1) (const_int 5)])))] "VECTOR_MEM_VSX_P (mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_ - : gen_altivec_vmrglw_direct_; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], + operands[1], + operands[2])); + else + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], + operands[2], + operands[1])); DONE; } [(set_attr "type" "vecperm")]) @@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_" (const_int 3) (const_int 7)])))] "VECTOR_MEM_VSX_P (mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_ - : gen_altivec_vmrghw_direct_; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], + operands[1], + operands[2])); + else + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], + operands[2], + operands[1])); DONE; } [(set_attr "type" "vecperm")]) diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C new file mode 100644 index 00000000000..2cde9b821e3 --- /dev/null +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C @@ -0,0 +1,120 @@ +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ +/* { dg-require-effective-target vmx_hw } */ +/* { dg-do run } */ + +extern "C" void * +memcpy (void *, const void *, unsigned long); +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; + +union +{ + native_simd_type V; + int R[4]; +} store_le_vec; + +struct S +{ + S () = default; + S (unsigned B0) + { + native_simd_type val{B0}; + m_simd = val; + } + void store_le (unsigned int out[]) + { + store_le_vec.V = m_simd; + unsigned int x0 = store_le_vec.R[0]; + memcpy (out, &x0, 4); + } + S rotl (unsigned int r) + { + native_simd_type rot{r}; + return __builtin_vec_rl (m_simd, rot); + } + void operator+= (S other) + { + m_simd = __builtin_vec_add (m_simd, other.m_simd); + } + void operator^= (S other) + { + m_simd = __builtin_vec_xor (m_simd, other.m_simd); + } + static void transpose (S &B0, S B1, S B2, S B3) + { + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); + B0 = __builtin_vec_mergeh (T0, T1); + B3 = __builtin_vec_mergel (T2, T3); + } + S (native_simd_type x) : m_simd (x) {} + native_simd_type m_simd; +}; + +void +foo (unsigned int output[], unsigned state[]) +{ + S R00 = state[0]; + S R01 = state[0]; + S R02 = state[2]; + S R03 = state[0]; + S R05 = state[5]; + S R06 = state[6]; + S R07 = state[7]; + S R08 = state[8]; + S R09 = state[9]; + S R10 = state[10]; + S R11 = state[11]; + S R12 = state[12]; + S R13 = state[13]; + S R14 = state[4]; + S R15 = state[15]; + for (int r = 0; r != 10; ++r) + { + R09 += R13; + R11 += R15; + R05 ^= R09; + R06 ^= R10; + R07 ^= R11; + R07 = R07.rotl (7); + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 ^= R01; + R13 ^= R02; + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 = R12.rotl (8); + R13 = R13.rotl (8); + R10 += R15; + R11 += R12; + R08 += R13; + R09 += R14; + R05 ^= R10; + R06 ^= R11; + R07 ^= R08; + R05 = R05.rotl (7); + R06 = R06.rotl (7); + R07 = R07.rotl (7); + } + R00 += state[0]; + S::transpose (R00, R01, R02, R03); + R00.store_le (output); +} + +unsigned int res[1]; +unsigned main_state[]{1634760805, 60878, 2036477234, 6, + 0, 825562964, 1471091955, 1346092787, + 506976774, 4197066702, 518848283, 118491664, + 0, 0, 0, 0}; +int +main () +{ + foo (res, main_state); + if (res[0] != 0x41fcef98) + __builtin_abort (); +} -- 2.27.0